Typed compiler, not a templating engine
Column-level type inference across the full DAG. 35+ diagnostic codes with actionable suggestions. E013 blocks the PR before a row is written.
Rocky exists because the most expensive failures in modern data platforms aren’t slow queries — they’re trust failures. A column type changes upstream and a revenue dashboard quietly diverges for three days. A SELECT * pulls a new column nobody designed for. A Snowflake-only function lands in a Databricks-targeted project and only fails in prod. Warehouse spend doubles and nobody can attribute which model caused it. An auditor asks who changed fct_revenue.amount and when.
dbt is a templating engine; it can’t catch any of these at compile time. SQLMesh moved correctness to the planner. Rocky moved it to the compiler — so the disasters above become E013 diagnostics, blocked PRs, rocky lineage-diff comments, and content-addressed replay artifacts.
Column-level lineage at compile time
Every column traced through every transformation, before execution. rocky lineage-diff main lists per-column downstream blast radius for PR review.
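To make "per-column downstream blast radius" concrete, here is a minimal sketch of the idea, not Rocky's implementation. It assumes column lineage is available as a plain mapping from an upstream column to the columns derived from it; the `EDGES` data and the `blast_radius` helper are hypothetical names invented for illustration.

```python
from collections import deque

# Hypothetical column-level edges: upstream column -> columns derived from it.
EDGES = {
    "raw_orders.amount": ["stg_orders.amount"],
    "stg_orders.amount": ["fct_revenue.amount", "fct_revenue.amount_usd"],
    "fct_revenue.amount": ["dash_revenue.total"],
}


def blast_radius(changed, edges):
    """Return every column reachable downstream of `changed`, sorted by name."""
    seen = set()
    queue = deque([changed])
    while queue:
        col = queue.popleft()
        for child in edges.get(col, []):
            if child not in seen:  # guards against revisiting shared descendants
                seen.add(child)
                queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    for col in blast_radius("stg_orders.amount", EDGES):
        print(col)
```

A breadth-first walk is enough here because the question at review time is only which columns are affected, not in what order they are rebuilt.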
Branches + deterministic replay
Named branches as isolated schemas. rocky replay <run_id> reconstructs a run bit-for-bit — inputs, code, and outputs as a content-addressed artifact.
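As a rough illustration of what "content-addressed" means here, the sketch below derives an artifact ID from a run's inputs, code, and outputs. It is an assumption-laden example, not Rocky's artifact format: the `content_address` function, the field names, and the choice of SHA-256 over canonical JSON are all hypothetical.

```python
import hashlib
import json


def content_address(inputs, code, outputs):
    """Hash a run's inputs, code, and outputs into one stable identifier.

    Canonical JSON (sorted keys, fixed separators) means the same content
    always produces the same digest, regardless of dict ordering.
    """
    payload = json.dumps(
        {"inputs": inputs, "code": code, "outputs": outputs},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    artifact_id = content_address(
        inputs={"raw_orders": "snapshot-001"},       # hypothetical input reference
        code={"fct_revenue": "SELECT 1 AS amount"},  # hypothetical model SQL
        outputs={"fct_revenue": {"rows": 1}},        # hypothetical output summary
    )
    print(artifact_id)
```

The useful property is that any change to inputs, code, or outputs changes the ID, so two runs with the same ID can be treated as the same run.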
Per-model cost attribution
Cost is a column on every run, not a dashboard. [budget] blocks fail the run on overspend. rocky preview cost projects spend at PR time.
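The budget behaviour can be pictured as a check over per-model costs that raises once the total passes a limit. This is a minimal sketch under stated assumptions, not Rocky's [budget] semantics: `enforce_budget`, `BudgetExceeded`, and the `max_usd` parameter are hypothetical names, and the real configuration keys are not shown in this document.

```python
class BudgetExceeded(Exception):
    """Raised when a run's total cost passes the configured limit."""


def enforce_budget(model_costs, max_usd):
    """Return the run's total cost, or raise if it exceeds `max_usd`.

    `model_costs` maps model name -> cost in USD for a single run.
    """
    total = sum(model_costs.values())
    if total > max_usd:
        # Name the largest contributor so the failure is attributable.
        worst = max(model_costs, key=model_costs.get)
        raise BudgetExceeded(
            f"run cost ${total:.2f} exceeds budget ${max_usd:.2f}; "
            f"largest contributor: {worst}"
        )
    return total


if __name__ == "__main__":
    costs = {"stg_orders": 1.5, "fct_revenue": 2.25}  # hypothetical figures
    print(enforce_budget(costs, max_usd=5.0))
```

Because cost is tracked per model, the failure message can point at the model responsible instead of only reporting an aggregate.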
AI gated through the compiler
Every AI suggestion type-checks before it lands. Generate → compile → auto-fix → ship: the closed loop nobody else has.
Dialect-divergence lint
P001 catches Snowflake-only constructs in a Databricks project, and the reverse. Cross-warehouse teams stop discovering portability bugs in prod.
curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash
rocky playground my-first-project
cd my-first-project
rocky compile # type-check
rocky test # run assertions locally
rocky plan # build a plan, prints plan_id
rocky apply <plan-id> # execute the plan

No credentials needed: the playground is DuckDB-backed.
rocky run writes a single RunRecord to the state store. Three read commands (rocky trace, rocky cost, and rocky replay) each project that same record from a different angle: causality and concurrency, per-model warehouse spend, and the byte-for-byte reproducibility artifact. Three views of one tree, not three telemetry domains stitched together after the fact. Walk through it end-to-end in POC #17 (trace + cost + replay against the same run_id).
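The "one record, three views" idea can be sketched as three pure functions reading the same structure. Everything below is an illustrative assumption: the real RunRecord schema is not shown here, so the `ModelRun` fields (`upstream`, `cost_usd`, `code_sha`) and the three view functions are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ModelRun:
    model: str
    cost_usd: float
    code_sha: str                                 # hypothetical hash of the model's code
    upstream: list = field(default_factory=list)  # models this one waited on


@dataclass
class RunRecord:
    run_id: str
    models: list


def trace_view(record):
    """Causality: which models each model depended on."""
    return {m.model: list(m.upstream) for m in record.models}


def cost_view(record):
    """Spend: cost per model for this run."""
    return {m.model: m.cost_usd for m in record.models}


def replay_view(record):
    """Reproducibility: a digest over the code that produced the run."""
    payload = json.dumps({m.model: m.code_sha for m in record.models}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    record = RunRecord(
        run_id="run-1",  # hypothetical identifier
        models=[
            ModelRun("stg_orders", 1.5, "aaa"),
            ModelRun("fct_revenue", 2.25, "bbb", upstream=["stg_orders"]),
        ],
    )
    print(trace_view(record))
    print(cost_view(record))
    print(replay_view(record))
```

Since all three views derive from one record, they cannot disagree about which models ran, which is the point of not stitching separate telemetry sources together.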
Rocky is built first for data platform engineers running production-critical, multi-tenant pipelines on Databricks — the team where silent failures cost real money and dbt has hit a ceiling. The trust primitives are most battle-tested there.
The next ring out: Snowflake and BigQuery shops evaluating SQLMesh, who want correctness moved to the compiler (not the planner) and prefer SQL by default. Adapters are Beta today — see the introduction for the honest 2026 roadmap.