Rocky — the trust plane for your warehouse

The trust plane for your warehouse — a typed compiler with branches, replay, column-level lineage, contracts, and per-model cost. Storage and compute stay where they are.


The disasters Rocky prevents

Rocky exists because the most expensive failures in modern data platforms aren’t slow queries — they’re trust failures. A column type changes upstream and a revenue dashboard quietly diverges for three days. A SELECT * pulls a new column nobody designed for. A Snowflake-only function lands in a Databricks-targeted project and only fails in prod. Warehouse spend doubles and nobody can attribute which model caused it. An auditor asks who changed fct_revenue.amount and when.

dbt is a templating engine; it can’t catch any of these at compile time. SQLMesh moved correctness to the planner. Rocky moved it to the compiler — so the disasters above become E013 diagnostics, blocked PRs, rocky lineage-diff comments, and content-addressed replay artifacts.

Typed compiler, not a templating engine

Column-level type inference across the full DAG. 35+ diagnostic codes with actionable suggestions. E013 blocks the PR before a row is written.
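
As a rough illustration of the idea, not Rocky's implementation or API, the sketch below propagates column types from a source through two models and reports any column whose inferred type breaks a declared contract. The `SOURCES`, `MODELS`, and `CONTRACTS` structures, the function names, and the diagnostic wording are all hypothetical.

```python
# Conceptual sketch only: these structures and names are assumptions, not Rocky's API.
# Each model maps an output column to the (upstream_model, upstream_column) it reads.
SOURCES = {"raw_orders": {"amount": "DECIMAL", "id": "BIGINT"}}
MODELS = {  # listed in dependency order
    "stg_orders": {"amount": ("raw_orders", "amount"), "order_id": ("raw_orders", "id")},
    "fct_revenue": {"amount": ("stg_orders", "amount")},
}
CONTRACTS = {"fct_revenue": {"amount": "DECIMAL"}}


def infer_types(sources, models):
    """Propagate column types from sources through every model."""
    schemas = {name: dict(cols) for name, cols in sources.items()}
    for model, columns in models.items():
        schemas[model] = {
            out: schemas[up_model][up_col]
            for out, (up_model, up_col) in columns.items()
        }
    return schemas


def check_contracts(schemas, contracts):
    """Return one diagnostic per column whose inferred type breaks its contract."""
    diagnostics = []
    for model, expected in contracts.items():
        for column, want in expected.items():
            got = schemas[model].get(column)
            if got != want:
                diagnostics.append(f"{model}.{column}: expected {want}, inferred {got}")
    return diagnostics
```

With the schemas above, `check_contracts` returns no diagnostics. If `raw_orders.amount` changes to `VARCHAR`, the same check flags `fct_revenue.amount` two hops downstream without running any query.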

Column-level lineage at compile time

Every column traced through every transformation, before execution. rocky lineage-diff main lists per-column downstream blast radius for PR review.
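
To make "downstream blast radius" concrete, here is a minimal sketch that assumes column-level lineage is available as a mapping from each column to the columns derived from it. The edge data and the `blast_radius` helper are hypothetical and say nothing about how rocky lineage-diff is actually built.

```python
from collections import deque

# Hypothetical column-level edges: upstream column -> columns derived from it.
EDGES = {
    "raw_orders.amount": ["stg_orders.amount"],
    "stg_orders.amount": ["fct_revenue.amount", "fct_margin.margin"],
    "fct_revenue.amount": ["dash_revenue.total"],
}


def blast_radius(edges, changed):
    """Return every column reachable downstream of `changed`, breadth-first."""
    seen, order, queue = {changed}, [], deque([changed])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

For example, `blast_radius(EDGES, "stg_orders.amount")` lists `fct_revenue.amount`, `fct_margin.margin`, and `dash_revenue.total`, which is the kind of per-column list a reviewer would want on a PR.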

Branches + deterministic replay

Named branches as isolated schemas. rocky replay <run_id> reconstructs a run bit-for-bit — inputs, code, and outputs as a content-addressed artifact.
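
One common way to get a content-addressed artifact is to hash a canonical encoding of everything that defines the run. The sketch below assumes SHA-256 over sorted-key JSON; the hash Rocky actually uses, its encoding, and the `content_address` helper are not specified in this document and are assumptions here.

```python
import hashlib
import json


def content_address(inputs, code, outputs):
    """Hash a canonical JSON encoding of a run so identical runs share one address."""
    payload = json.dumps(
        {"inputs": inputs, "code": code, "outputs": outputs},
        sort_keys=True,            # key order must not affect the address
        separators=(",", ":"),     # fixed separators keep the encoding stable
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two runs with the same inputs, code, and outputs map to the same address regardless of key order, while any change to the code or data produces a different one. That property is what lets a replay be checked against the original run.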

Per-model cost attribution

Cost is a column on every run, not a dashboard. [budget] blocks fail the run on overspend. rocky preview cost projects spend at PR time.
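
The budget behaviour can be pictured as a simple guard over per-model costs. This is a sketch under assumptions: `BudgetExceeded`, `enforce_budget`, and the cost units are hypothetical, and it does not reproduce the syntax of a [budget] block.

```python
class BudgetExceeded(Exception):
    """Raised when a run's total cost goes over its configured limit."""


def enforce_budget(model_costs, limit):
    """Return the total cost, or raise BudgetExceeded naming the costliest model."""
    total = sum(model_costs.values())
    if total > limit:
        worst = max(model_costs, key=model_costs.get)
        raise BudgetExceeded(
            f"run cost {total:.2f} exceeds budget {limit:.2f}; "
            f"largest contributor: {worst}"
        )
    return total
```

Because cost is tracked per model, the failure message can name the model responsible instead of reporting only an aggregate overspend.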

AI gated through the compiler

Every AI suggestion type-checks before it lands. Generate → compile → auto-fix → ship: a closed loop nobody else has.

Dialect-divergence lint

P001 catches Snowflake-only constructs in a Databricks project, and the reverse. Cross-warehouse teams stop discovering portability bugs in prod.

Get Started in 60 Seconds

curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash
rocky playground my-first-project
cd my-first-project
rocky compile            # type-check
rocky test               # run assertions locally
rocky plan               # build a plan, prints plan_id
rocky apply <plan-id>    # execute the plan

No credentials needed — the playground is DuckDB-backed.

One trace, one cost graph, one replay handle

rocky run writes a single RunRecord to the state store. Three read commands (rocky trace, rocky cost, and rocky replay) each present that same record from a different angle: causality and concurrency, per-model warehouse spend, and the bit-for-bit reproducibility artifact. These are three views of one tree, not three telemetry domains stitched together after the fact. Walk through it end to end in POC #17 (trace + cost + replay against the same run_id).
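
The "one record, three views" idea can be sketched as three read-only functions over a single immutable structure. The fields on `RunRecord` and `ModelRun` below are assumptions made for illustration; the real record layout is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    model: str
    depends_on: tuple   # names of upstream models (hypothetical field)
    cost: float         # warehouse spend for this model (hypothetical field)
    output_hash: str    # content address of the output (hypothetical field)


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    models: tuple       # tuple of ModelRun


def trace(record):
    """Causality view: which models each model waited on."""
    return {m.model: list(m.depends_on) for m in record.models}


def cost(record):
    """Spend view: cost per model."""
    return {m.model: m.cost for m in record.models}


def replay_manifest(record):
    """Reproducibility view: the output hash a replay should match."""
    return {m.model: m.output_hash for m in record.models}
```

All three functions read the same `RunRecord`, so the views cannot drift apart the way separately collected telemetry can.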

Who Rocky is for

Rocky is built first for data platform engineers running production-critical, multi-tenant pipelines on Databricks: teams where silent failures cost real money and dbt has hit a ceiling. The trust primitives are most battle-tested there.

The next ring out: Snowflake and BigQuery shops evaluating SQLMesh, who want correctness moved to the compiler (not the planner) and prefer SQL by default. Adapters are Beta today — see the introduction for the honest 2026 roadmap.

Coming from dbt? Run rocky import-dbt against your project, get a Rocky repo on disk, ship the gains incrementally — no rewrite.

Using Dagster? dagster-rocky wraps the CLI as a ConfigurableResource. Auto-discovery, asset checks, Pipes — one subprocess hop.