GitHub - rocky-data/rocky: The typed graph between your code and whichever warehouse, table format, or query engine you've chosen — typed compiler, branches, replay, column-level lineage, compile-time contracts, per-model cost. Adapters: Databricks, Snowflake, BigQuery, DuckDB. Single static Rust binary. Apache 2.0.

Rocky

Rocky is the typed graph between your code and whichever warehouse, table format, or query engine you've chosen.

Not a warehouse. Not a table format. Not a query engine. The trust plane for your data — a typed compiler, named branches, deterministic replay, column-level lineage, compile-time contracts, and per-model cost — running over your existing Databricks / Snowflake / BigQuery / DuckDB. Apache 2.0.

Rocky exists because the disasters that cost data teams real money — silent schema drift wrecking a revenue dashboard, a column rename quietly poisoning 47 downstream models, an auditor asking who touched fct_revenue.amount and when, a cost spike that no one can attribute to a model — all share a common shape. They're problems the warehouse can't see and the templating engine on top of it was never asked to. Rocky owns the graph between your code and the warehouse so those problems become compile errors, blocked PRs, and verifiable artifacts instead of pages and post-mortems.

It's built for the team running production-critical multi-tenant pipelines on Databricks today, on Snowflake or BigQuery tomorrow, who can't tolerate another silent failure. Storage and compute stay where they are. Rocky works on the SQL you already have — the .rocky DSL is an acceleration when you want it, not a gate.

The disasters Rocky prevents

Disaster	What dbt does	What Rocky does
Upstream changes a column type	Silent — fails downstream, hours later	`E013` at compile, blocks the PR
Required column dropped from a contract	No contract concept	`E010` at compile, blocks the PR
Column rename with unknown blast radius	`dbt docs` post-hoc, table-level (dbt Cloud Enterprise has column lineage in their UI, also post-hoc, not PR-blocking)	`rocky lineage-diff` at PR time, column-level, downstream consumers listed, blocks the merge
`SELECT *` pulls a new column you didn't expect	Silent	`P002` warning, downstream consumers named
Snowflake-only function written for a Databricks project	Runs in dev, fails in prod	`P001` dialect-portability lint at compile
Run cost doubles, no one knows which model	Manual warehouse spelunking	`RunOutput.cost_summary` per model, every run
Auditor asks: who changed `fct_revenue.amount`, when, and why?	Git blame + screenshots	`rocky replay <run_id>` — content-addressed record of the exact code and the output it produced
Sev-2 at 3 AM, half the pipeline already ran	Re-run everything	`rocky run --resume-latest` — checkpoint, three-state circuit breaker, skip what succeeded

Each row is a real failure mode, with a Rocky command that turns it into a non-event. The same primitives — typed compiler, content-addressed state, column-level lineage, per-model cost — back every row.
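
To make "content-addressed" concrete, here is a minimal Python sketch of the general idea: a run record is keyed by a hash of the exact code and its inputs, so identical code and inputs always map to the same key. This is an illustration only. The function name `content_address`, the JSON payload layout, and the choice of SHA-256 are assumptions for this example, not a description of Rocky's actual state format.

```python
import hashlib
import json


def content_address(model_sql: str, input_hashes: list[str]) -> str:
    """Derive a stable key from the model's code and the hashes of its inputs."""
    # Sorting the inputs and the JSON keys makes the key independent of ordering.
    payload = json.dumps(
        {"sql": model_sql, "inputs": sorted(input_hashes)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical model text and upstream hashes, for illustration only.
key_a = content_address("SELECT SUM(amount) FROM orders", ["h1", "h2"])
key_b = content_address("SELECT SUM(amount) FROM orders", ["h2", "h1"])
key_c = content_address("SELECT SUM(amount * 2) FROM orders", ["h1", "h2"])
```

Here `key_a` and `key_b` are equal because only the input order differs, while `key_c` differs because the code changed. That property is what lets a record answer "which exact code produced this output".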

Already on dbt? rocky import-dbt converts a vanilla dbt project to Rocky in one command. The "What dbt does" column above is dbt Core's behavior. dbt Fusion — dbt Labs' Rust rewrite of dbt Core (public beta) — catches some compile-time issues that dbt Core misses, but doesn't ship named branches, content-addressed deterministic replay, per-model cost attribution as a first-class column, dialect-portability lint across warehouses, or declarative governance + masking outside dbt platform's paid tiers. Those stay Rocky's surface, Apache 2.0.

Try it in 60 seconds

# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/rocky-data/rocky/main/engine/install.ps1 | iex

rocky playground my-first-project
cd my-first-project
rocky compile && rocky test && rocky run

No credentials needed — the playground runs end-to-end on local DuckDB.

Who Rocky is for

Rocky is built first for data platform engineers running production-critical, multi-tenant pipelines on Databricks — the team that's hit dbt's ceiling, where silent failures cost real money, and where Dagster is already the orchestrator. That's the launch wedge, and that's where Rocky is most battle-tested.

The next ring out: Snowflake and BigQuery shops currently evaluating SQLMesh, who want correctness moved to the compiler (not the planner) and prefer SQL by default over Python-first ergonomics. Adapters are Beta today; see Where Rocky is today below.

See it in action

Each demo below is a self-contained POC in examples/playground/pocs/ — cd in, run ./run.sh, reproduce locally.

Detects schema drift the moment it happens

A source column type changes upstream. On the next run, Rocky diffs source vs. target, drops the target, and recreates it. No silent data corruption, no dbt-style quiet divergence.

POC — 02-performance/06-schema-drift-recover
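
As a rough illustration of the drift check described above, the following Python sketch compares two column-to-type mappings and reports what changed. It is not Rocky's implementation: the function name `diff_schemas`, the dict-based schema representation, and the rebuild rule are assumptions made for this example.

```python
def diff_schemas(source: dict[str, str], target: dict[str, str]) -> dict[str, list]:
    """Compare column -> type mappings and report the differences."""
    added = sorted(c for c in source if c not in target)
    removed = sorted(c for c in target if c not in source)
    # Each retyped entry is (column, type in target, type in source).
    retyped = sorted(
        (c, target[c], source[c])
        for c in source
        if c in target and source[c] != target[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical schemas: the upstream `amount` column changed type.
source = {"id": "BIGINT", "amount": "DECIMAL(18,2)", "created_at": "TIMESTAMP"}
target = {"id": "BIGINT", "amount": "DOUBLE", "created_at": "TIMESTAMP"}

drift = diff_schemas(source, target)
# Assumed rule for this sketch: any type change means the target is rebuilt.
needs_rebuild = bool(drift["retyped"])
```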

Enforces data contracts at compile time

Missing required columns, protected columns being removed, or unsafe type changes surface as diagnostic codes (E010, E013) before a single row is written.

POC — 01-quality/01-data-contracts-strict
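
The shape of a compile-time contract check can be sketched in a few lines of Python. This is a simplified illustration, not Rocky's compiler: `Diagnostic` and `check_contract` are hypothetical names, the contract is modelled as a plain column-to-type mapping, and the pairing of `E010` with a missing required column and `E013` with a type change follows the table in "The disasters Rocky prevents". The real rules (protected columns, which type changes count as unsafe) are likely richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    code: str
    message: str


def check_contract(contract: dict[str, str], model_columns: dict[str, str]) -> list[Diagnostic]:
    """Check a model's inferred columns against its contract before anything runs."""
    diagnostics = []
    for column, expected_type in contract.items():
        actual_type = model_columns.get(column)
        if actual_type is None:
            diagnostics.append(
                Diagnostic("E010", f"required column '{column}' is missing")
            )
        elif actual_type != expected_type:
            diagnostics.append(
                Diagnostic(
                    "E013",
                    f"column '{column}' is {actual_type}, contract expects {expected_type}",
                )
            )
    return diagnostics


# Hypothetical contract and model output, for illustration only.
contract = {"order_id": "BIGINT", "amount": "DECIMAL(18,2)", "currency": "STRING"}
model_columns = {"order_id": "BIGINT", "amount": "DOUBLE"}

diagnostics = check_contract(contract, model_columns)
codes = [d.code for d in diagnostics]
```

A CI job built on a check like this would fail whenever the diagnostics list is non-empty, which is what blocking the PR amounts to.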

Named branches for risk-free experiments

Create a branch, run against it in an isolated schema, inspect, then drop or promote. Column-level lineage shows the downstream blast radius before you ship.

POC — 00-foundations/06-branches-replay-lineage

Column-level lineage, not table-level

Trace a single column from a downstream fact back through its aggregations, all the way to the seed. Blast-radius analysis without reading every model.

POC — 06-developer-experience/01-lineage-column-level
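
Column-level lineage is, at its core, a graph walk over column-to-column edges. The Python sketch below shows the two directions of that walk on a tiny hand-written graph: upstream to the seed, and downstream for blast radius. The edge data, model names, and function names are hypothetical, and how Rocky derives these edges from SQL is not shown here.

```python
from collections import deque

# Hypothetical lineage edges: each column maps to the upstream columns it derives from.
UPSTREAM = {
    "fct_revenue.amount": ["stg_orders.amount", "stg_orders.quantity"],
    "stg_orders.amount": ["seed_orders.amount"],
    "stg_orders.quantity": ["seed_orders.qty"],
}


def walk(column: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every column reachable from `column` along `edges`, breadth-first."""
    seen, order, queue = {column}, [], deque([column])
    while queue:
        current = queue.popleft()
        for neighbour in edges.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


def invert(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip upstream edges into downstream edges."""
    inverted: dict[str, list[str]] = {}
    for child, parents in edges.items():
        for parent in parents:
            inverted.setdefault(parent, []).append(child)
    return inverted


sources = walk("fct_revenue.amount", UPSTREAM)
blast_radius = walk("seed_orders.amount", invert(UPSTREAM))
```

In this example `sources` lists the four staging and seed columns behind `fct_revenue.amount`, and `blast_radius` lists the two columns that would be affected by a change to `seed_orders.amount`.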

AI model generation with a compile-validate loop

Describe what you want in plain English. Rocky generates a Rocky DSL model, compiles it, and retries on parse failure — the Attempts: 2 line shows the loop catching a first-pass error invisibly.

POC — 03-ai/01-model-generation
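
The control flow of a compile-validate loop can be sketched as follows. This is a generic Python illustration with stand-in functions, not Rocky's code: `generate_with_validation`, `fake_generate`, and `fake_compile` are hypothetical, and the stubs replace the real model generator and compiler so the example is runnable on its own.

```python
from typing import Callable, Optional


def generate_with_validation(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],
    compile_model: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Generate a model, compile it, and retry with the error fed back in."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt, error)
        error = compile_model(candidate)  # None means the candidate compiled
        if error is None:
            return candidate, attempt
    raise RuntimeError(f"no valid model after {max_attempts} attempts: {error}")


def fake_generate(prompt: str, previous_error: Optional[str]) -> str:
    # Stand-in generator: makes a typo first, fixes it once it sees an error.
    return "SELECT id FROM orders" if previous_error else "SELEC id FROM orders"


def fake_compile(source: str) -> Optional[str]:
    # Stand-in compiler: accepts only text that starts with SELECT.
    return None if source.startswith("SELECT ") else "parse error: expected SELECT"


model, attempts = generate_with_validation("list order ids", fake_generate, fake_compile)
```

With these stubs the first candidate fails to parse and the second succeeds, so `attempts` ends up as 2, the same situation the `Attempts: 2` line in the demo reports.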

PR-time blast-radius with `rocky lineage-diff`

Compare two git refs and get a per-changed-column readout of downstream consumers — pre-rendered Markdown drops straight into a GitHub PR comment. CODEOWNERS-style review tooling can't reach this granularity without a compiled engine.

POC — 06-developer-experience/11-lineage-diff

Classify columns, mask by environment, gate CI

Tag PII columns in the model sidecar; bind tags to mask strategies in [mask] / [mask.<env>]. rocky compliance --env prod --fail-on exception exits 1 the moment a classified column has no resolved strategy — a one-line CI gate against accidentally-unmasked data.

POC — 04-governance/05-classification-masking-compliance
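
The gate logic described above reduces to: resolve every classified column to a mask strategy, and fail if any column resolves to nothing. The Python sketch below illustrates that under stated assumptions. The tag names, the strategy names, and the rule that an environment binding overrides the default binding are all hypothetical choices for this example; they are not taken from Rocky's configuration schema.

```python
from typing import Optional


def resolve_strategies(
    column_tags: dict[str, str],
    default_mask: dict[str, str],
    env_mask: dict[str, str],
) -> dict[str, Optional[str]]:
    """Map each tagged column to a mask strategy, or None if nothing is bound."""
    # Assumption: environment-specific bindings take precedence over defaults.
    bindings = {**default_mask, **env_mask}
    return {column: bindings.get(tag) for column, tag in column_tags.items()}


def compliance_exit_code(resolved: dict[str, Optional[str]]) -> int:
    """Return 1 if any classified column has no resolved strategy, else 0."""
    exceptions = sorted(c for c, strategy in resolved.items() if strategy is None)
    for column in exceptions:
        print(f"exception: {column} has no mask strategy")
    return 1 if exceptions else 0


# Hypothetical tags and bindings, for illustration only.
column_tags = {"users.email": "pii", "users.ssn": "restricted"}
default_mask = {"pii": "hash"}

exit_unbound = compliance_exit_code(resolve_strategies(column_tags, default_mask, {}))
exit_bound = compliance_exit_code(
    resolve_strategies(column_tags, default_mask, {"restricted": "redact"})
)
```

The first call returns 1 because `users.ssn` carries a tag with no binding. The second returns 0 once the environment binds that tag.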

Incremental loads with persistent watermark state

strategy = "incremental" plus a timestamp_column is all it takes. Rocky writes the high-water mark to the embedded state store; subsequent runs only INSERT … WHERE timestamp > watermark. Append 25 rows after a 500-row load — run 2 still finishes in 0.2s.

POC — 02-performance/01-incremental-watermark
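
The watermark pattern itself is easy to reproduce. The Python sketch below uses the standard library's `sqlite3` module as a stand-in for the warehouse and keeps the watermark in a plain variable rather than a persistent state store, so it shows the technique only, not Rocky's implementation. Table and column names are hypothetical, and integers stand in for timestamps.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection, watermark: int) -> tuple[int, int]:
    """Copy only rows newer than the watermark; return (rows_copied, new_watermark)."""
    before = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.execute(
        "INSERT INTO target SELECT id, updated_at FROM source WHERE updated_at > ?",
        (watermark,),
    )
    after, new_watermark = conn.execute(
        "SELECT COUNT(*), MAX(updated_at) FROM target"
    ).fetchone()
    return after - before, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, updated_at INTEGER)")
conn.execute("CREATE TABLE target (id INTEGER, updated_at INTEGER)")

# Run 1: initial load of 500 rows, starting from a watermark of 0.
conn.executemany("INSERT INTO source VALUES (?, ?)", [(i, i) for i in range(1, 501)])
first_count, watermark = incremental_load(conn, 0)

# Run 2: 25 new rows arrive; only those are copied.
conn.executemany("INSERT INTO source VALUES (?, ?)", [(i, i) for i in range(501, 526)])
second_count, watermark = incremental_load(conn, watermark)
```

The second run touches only the 25 appended rows, which is why its cost tracks the size of the new data rather than the size of the table.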

Where Rocky is today

The trust primitives — compiler, branches, replay, lineage, contracts, cost attribution — are production-grade on Databricks. We're explicit about the rest:

Databricks is the production target for 2026. Snowflake, BigQuery, and Trino adapters are Beta — connection, execution, and the core run loop work, but conformance coverage is still growing. If your enterprise warehouse is Snowflake or BigQuery and you need it production-grade today, talk to us.
AI is a growing surface, not a finished product. The compile-validate loop (generate → type-check → auto-fix → land) is real and shipped; the broader story (mass refactor across the DAG, auto-migration from a column type change, schema-aware assertion generation) is on the roadmap, not the changelog.
Iceberg. REST-catalog source discovery is Beta. Content-addressed writes round-trip as Iceberg through Delta UniForm — shipped end-to-end (Wave 2). First-class Iceberg-native writes without the Delta intermediate are on the 2026 roadmap.
No built-in semantic layer. Rocky's typed IR is the right home for one. Today, integrate with Cube, the dbt Semantic Layer, or your existing metric store.
Orchestration: Dagster is first-class. A rocky serve standalone path exists; native Airflow / Prefect integrations are not yet shipped, so those orchestrators invoke Rocky through the CLI like any other binary.

If those gaps are blockers for your team, open a discussion — the roadmap is shaped by where production pipelines are actually getting hurt.

Subprojects

Path	Artifact	Language	Description
`engine/`	`rocky` CLI binary	Rust	Core SQL transformation engine — 22-crate Cargo workspace
`integrations/dagster/`	`dagster-rocky` PyPI wheel	Python	Dagster resource and component wrapping the Rocky CLI
`editors/vscode/`	Rocky VSIX	TypeScript	VS Code extension — LSP client + commands for AI features
`examples/playground/`	(config only)	TOML / SQL	Self-contained DuckDB sample pipeline used for smoke tests and benchmarks

Each subproject has its own README with detailed usage. The engine/README.md is the canonical product reference for the Rocky CLI.

Adapters

Role	Adapter	Status	Notes
Warehouse	Databricks	Production	SQL Statement API · Unity Catalog · `SHALLOW CLONE` for branches
Warehouse	Snowflake	Beta	REST connector · zero-copy `CLONE` for branches · masking policies
Warehouse	BigQuery	Beta	REST connector · `CREATE TABLE … COPY` for branches
Warehouse	DuckDB	Local / Testing	Embedded · powers `rocky playground` (no credentials needed)
Warehouse	Trino	Beta	REST `/v1/statement` polling client · Basic + JWT auth · Docker conformance harness behind `trino-conformance` feature
Source	Fivetran	Production	REST connector + table discovery
Source	Airbyte	Beta	Catalog discovery
Source	Iceberg	Beta	REST catalog discovery of namespaces and tables
Source	Manual	Production	Schema/table lists inline in `rocky.toml`

Building an adapter for a warehouse Rocky doesn't ship in-tree (ClickHouse, Redshift, …)? See the Adapter SDK guide and the Rust-native skeleton POC.

Building from source

git clone https://github.com/rocky-data/rocky.git
cd rocky
just build       # builds engine + dagster wheel + vscode extension
just test        # runs all test suites
just lint        # cargo clippy/fmt + ruff + eslint

just is optional — you can also build each subproject directly. See CONTRIBUTING.md for per-subproject build commands.

Releases

Each artifact is released independently using a tag-namespaced scheme:

engine-v* → Rocky CLI binary (cross-compiled, on GitHub Releases)
dagster-v* → dagster-rocky wheel
vscode-v* → Rocky VSIX

See CONTRIBUTING.md for the full release flow.

Documentation

Full documentation: rocky-data.dev — concepts, guides, CLI reference, Dagster integration, adapter SDK.

Contributing

See CONTRIBUTING.md. Before opening a PR, please read the cross-project change guidance — schema and DSL changes must update consumers atomically.

Sponsoring

Rocky is free and open source. If it saves your team time, consider sponsoring the project so development can continue.

License

Apache 2.0
