Streaming Ingestion Lakehouse

Lakehouse with Streaming and Search

Stream in/out, aggregate and (full-text) search with SQL. Fit for Agents. One binary, no external infrastructure.

Hot Commit Interval: ~1s. Configurable, and can be set much lower.
Cold Tier (S3): 1-10m. Durable storage.
Cold→Hot Hydration: ~5s. On-demand via triggers.

Scalable Architecture

DuckLake with hot tier data inlining for ~1s query visibility. Horizontally scalable, multi-tenant, single binary. The boilstream extension vends temporary credentials so any DuckDB client works seamlessly.

Figure: BoilStream architecture (Ingest, Transform, Aggregate, Search, Consume)

INSTALL boilstream FROM community;
LOAD boilstream;

-- Login with email, password, and MFA code
PRAGMA boilstream_login('https://your-server.com/user@example.com', 'password', '123456');

-- List and use your ducklakes
FROM boilstream_ducklakes();
USE my_catalog;
SELECT * FROM events;

Ingest → Transform → Aggregate → Search → Consume

A complete streaming lakehouse — from ingestion to full-text search to real-time consumption, all with SQL.

DuckDB Extension

The boilstream extension manages DuckLake for you and vends temporary credentials for seamless hot and cold tier access. Authentication uses the OPAQUE PAKE protocol, with MFA support.

Kafka Protocol + JIT Avro

Confluent Schema Registry compatible Avro with a JIT-compiled decoder, 3-5x faster than Apache Arrow's Rust decoder published in October 2025. Use standard Kafka clients to stream data.

DuckLake Cold Storage

Automatic S3 Parquet snapshots with DuckLake catalog registration. Remote DuckDB clients with the DuckLake extension work seamlessly.

Enterprise Ready

SSO with Entra ID (upload/download XML files) and automated user provisioning (SCIM), RBAC, audit trails, and user/admin dashboards. Built-in registration is available as an alternative. Prometheus monitoring. Configure multiple cloud backends and assign BoilStream roles to users.

SQL-Native Full-Text Search

Integrated Tantivy indexing with query pushdowns from SQL. Every ingested row is automatically indexed. Hot tier indexes are searchable within seconds; cold tier index bundles are stored on S3. No Elasticsearch needed. Perfect for AI agents querying knowledge bases.

Materialized Views

Tumbling and sliding window aggregations over streaming data with DuckDB SQL. Output flows back through the full pipeline — hot tier, cold tier, CDC, and downstream views. Crash-safe with watermark persistence.
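
As a rough illustration of what a tumbling window computes, the plain DuckDB SQL below groups rows into fixed, non-overlapping one-minute buckets with `time_bucket`. The table `demo_events` and its columns are hypothetical, and this is an ordinary batch query rather than BoilStream's materialized view definition syntax, which this page does not show.

```sql
-- Hypothetical sample table, for illustration only.
CREATE TABLE demo_events (ts TIMESTAMP, user_id VARCHAR, amount DOUBLE);

INSERT INTO demo_events VALUES
  (TIMESTAMP '2025-01-01 00:00:05', 'a', 1.0),
  (TIMESTAMP '2025-01-01 00:00:40', 'b', 2.5),
  (TIMESTAMP '2025-01-01 00:01:10', 'a', 4.0);

-- Tumbling window: each row falls into exactly one 1-minute bucket.
SELECT
  time_bucket(INTERVAL '1 minute', ts) AS window_start,
  count(*)                             AS n_events,
  sum(amount)                          AS total_amount
FROM demo_events
GROUP BY window_start
ORDER BY window_start;
```

A sliding window differs in that consecutive windows overlap, so a single row can contribute to more than one output row.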

Streaming Views

Continuous row-by-row SQL transforms on every ingested row. Filter, project, and enrich data as it arrives. Each view gets its own derived topic. Chain views into pipelines — no external stream processor needed.
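
The kind of row-by-row transform described above can be pictured as a single `SELECT` that filters, projects, and enriches each row. The sketch below uses plain DuckDB SQL over a hypothetical `demo_clicks` table; how BoilStream registers such a query as a streaming view is not covered on this page, so no BoilStream-specific syntax is assumed.

```sql
-- Hypothetical input table, for illustration only.
CREATE TABLE demo_clicks (ts TIMESTAMP, user_id VARCHAR, url VARCHAR, status INTEGER);

INSERT INTO demo_clicks VALUES
  (TIMESTAMP '2025-01-01 00:00:01', 'a', 'https://example.com/pricing', 200),
  (TIMESTAMP '2025-01-01 00:00:02', 'b', 'https://example.org/docs',    404);

SELECT
  ts,
  user_id,                                                -- project: keep only needed columns
  regexp_extract(url, '^https?://([^/]+)', 1) AS host     -- enrich: derive the host from the URL
FROM demo_clicks
WHERE status = 200;                                       -- filter: drop failed requests
```

Because each output row depends only on its own input row, a transform of this shape can be applied independently to every row as it arrives.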

Real-Time SSE Consumer

Push Arrow IPC batches to browsers and services via Server-Sent Events. Automatic reconnection with catchup replay. Open source JS SDK for browser and Node.js. Feed AI agents and dashboards in real-time.

Multi-Tenant Isolation

Full tenant isolation within a single deployment. Separate DuckLakes, encrypted secrets, chrooted filesystems, and isolated sessions per user. RBAC with BoilStream roles.

Start querying in minutes

Deploy a single instance or scale horizontally with multiple nodes. Point to your S3 bucket and start ingesting.