Welcome to OLake
The fastest open-source tool for replicating databases to Apache Iceberg or a data lakehouse. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Visit olake.io for the full documentation and benchmarks.
Introduction to OLake
OLake is a high-performance, open-source platform for data extraction, loading, and Apache Iceberg table maintenance. It enables organizations to replicate data from operational systems into open lakehouse formats such as Apache Iceberg and Apache Parquet, while ensuring those tables remain optimized for analytics. By combining high-speed data ingestion, change data capture (CDC), and built-in Iceberg table maintenance, OLake helps teams build and maintain modern lakehouse architectures without relying on complex ETL pipelines or vendor-locked platforms.
GitHub Repository: https://github.com/datazip-inc/olake
OLake Ingestion (OLake Go):
OLake Go is an open-source EL (Extract–Load) platform written in Go for high performance and memory efficiency. It replicates data from operational databases and streaming systems directly into open lakehouse storage formats. Using Incremental Sync and Change Data Capture (CDC), OLake keeps destination tables continuously updated while minimizing infrastructure overhead. With OLake, organizations can:
- Replicate data at scale from operational systems
- Enable near real-time analytics on fresh data
- Build open lakehouse architectures without vendor lock-in
To learn more, read OLake Ingestion.
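To make the difference between a full refresh and an incremental sync concrete, here is a minimal, self-contained sketch. It is not OLake's implementation (OLake is written in Go), and every name in it (`Row`, `Destination`, `updated_at`, `full_refresh`, `incremental_sync`) is hypothetical. It assumes each source row carries a cursor column that only increases when the row changes.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    id: int
    updated_at: int  # hypothetical cursor column; grows whenever the row changes
    payload: str


@dataclass
class Destination:
    rows: dict = field(default_factory=dict)  # destination table keyed by primary key
    cursor: int = 0  # highest updated_at value already replicated


def full_refresh(source, dest):
    """Replace the destination with a complete copy of the source."""
    dest.rows = {r.id: r for r in source}
    dest.cursor = max((r.updated_at for r in source), default=0)
    return len(source)


def incremental_sync(source, dest):
    """Copy only rows whose cursor value is newer than the saved cursor."""
    changed = [r for r in source if r.updated_at > dest.cursor]
    for r in changed:
        dest.rows[r.id] = r  # upsert by primary key
    if changed:
        dest.cursor = max(r.updated_at for r in changed)
    return len(changed)


source = [Row(1, 10, "a"), Row(2, 20, "b")]
dest = Destination()
first = full_refresh(source, dest)       # initial load copies every row
source[0] = Row(1, 30, "a2")             # an update in the source
source.append(Row(3, 40, "c"))           # an insert in the source
second = incremental_sync(source, dest)  # copies only the two changed rows
```

A cursor-based sync like this cannot observe deleted rows, because a deleted row no longer appears in the source query. Log-based CDC avoids that limitation by reading the database's change log instead of the table itself.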
Supported Sources
OLake supports ingestion from several sources:
- PostgreSQL: Full Refresh, Incremental Sync, and CDC using pgoutput
- MySQL: Full Refresh, Incremental Sync, and binlog-based CDC
- MongoDB: Full Refresh, Incremental Sync, and oplog-based CDC
- Oracle Database: Full Refresh and Incremental Sync
- Apache Kafka: Consumer group–based streaming ingestion
- DB2: Full Refresh and Incremental Sync
- MSSQL: Full Refresh, Incremental Sync, and CDC
- S3: Full Refresh and Incremental Sync
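When planning a pipeline, it can help to treat the list above as a lookup table. The sketch below simply restates that list as data. The dictionary, the mode labels (`full_refresh`, `incremental`, `cdc`, `streaming`), and the helper `sources_supporting` are illustrative names, not part of any OLake API or configuration format.

```python
# Sync modes per source, transcribed from the list above.
# Mode labels are illustrative, not OLake configuration values.
SYNC_MODES = {
    "PostgreSQL": {"full_refresh", "incremental", "cdc"},
    "MySQL": {"full_refresh", "incremental", "cdc"},
    "MongoDB": {"full_refresh", "incremental", "cdc"},
    "Oracle Database": {"full_refresh", "incremental"},
    "Apache Kafka": {"streaming"},
    "DB2": {"full_refresh", "incremental"},
    "MSSQL": {"full_refresh", "incremental", "cdc"},
    "S3": {"full_refresh", "incremental"},
}


def sources_supporting(mode):
    """Return the names of sources that list the given sync mode, sorted."""
    return sorted(name for name, modes in SYNC_MODES.items() if mode in modes)


cdc_sources = sources_supporting("cdc")
```

Here `cdc_sources` contains PostgreSQL, MySQL, MongoDB, and MSSQL, the four sources for which the list above mentions CDC.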
Destinations
OLake writes data to open lakehouse storage:
- Parquet files on object storage such as Amazon S3, MinIO, and Google Cloud Storage
- Apache Iceberg tables with support for multiple catalog integrations, including:
  - AWS Glue Data Catalog
  - Apache Hive Metastore
  - REST catalogs such as Nessie, Polaris, and Unity Catalog
  - JDBC catalogs
To learn more, read OLake Catalog Integration.
Iceberg Table Maintenance (OLake Fusion):
OLake includes built-in Iceberg table maintenance capabilities, allowing you to optimize and manage Apache Iceberg tables. This helps maintain healthy tables, reduce storage inefficiencies, and ensure consistent query performance as data grows. With OLake, you can configure and run maintenance workflows to keep your Iceberg tables optimized and analytics-ready.
To learn more, read OLake Optimization.
Query Engine Compatibility
Data written by OLake can be queried immediately using Iceberg-compatible engines such as:
- Amazon Athena
- Trino
- Apache Spark
- Apache Flink
- Presto
- Apache Hive
- Snowflake
To learn more, read OLake Query Engines Compatibility.