
Welcome to OLake


The fastest open-source tool for replicating databases to Apache Iceberg or a data lakehouse. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Visit olake.io for the full documentation and benchmarks.

Introduction to OLake

OLake is a high-performance, open-source platform for data extraction, loading, and Apache Iceberg table maintenance. It enables organizations to replicate data from operational systems into open lakehouse formats such as Apache Iceberg and Apache Parquet, while ensuring those tables remain optimized for analytics. By combining high-speed data ingestion, change data capture (CDC), and built-in Iceberg table maintenance, OLake helps teams build and maintain modern lakehouse architectures without relying on complex ETL pipelines or vendor-locked platforms.

GitHub Repository: https://github.com/datazip-inc/olake

OLake Ingestion (OLake Go):

OLake Go is an open-source EL (Extract–Load) platform written in Go for high performance and memory efficiency. It replicates data from operational databases and streaming systems directly into open lakehouse storage formats. Using Incremental Sync and Change Data Capture (CDC), OLake keeps destination tables continuously updated while minimizing infrastructure overhead. With OLake, organizations can:

  • Replicate data at scale from operational systems
  • Enable near real-time analytics on fresh data
  • Build open lakehouse architectures without vendor lock-in

To know more, read OLake Ingestion.

Supported Sources

OLake supports ingestion from several sources:

  • PostgreSQL — Full Refresh, Incremental Sync, and CDC using pgoutput
  • MySQL — Full Refresh, Incremental Sync, and binlog-based CDC
  • MongoDB — Full Refresh, Incremental Sync, and oplog-based CDC
  • Oracle Database — Full Refresh and Incremental Sync
  • Apache Kafka — Consumer group–based streaming ingestion
  • DB2 — Full Refresh and Incremental Sync
  • MSSQL — Full Refresh, Incremental Sync, and CDC
  • S3 — Full Refresh and Incremental Sync
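
The sync modes named in the list above differ mainly in how much of the source is read on each run. The following is a minimal, generic sketch of that difference and is not OLake's actual implementation: the Row type, the UpdatedAt cursor column, and the FullRefresh and IncrementalSync functions are all hypothetical names introduced for illustration only.

```go
package main

import "fmt"

// Row is a hypothetical source record. UpdatedAt plays the role of the
// cursor column (an assumption for this sketch, not an OLake type).
type Row struct {
	ID        int
	UpdatedAt int64
}

// FullRefresh copies every row from the source, ignoring earlier runs.
func FullRefresh(source []Row) []Row {
	out := make([]Row, len(source))
	copy(out, source)
	return out
}

// IncrementalSync returns only the rows whose cursor value is greater than
// the saved cursor, together with the new cursor to persist for the next run.
func IncrementalSync(source []Row, cursor int64) ([]Row, int64) {
	var out []Row
	next := cursor
	for _, r := range source {
		if r.UpdatedAt > cursor {
			out = append(out, r)
			if r.UpdatedAt > next {
				next = r.UpdatedAt
			}
		}
	}
	return out, next
}

func main() {
	source := []Row{{1, 100}, {2, 200}, {3, 300}}

	// First run: the cursor starts at 0, so every row is picked up.
	first, cursor := IncrementalSync(source, 0)
	fmt.Println(len(first), cursor) // 3 300

	// One row is inserted and one is updated before the second run.
	source = append(source, Row{4, 400})
	source[0].UpdatedAt = 500

	// Second run: only the two changed rows are read.
	second, cursor := IncrementalSync(source, cursor)
	fmt.Println(len(second), cursor) // 2 500

	// A full refresh reads all four rows again.
	fmt.Println(len(FullRefresh(source))) // 4
}
```

A cursor-based approach like this one cannot see deleted rows, because a deleted row no longer has a cursor value to compare. Log-based CDC (such as the pgoutput, binlog, and oplog mechanisms listed above) reads the database's change log instead, which records deletes as well as inserts and updates.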

Destinations

OLake writes data to open lakehouse storage:

  • Parquet files on object storage such as Amazon S3, MinIO, and Google Cloud Storage

  • Apache Iceberg tables with support for multiple catalog integrations including:

    • AWS Glue Data Catalog
    • Apache Hive Metastore
    • REST catalogs such as Nessie, Polaris and Unity Catalog
    • JDBC catalogs

    To know more, read OLake Catalog Integration.

Iceberg Table Maintenance (OLake Fusion):

OLake includes built-in Iceberg table maintenance capabilities, allowing you to optimize and manage Apache Iceberg tables. This helps maintain healthy tables, reduce storage inefficiencies, and ensure consistent query performance as data grows. With OLake, you can configure and run maintenance workflows to keep your Iceberg tables optimized and analytics-ready.

To know more, read OLake Optimization.

Query Engine Compatibility

Data written by OLake can be queried immediately using Iceberg-compatible engines such as:

  • Amazon Athena
  • Trino
  • Apache Spark
  • Apache Flink
  • Presto
  • Apache Hive
  • Snowflake

To know more, read OLake Query Engines Compatibility.

Know More About OLake

🚀 Curious how OLake performs? Check out our Benchmarks

🔍 Want to dive deeper into OLake? Read about its Features



💡 Join the OLake Community!

Got questions, ideas, or just want to connect with other data engineers?
👉 Join our Slack Community to get real-time support, share feedback, and shape the future of OLake together. 🚀

Your success with OLake is our priority. Don’t hesitate to contact us if you need any help or further clarification!