This page introduces the Apache Iceberg C++ library (iceberg-cpp): its purpose, key capabilities, major subsystems, library variants, and how it maps to the Apache Iceberg table format specification. For build instructions, see page 2; for detailed architecture, see page 3.1.
iceberg-cpp is a C++ implementation of the Apache Iceberg table format. It provides the data structures, algorithms, and catalog integrations required to read, write, and manage Iceberg tables from C++ applications or engines.
The library is written in C++23, licensed under the Apache License 2.0, and developed as a project of the Apache Software Foundation.
Minimum requirements:
| Requirement | Version |
|---|---|
| CMake | 3.25+ |
| GCC | 14+ |
| Clang | 16+ |
| MSVC | 2022+ |
Sources: README.md29-33
Apache Iceberg defines a table format specification covering schemas, partition specs, sort orders, snapshots, manifest files, and catalog contracts. iceberg-cpp implements this specification in C++:
| Spec Concept | C++ Implementation |
|---|---|
| Table metadata | TableMetadata struct, TableMetadataBuilder class |
| Schema | Schema, SchemaField, Type hierarchy |
| Partition spec | PartitionSpec, PartitionField, Transform |
| Sort order | SortOrder, SortField |
| Snapshot | Snapshot, SnapshotRef, SnapshotSummaryBuilder |
| Manifest file / list | ManifestFile, ManifestEntry, ManifestListWriter, ManifestReader |
| Catalog | Catalog abstract interface |
| Table requirements | TableRequirement hierarchy |
| Table updates | TableUpdate, PendingUpdate hierarchy |
Sources: src/iceberg/type_fwd.h27-223
The build system produces up to three distinct library targets depending on CMake flags:
- ICEBERG_BUILD_STATIC / ICEBERG_BUILD_SHARED → iceberg
- ICEBERG_BUILD_REST → iceberg_rest
- ICEBERG_BUILD_BUNDLE → iceberg_bundle
Library variant diagram:
Sources: src/iceberg/CMakeLists.txt124-166 src/iceberg/CMakeLists.txt180-215 src/iceberg/meson.build152-165
| Library | CMake Flag | Extra Dependencies | Description |
|---|---|---|---|
| iceberg | (default) | nanoarrow, nlohmann_json, CRoaring, zlib | Core types, metadata, catalog interface, expressions, manifests |
| iceberg_rest | ICEBERG_BUILD_REST=ON | cpr, OpenSSL, CURL | REST catalog client |
| iceberg_bundle | ICEBERG_BUILD_BUNDLE=ON | Arrow, Parquet, Avro | Adds Avro/Parquet readers/writers and Arrow FileIO |
Sources: src/iceberg/CMakeLists.txt180-215 src/iceberg/meson.build149-154
The library follows a layered architecture. Each layer has distinct responsibilities and depends only on layers below it.
Layered architecture diagram:
Sources: src/iceberg/table.h38-175 src/iceberg/table.cc50-175 src/iceberg/type_fwd.h27-223 src/iceberg/CMakeLists.txt18-122
1. Client Interface Layer
The entry point for all table operations. Table (src/iceberg/table.h38-175 src/iceberg/table.cc50-175) provides factory methods for creating scans, transactions, and update operations. Catalog (src/iceberg/type_fwd.h193) manages table namespace operations and persistence.
2. Metadata Management Layer
Handles table state representation and evolution. TableMetadata (src/iceberg/type_fwd.h122) is the root structure containing all schemas, partition specs, snapshots, and properties. TableMetadataBuilder (src/iceberg/type_fwd.h201) provides a fluent API for constructing new metadata versions. TableUpdate (src/iceberg/type_fwd.h202) represents individual atomic changes.
3. Type System Layer
Defines data types, schemas, and partitioning. The Type hierarchy (src/iceberg/type_fwd.h35-81) includes primitive types and nested types. Transform (src/iceberg/type_fwd.h107) implements partitioning functions. PartitionSpec (src/iceberg/type_fwd.h93) defines partitioning layout.
4. Data Access Layer
Implements query planning and file I/O. TableScanBuilder (src/iceberg/type_fwd.h160) creates DataTableScan instances. FileScanTask (src/iceberg/type_fwd.h147) represents individual file read operations. FileIO (src/iceberg/type_fwd.h182) abstracts storage access.
5. Infrastructure Layer
Build system and dependency management. The CMake build (src/iceberg/CMakeLists.txt1-122) and Meson build (src/iceberg/meson.build1-165) manage the project lifecycle.
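To make the role of a partition transform in the Type System Layer more concrete, here is a minimal standalone sketch of the integer truncate transform as the Iceberg table specification defines it (truncate[W] maps v to v - (((v % W) + W) % W)). The function name TruncateInt and the sketch namespace are hypothetical and chosen for illustration; this is not the iceberg-cpp Transform class or its API.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical helper, not part of iceberg-cpp.
// Maps a value to the lower bound of its width-sized bin. The double modulo
// keeps the remainder non-negative, so negative values round toward negative
// infinity rather than toward zero.
// Values close to the int64 minimum could overflow; this sketch ignores that.
inline std::int64_t TruncateInt(std::int64_t value, std::int64_t width) {
  assert(width > 0 && "truncate width must be positive");
  const std::int64_t remainder = ((value % width) + width) % width;
  return value - remainder;
}

}  // namespace sketch
```

With a width of 10, the values 10 through 19 all map to the partition value 10, and -1 maps to -10. Rows whose source values share a bin therefore land in the same partition.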
The transaction system enables atomic, multi-operation table modifications with optimistic concurrency control. Updates are managed through a shared TransactionContext (src/iceberg/transaction.h155-173) which tracks the current state via a TableMetadataBuilder.
Transaction workflow:
Sources: src/iceberg/table.cc171-230 src/iceberg/transaction.h39-173 src/iceberg/transaction.cc98-250 src/iceberg/update/pending_update.h42-92
Table factory methods create either auto-commit transactions or manual-commit transactions:
- Table::NewUpdateSchema() (src/iceberg/table.h157) → auto-commit UpdateSchema
- Table::NewTransaction() (src/iceberg/table.h141) → manual-commit Transaction

Transaction tracks PendingUpdate instances (src/iceberg/update/pending_update.h42-92) and applies them to a TableMetadataBuilder. Concrete updates like FastAppend (src/iceberg/update/pending_update.h52) or UpdateProperties (src/iceberg/update/pending_update.h50) implement specific metadata transformations.
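The general pattern of staging pending updates and committing them with optimistic concurrency control can be sketched in a few dozen lines. Everything below is a simplified illustration under stated assumptions: Metadata, PendingUpdate, SetProperty, Store, and Transaction are hypothetical stand-ins defined here, and their methods do not reflect the actual iceberg-cpp signatures.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical, heavily reduced table state.
struct Metadata {
  int version = 0;
  std::map<std::string, std::string> properties;
};

// Hypothetical update interface: one change applied to staged metadata.
class PendingUpdate {
 public:
  virtual ~PendingUpdate() = default;
  virtual void Apply(Metadata& staged) const = 0;
};

class SetProperty final : public PendingUpdate {
 public:
  SetProperty(std::string key, std::string value)
      : key_(std::move(key)), value_(std::move(value)) {}
  void Apply(Metadata& staged) const override { staged.properties[key_] = value_; }

 private:
  std::string key_;
  std::string value_;
};

// Stand-in for a catalog: swaps in new metadata only if nobody else committed.
class Store {
 public:
  const Metadata& Current() const { return current_; }
  bool CommitIfUnchanged(int expected_version, Metadata next) {
    if (current_.version != expected_version) return false;  // lost the race
    next.version = expected_version + 1;
    current_ = std::move(next);
    return true;
  }

 private:
  Metadata current_;
};

// Collects updates against a base version and commits them as one unit.
class Transaction {
 public:
  explicit Transaction(const Store& store) : base_(store.Current()) {}

  Transaction& Add(std::unique_ptr<PendingUpdate> update) {
    assert(update != nullptr);
    updates_.push_back(std::move(update));
    return *this;
  }

  bool Commit(Store& store) const {
    Metadata staged = base_;  // apply every update to a private copy
    for (const auto& update : updates_) update->Apply(staged);
    return store.CommitIfUnchanged(base_.version, std::move(staged));
  }

 private:
  Metadata base_;
  std::vector<std::unique_ptr<PendingUpdate>> updates_;
};

}  // namespace sketch
```

If two transactions start from the same base version, the first Commit succeeds and the second returns false, because the stored version no longer matches its base. A real implementation would typically refresh and retry rather than simply fail; that step is omitted here.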
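The expression subsystem enables predicate pushdown, which lets scans skip manifests and data files that cannot contain matching rows and so minimizes the amount of data read.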
| Class | Purpose | File |
|---|---|---|
| Expression | Base type for all expressions | src/iceberg/type_fwd.h:130 |
| UnboundPredicate | Predicate before schema binding | src/iceberg/type_fwd.h:135 |
| BoundPredicate | Predicate after schema binding | src/iceberg/type_fwd.h:127 |
| InclusiveMetricsEvaluator | Filters files using min/max stats | src/iceberg/type_fwd.h:140 |
| ManifestEvaluator | Filters manifest files | src/iceberg/type_fwd.h:141 |
| ResidualEvaluator | Computes residual predicates | src/iceberg/type_fwd.h:142 |
Sources: src/iceberg/type_fwd.h127-143 src/iceberg/CMakeLists.txt30-44
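The idea behind filtering files with min/max statistics can be illustrated with a small standalone sketch. The evaluation is inclusive: a file is kept whenever it might contain a matching row and is skipped only when its bounds prove that no row can match. The types Op, Predicate, and FileStats and the functions MightMatch and SelectFiles are hypothetical and defined here for illustration only; they are not the InclusiveMetricsEvaluator API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

enum class Op { kLessThan, kGreaterThan, kEqual };

// Hypothetical predicate on a single integer column: column <op> literal.
struct Predicate {
  Op op;
  std::int64_t literal;
};

// Hypothetical per-file column statistics (inclusive bounds).
struct FileStats {
  std::int64_t lower_bound;
  std::int64_t upper_bound;
};

// Returns true if some row in the file could satisfy the predicate.
inline bool MightMatch(const FileStats& stats, const Predicate& predicate) {
  assert(stats.lower_bound <= stats.upper_bound);
  switch (predicate.op) {
    case Op::kLessThan:
      return stats.lower_bound < predicate.literal;
    case Op::kGreaterThan:
      return stats.upper_bound > predicate.literal;
    case Op::kEqual:
      return stats.lower_bound <= predicate.literal &&
             predicate.literal <= stats.upper_bound;
  }
  return true;  // unknown operator: keep the file rather than risk dropping rows
}

// Returns the indices of the files a scan would still need to read.
inline std::vector<std::size_t> SelectFiles(const std::vector<FileStats>& files,
                                            const Predicate& predicate) {
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < files.size(); ++i) {
    if (MightMatch(files[i], predicate)) kept.push_back(i);
  }
  return kept;
}

}  // namespace sketch
```

For files with bounds [1, 10], [20, 30], and [5, 25], the predicate "column > 15" keeps only the second and third files. The kept files may still contain non-matching rows, which is why a residual predicate has to be evaluated on the rows actually read.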
Manifests track data files and enable snapshot isolation. Snapshots represent the state of a table at some point in time.
Reading and Writing Manifests:
- ManifestListReader / ManifestListWriter (src/iceberg/type_fwd.h173-174)
- ManifestReader / ManifestWriter (src/iceberg/type_fwd.h175-176)
- ManifestGroup (src/iceberg/type_fwd.h172): applies filters to manifests

Sources: src/iceberg/CMakeLists.txt50-60 src/iceberg/type_fwd.h165-178
File I/O is abstracted through the FileIO interface (src/iceberg/type_fwd.h182).
| Implementation | Library | Description |
|---|---|---|
| ArrowFileSystemFileIO | iceberg_bundle | Uses Arrow C++ FileSystem API |
Sources: src/iceberg/CMakeLists.txt182 src/iceberg/type_fwd.h179-185
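A short sketch may help show why abstracting storage behind an interface is useful: code written against the interface runs unchanged on any backend. The StorageIO interface, its Read/Write/Delete methods, the InMemoryIO backend, and the ContentLength helper below are all hypothetical and deliberately simplified; they are assumptions for illustration and do not reproduce the actual FileIO interface in iceberg-cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical storage interface; the real FileIO may differ in every detail.
class StorageIO {
 public:
  virtual ~StorageIO() = default;
  virtual std::optional<std::string> Read(const std::string& location) const = 0;
  virtual void Write(const std::string& location, std::string contents) = 0;
  virtual bool Delete(const std::string& location) = 0;
};

// Toy backend that keeps file contents in a map, e.g. for unit tests.
class InMemoryIO final : public StorageIO {
 public:
  std::optional<std::string> Read(const std::string& location) const override {
    const auto it = files_.find(location);
    if (it == files_.end()) return std::nullopt;
    return it->second;
  }

  void Write(const std::string& location, std::string contents) override {
    assert(!location.empty() && "location must not be empty");
    files_[location] = std::move(contents);
  }

  // Returns true only if a file was actually removed.
  bool Delete(const std::string& location) override {
    return files_.erase(location) > 0;
  }

 private:
  std::map<std::string, std::string> files_;
};

// Caller code depends only on the interface, not on a concrete backend.
inline std::size_t ContentLength(const StorageIO& io, const std::string& location) {
  const auto contents = io.Read(location);
  return contents ? contents->size() : 0;
}

}  // namespace sketch
```

Swapping the in-memory backend for one built on a filesystem library would not require changes to ContentLength, which is the property the abstraction is meant to provide.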
Format Support:
- AvroReader, AvroWriter (src/iceberg/CMakeLists.txt188-189)
- ParquetReader, ParquetWriter (src/iceberg/CMakeLists.txt194-197)

| Directory | Library | Description |
|---|---|---|
src/iceberg/ | iceberg | Core types, metadata, expressions, manifests |
src/iceberg/catalog/ | iceberg | Catalog implementations (Memory, REST) |
src/iceberg/expression/ | iceberg | Predicates, evaluators, literals |
src/iceberg/manifest/ | iceberg | Manifest readers/writers |
src/iceberg/update/ | iceberg | PendingUpdate implementations |
src/iceberg/util/ | iceberg | UUID, decimal, hashing, temporal utilities |
src/iceberg/arrow/ | iceberg_bundle | Arrow FileIO and utilities |
src/iceberg/avro/ | iceberg_bundle | Avro reader/writer |
src/iceberg/parquet/ | iceberg_bundle | Parquet reader/writer |
Sources: src/iceberg/CMakeLists.txt20-122 src/iceberg/CMakeLists.txt170-178 src/iceberg/CMakeLists.txt181-197