Apache Arrow

Apache Arrow 24.0.0 Release

2026-04-21T00:00:00-04:00

The Apache Arrow team is pleased to announce the 24.0.0 release. This release covers over 3 months of development work and includes 259 resolved issues on 325 distinct commits from 57 distinct contributors. See the Install Page to learn how to get the libraries for your platform.

The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog.

Community

We recently published our Community Highlights for 2025; check them out.

Thanks everyone for your contributions and participation in the project!

Format Notes

We have written a project-wide Security Model outlining what users should expect when dealing with Arrow data, especially data coming from untrusted sources (GH-48868).

Arrow Flight RPC Notes

The ODBC driver is still a work-in-progress. The driver now builds on Linux, but currently no builds are distributed (for any platform) (GH-49463).

In C++, we have refactored serialization/deserialization to make low-level functionality accessible for advanced usage (GH-49548).

C++ Notes

In addition to the aforementioned project-wide Security Model, we have written a specific Security Model for Arrow C++ covering more concrete topics such as API usage and parameter validity GH-49274.

Compute

Extension Types

The canonical type VariableShapeTensor was finally implemented GH-38007.

Parquet

Breaking change: The Arrow extension type name for Parquet Variant columns used to be parquet.variant but has been changed to arrow.parquet.variant GH-49081.

Previously, Parquet C++ could only read unencrypted bloom filters; it now supports reading encrypted bloom filters as well GH-48334. It can also write bloom filters, though only unencrypted ones GH-34785.
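For readers unfamiliar with the data structure, the following is a minimal sketch of a classic Bloom filter in plain Python. It is only meant to show the "definitely absent / possibly present" behaviour that makes such filters useful for skipping data. The class name, the SHA-256 based hashing and the sizes are all assumptions made for illustration; this is not the filter layout or hash function used by Parquet, and it is unrelated to the Arrow C++ API.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter (hypothetical, not Parquet's on-disk layout)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: bytes):
        # Derive several bit positions by prefixing the value with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + value).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, value: bytes) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )
```

A reader that finds `might_contain(...)` returning `False` for a lookup value can skip the corresponding data entirely; a `True` result still requires reading the data, since false positives are possible.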

An ambitious rewrite of the bit-unpacking utilities and optimizations has led to significant performance improvements on reading some Parquet columns, up to 50% faster in some cases GH-48277. This rewrite is described in more detail in an accompanying blog post.
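To make concrete what bit-unpacking computes, here is a deliberately naive Python sketch, assuming the least-significant-bit-first packing order used by Parquet's bit-packed runs. The function names are hypothetical, and the optimized C++ code from GH-48277 is structured very differently; the sketch only illustrates the transformation itself.

```python
def pack_bits(values: list[int], bit_width: int) -> bytes:
    """Pack unsigned integers into `bit_width` bits each, LSB-first."""
    bits = 0
    for i, value in enumerate(values):
        if value < 0 or value >> bit_width:
            raise ValueError(f"{value} does not fit in {bit_width} bits")
        bits |= value << (i * bit_width)
    num_bytes = (len(values) * bit_width + 7) // 8
    return bits.to_bytes(num_bytes, "little")


def unpack_bits(data: bytes, bit_width: int, count: int) -> list[int]:
    """Unpack `count` unsigned integers of `bit_width` bits each."""
    if count * bit_width > len(data) * 8:
        raise ValueError("not enough input bytes")
    # Viewing the buffer as one little-endian integer makes bit i of the
    # stream equal to bit i of `bits`.
    bits = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(bits >> (i * bit_width)) & mask for i in range(count)]
```

With a bit width of 3, the eight values 0 through 7 occupy only three bytes instead of eight or more, which is why fast unpacking matters so much for read performance.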

The performance of reading DELTA_BINARY_PACKED-encoded integers has been improved in some favorable cases GH-49266.
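As background, the core idea behind DELTA_BINARY_PACKED can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it keeps only the "first value, minimum delta, small non-negative residuals" idea and omits the block and miniblock structure and the variable-length header of the real encoding. The function names are hypothetical, and the sketch says nothing about which cases GH-49266 speeds up.

```python
def delta_encode(values: list[int]):
    """Return (first value, minimum delta, bit width, adjusted deltas)."""
    if not values:
        raise ValueError("need at least one value")
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas, default=0)
    # Subtracting the minimum makes every residual non-negative, so it can
    # be bit-packed with a small fixed width.
    adjusted = [d - min_delta for d in deltas]
    bit_width = max(adjusted, default=0).bit_length()
    return values[0], min_delta, bit_width, adjusted


def delta_decode(first: int, min_delta: int, adjusted: list[int]) -> list[int]:
    """Rebuild the original values from the encoded pieces."""
    out = [first]
    for residual in adjusted:
        out.append(out[-1] + residual + min_delta)
    return out
```

For a sequence with a constant stride, every residual is zero and the required bit width is 0, so the values are described almost entirely by the first value and the minimum delta.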

Miscellaneous C++ changes

We have migrated to C++20 std::span, removing our home-grown implementation in arrow::util::span GH-48588.

A bunch of previously deprecated APIs have been removed GH-49356.

Linux Packaging Notes

Added support for Ubuntu 26.04, the next LTS GH-49341

MATLAB Notes

No major notes for this release on MATLAB.

Python Notes

Compatibility notes

pyarrow.gandiva is deprecated and will be removed in a future version GH-49227

New features

Type annotations work is starting to be included (GH-49102 and GH-49452)
Basic arithmetic on arrays and scalars is now supported GH-32007
Options to control writing of Parquet Bloom filters are added to parquet.write_table GH-49376
OpenTelemetry is enabled in PyArrow wheels GH-49382
AzureFileSystem is now included in the Windows wheels GH-44655

Other improvements

Scikit-build-core is now used as the PyArrow build system GH-36411
UUID objects are now inferred automatically in pa.scalar() and pa.array() without the need to specify the type explicitly GH-48241
Constructing an extension array via pa.array() from a list of extension-type scalars is now supported GH-48470
There have been some improvements in the documentation (GH-49278, GH-49269 and GH-28859)
CSV and JSON options have improved repr/str methods GH-47389

Relevant bug fixes

SparseCOOTensor.__repr__ missing f-string prefix is now fixed GH-49108
Pickling SubTreeFileSystem(base_path, AzureFileSystem(...)) is fixed GH-49078
Casting from StringArray to pandas 3.* when element is None is fixed GH-49002
Dictionary key order is now preserved when inferring struct type GH-40053
A duplicate CSV header when table batches start with an empty batch is now fixed GH-36889

R Notes

New Features

A number of new dplyr bindings GH-49533, GH-49256, GH-49535 and GH-49534

Compatibility notes

Arrow no longer builds with GCS enabled on CRAN to avoid failures in their build systems. If you would like a full-featured build of Arrow, we recommend installing from R-universe; see the Using cloud storage article in the docs for more information. GH-49067

Relevant bug fixes

to_arrow() now retains grouping GH-40640

Ruby and C GLib Notes

Fixed GC related problems.
GArrowListArray: Added support for returning offset buffer.
GArrowLargeListArray: Added support for returning offset buffer.
GArrowUnionArray: Added support for returning fields.
Deprecated Feather features.

Ruby

We've added a pure Ruby Apache Arrow writer implementation to the red-arrow-format gem.

We've marked the pure Ruby Apache Arrow reader implementation in the red-arrow-format gem as stable because it passes integration tests with other implementations. It still has some missing features, however.

The red-arrow gem:

Added support for converting the following arrays to raw Ruby objects:
- Arrow::LargeBinaryArray
- Arrow::LargeUTF8Array
- Arrow::LargeListArray
- Arrow::FixedSizeListArray
- Arrow::DurationArray
- Arrow::DictionaryArray with Arrow::LargeBinaryArray or Arrow::LargeUTF8Array

C GLib

No C GLib only notes.

Java, JavaScript, Go, .NET, Swift and Rust Notes

The Java, JavaScript, Go, .NET, Swift and Rust projects have moved to separate repositories outside the main Arrow monorepo.

For notes on the latest release of the Java implementation, see the latest Arrow Java changelog.
For notes on the latest release of the JavaScript implementation, see the latest Arrow JavaScript changelog.
For notes on the latest release of the Rust implementation, see the latest Arrow Rust changelog.
For notes on the latest release of the Go implementation, see the latest Arrow Go changelog.
For notes on the latest release of the .NET implementation, see the latest Arrow .NET changelog.
For notes on the latest release of the Swift implementation, see the latest Arrow Swift changelog.

Apache Arrow ADBC 23 (Libraries) Release

2026-04-07T00:00:00-04:00

The Apache Arrow team is pleased to announce the version 23 release of the Apache Arrow ADBC libraries. This release includes 41 resolved issues from 20 distinct contributors.

This is a release of the libraries, which are at version 23. The API specification is versioned separately and is at version 1.1.0.

The subcomponents are versioned independently:

C/C++/GLib/Go/Python/Ruby: 1.11.0
C#: 0.23.0
Java: 0.23.0
R: 0.23.0
Rust: 0.23.0

The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog.

Release Highlights

A breaking change has been made to the Rust APIs (pre-1.0): returned RecordBatchReaders are now type-erased and boxed for caller flexibility; this also fixes the returned reader lifetime accidentally being tied to input argument lifetimes (#3904).

A driver manager for Node.js is now available from NPM (#4046, #4091, #4116, #4125, etc.).

The C++ and Rust driver managers now support connection profiles (#3876, #3973, #4080, #4083 etc.). (Note that other bindings that use the C++ driver manager, including GLib/Ruby, Go, Java, Python, R, and so on, inherit this support.)

The Go APIs have added interfaces that always take a context.Context for consistency, and to make sure context like telemetry traces propagate properly (#4009).

The Python driver manager has added specific parameters for using connection profiles as well (#4078, #4118). Also, non-string option values are directly accepted for convenience (#4088). adbc_get_statistics has been added (#4129).

The JNI bindings (allowing use of C/C++/Go/Rust/etc. drivers from Java) now support more functions (GetObjects, GetInfo, ExecuteSchema, etc.) (#3966, #3972, #4056).

Packages are now being uploaded to Homebrew (#4131).

Python wheels now require manylinux_2_28, up from manylinux2010, following PyArrow (#4146). On macOS, macOS 12 is now the minimum version due to upgrading to Go 1.25+ (including on conda-forge, where the packages previously pinned Go 1.24 to avoid this).

The PostgreSQL driver tries to reconcile Arrow NA arrays with PostgreSQL types when binding (#4098). Also, a bug in conversion from Arrow decimals to PostgreSQL numerics has been fixed (#3787).

The SQLite driver now enables various optional features, like math functions (#4147).

Contributors

$ git shortlog --perl-regexp --author='^((?!dependabot\[bot\]).*)$' -sn apache-arrow-adbc-22..apache-arrow-adbc-23
David Li
Kent Wu
Matt Topol
eitsupi
Bryce Mecum
Bruce Irschick
Mandukhai Alimaa
Emil Sadek
Tornike Gurgenidze
Dewey Dunnington
Felipe Oliveira Carvalho
eric-wang-1990
Curt Hagenlocher
Ian Cook
Madhavendra Rathore
Mila Page
Pavel Agafonov
Roshan Banisetti
davidhcoe
oglego

Roadmap

We are working on the next revision of the API standard, focusing on missing features (primarily metadata/catalog data). We welcome anyone interested in contributing. Current progress can be found in the 1.2.0 specification milestone.

Getting Involved

We welcome questions and contributions from all interested. Issues can be filed on GitHub, and questions can be directed to GitHub or the Arrow mailing lists.

Community Highlights 2025

2026-03-19T00:00:00-04:00

As you may have read in a previous blog post ¹, the Apache Arrow project recently turned 10 years old. We are grateful to everyone who helped us achieve this milestone, and we want to celebrate the community's accomplishments by publishing our community highlights from 2025.

We were inspired by the research by Dr Cat Hicks et al ², who found that concrete evidence of progress and accomplishments is instrumental to motivation and collaboration in developer teams. We think the same should hold for open source.

New contributors

It has been great to see many new contributors joining the project in the past year, with over 300 such individuals observed across the main Apache Arrow language implementations.

Number of new contributors per repository.
Repository/Implementation	Number of new contributors
arrow	125
arrow-rs	132
arrow-java	28
arrow-go	35

Worth highlighting is alinaliBQ who has been very active on the C++ Flight SQL ODBC Driver work together with justing-bq.

AntoinePrv has done a huge amount of work on the C++ Parquet implementation and andishgar in the C++ Statistics area.

rmnskb got involved with PyArrow during the EuroPython sprints and has contributed multiple PRs since then. At the same event, paddyroddy made his first contribution and later helped on the Python packaging side.

sdf-jkl, liamzwbao, friendlymatthew, and klion26 helped drive early Variant functionality in the Rust Parquet implementation and contributed a number of follow-up improvements.

jecsand838 drove major improvements to the Rust arrow-avro crate, work highlighted in the Introducing Arrow Avro blog post.

Notable New Contributors in apache/arrow for 2025.
Author	Number of prs	Number of line changes (+ and -)
alinaliBQ	36	15754
andishgar	19	2926
AntoinePrv	8	79257
rmnskb	7	550
justing-bq	4	12607

Notable New Contributors in apache/arrow-rs for 2025.
Author	Number of prs	Number of line changes (+ and -)
scovich	50	21006
jecsand838	38	26753
friendlymatthew	33	7203
sdf-jkl	4	388
rambleraptor	4	333

Notable New Contributors in apache/arrow-go for 2025.
Author	Number of prs	Number of line changes (+ and -)
Mandukhai-Alimaa	6	1392
hamilton-earthscope	5	2998

Release, Packaging and CI

A lot of work has been done in the Continuous Integration and Developer Tools areas. Ensuring that a project with Arrow's reach works properly requires validation across a huge matrix of operating systems, architectures, libraries and versions. This maintenance work is tremendously important for the health of the project and for a positive contributor experience.

The most active contributors in the main repository are the ones contributing heavily in those areas while also providing the most review capacity. Shout out to kou and raulcd for taking such good care of the project and devoting countless hours so that everything runs smoothly.

Notable contributions include enhanced release automation and reproducible builds for sources, migration of the remaining AppVeyor and Azure jobs to GitHub Actions, and an improved developer experience with more pre-commit checks replacing custom-made linting tools.

Moving some implementations out of the main repository (apache/arrow on GitHub) made releases and maintenance easier, both for the main repository and for the separate language implementations. The apache/arrow repo now holds the format specification and the C++ implementation together with all the bindings to it (Python, R, Ruby and C GLib). Other languages now live in their own apache/ repos, namely apache/arrow-java, apache/arrow-js, apache/arrow-rs, apache/arrow-go, apache/arrow-nanoarrow, apache/arrow-dotnet and apache/arrow-swift.

Notable Contributors in apache/arrow for 2025.
Author	Number of prs	Number of line changes (+ and -)
kou	221	141015
AntoinePrv	8	79257
raulcd	110	46645
pitrou	101	36585
jbonofre	1	20061

Notable Components in apache/arrow for 2025.
Component label	Number of merged prs	Number of line changes (+ and -)
Parquet	100	103828
C++	387	82744
FlightRPC	43	52659
CI	237	42249
Ruby	74	20676

Migration of infrastructure from Voltron Data

As Voltron Data wound down its operations in 2025, the Arrow project had to migrate its benchmarking infrastructure and nightly reports from Voltron-managed services to an Arrow-managed AWS account. This work has been driven by rok.

Closing of Stale issues

thisisnic worked on closing stale issues in the apache/arrow repository, which helped surface important issues that had been overlooked or forgotten.

Code contributions

C++ implementation

Community support for the maintenance and development of the Acero C++ engine is continuing, with multiple larger contributions in 2025 by pitrou and zanmato1984.

Many kernels have been moved from the integrated compute module into a separate, optional package for improvement of modularity and distribution size when optional compute functionality is not being used. The work has been done by raulcd.

Arrow C++ Parquet implementation

There have been multiple contributions to fix and improve fuzzing support for Parquet. Fuzzing work is led by pitrou who is also one of the most active members of the community guiding other developers and supporting us with abundant review capacity.

Support for several newer types was also added in the last year, namely VARIANT, UUID, GEOMETRY and GEOGRAPHY, contributed by neilechao and paleolimbot.

An important feature added has also been Content-Defined Chunking which improves deduplication of Parquet files with mostly identical contents, by choosing data page boundaries based on actual contents rather than a number of values ³. This work has been done by kszucs.
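The difference between content-based and count-based boundaries can be illustrated with a toy chunker in plain Python. This is a minimal sketch under stated assumptions: it uses a gear-style rolling hash with a hypothetical table, mask and window, works on raw bytes rather than column values, and omits the minimum and maximum chunk sizes a practical implementation would enforce. It is not the Parquet writer's actual algorithm or parameters.

```python
import random

# Hypothetical gear table: one pseudo-random 32-bit value per byte value.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

# With a 32-bit hash that shifts left by one bit per byte, a byte stops
# influencing the hash after 32 steps, so cuts depend on local content only.
WINDOW = 32


def cdc_chunks(data: bytes, mask: int = 0xFC000000) -> list[bytes]:
    """Split `data` wherever the rolling hash has all `mask` bits clear."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Count-based splitting, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

If a few bytes are inserted at the front of the input, `fixed_chunks` shifts every boundary and so changes every chunk, whereas `cdc_chunks` reproduces every chunk that starts at least `WINDOW` bytes into the original data. Chunks that are byte-for-byte identical are what a deduplicating store can share between files.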

There have been improvements in the Parquet encryption support for most of the releases in the last year. These efforts have been driven mostly by EnricoMi, pitrou, adamreeve and kapoisu.

PyArrow

A lot of work has been put into adding type annotations. It all started in July at EuroPython sprints and the code is now ready to be reviewed and merged. Some more review capacity will be needed to get this over the finish line. The work has been championed by rok.

Rust

The Arrow Rust community invested heavily in the Rust Parquet reader and published several blog posts about it ⁴, ⁵. The work has been championed by alamb and etseidl.

Notable Components in apache/arrow-rs for 2025.
Component label	Number of merged prs	Number of line changes (+ and -)
parquet	333	140958
arrow	436	76590
parquet-variant	125	41832
api-change	59	33938
arrow-avro	48	29487

Java

The biggest changes in apache/arrow-java for 2025 have been connected to Flight and Avro components plus Sphinx support due to the Java implementation being moved into a separate Apache repository. Contributors involved in the above are lidavidm and martin-traverse.

Go

There has been a lot of work on the new Variant type in the Parquet implementation in apache/arrow-go, all done by zeroshade.

Noticeable emphasis was also visible on performance-focused PRs leading to the addition of row seeking, bloom filter reading/writing, and reduction of allocations in the Parquet library along with significant optimization work in the compute.Take kernels. Shout out to pixelherodev and hamilton-earthscope for the emphasis they placed on improving performance.

Notable Components in apache/arrow-go for 2025.
Component label	Number of merged prs	Number of line changes (+ and -)
parquet	34	27056
arrow	33	14235

Nanoarrow

Larger pieces of work in nanoarrow include Decimal32/64 and ListView/LargeListView support, LZ4 and ZSTD decompression in the IPC reader, and broader packaging via Conan, Homebrew, and vcpkg. The contributors driving most of the above are paleolimbot and WillAyd.

Arrow Summit 25

One last thing to highlight would be our first Arrow Summit 25 that was held in Paris in October 2025. The event was a great success and it brought users, contributors and maintainers together. It definitely was a highlight of the year for many of us. Thanks to raulcd and pitrou for organizing the event.

Thank you!

We would like to thank every single contributor to Apache Arrow for being a part of this great community and project! We hope this blog post helps to validate all the work you have done and motivates us to continue collaborating and growing together!

The Notebooks with the analysis for this blog post can be found in ⁶.

Note that not all language implementations are mentioned. Some were moved into a separate repository in 2025, which left information missing for a large number of merged pull requests. Others had fewer large contributions in the past year.

Apache Arrow Java 19.0.0 Release

2026-03-16T00:00:00-04:00

The Apache Arrow team is pleased to announce the v19.0.0 release of Apache Arrow Java.

Changelog

What's Changed

Breaking Changes

GH-774: Consolidate BitVectorHelper.getValidityBufferSize and BaseValueVector.getValidityBufferSizeFromCount by @rtadepalli in #775
GH-586: Override fixedSizeBinary method for UnionMapWriter by @axreldable in #885
GH-891: Add ExtensionTypeWriterFactory to TransferPair by @jhrotko in #892
GH-948: Use buffer indexing for UUID vector by @jhrotko in #949
GH-139: [Flight] Stop return null from MetadataAdapter.getAll(String) and getAllByte(String) by @axreldable in #1016

New Features and Enhancements

GH-52: Make RangeEqualsVisitor of RunEndEncodedVector more efficient by @ViggoC in #761
GH-765: Do not close/free imported BaseStruct objects by @pepijnve in #766
GH-79: Move splitAndTransferValidityBuffer to BaseValueVector by @rtadepalli in #777
GH-731: Avro adapter, output dictionary-encoded fields as enums by @martin-traverse in #779
GH-725: Added ExtensionReader by @xxlaykxx in #726
GH-882: Add support for loading native library from a user specified location by @pepijnve in #883
GH-109: Implement Vector Validators for StringView by @ViggoC in #886
GH-900: Fix gandiva groupId in arrow-bom by @XN137 in #901
GH-762: Implement VectorAppender for RunEndEncodedVector by @ViggoC in #884
GH-825: Add UUID canonical extension type by @jhrotko in #903
GH-110: Flight SQL JDBC related StringView components implementation by @ViggoC in #905
GH-863: [JDBC] Suppress benign exceptions from gRPC layer on ArrowFlightSqlClientHandler#close by @ennuite in #910
GH-929: Add UUID support in JDBC driver by @xborder in #930
GH-952: Add OAuth support by @xborder in #953
GH-946: Add Variant extension type support by @tmater in #947
GH-130: Fix AutoCloseables to work with @nullable structures by @axreldable in #1017
GH-1038: Trim object memory for ArrowBuf by @lriggs in #1044
GH-1061: Add codegen classifier jar for arrow-vector. by @lriggs in #1062
GH-301: [Vector] Allow adding a vector at the end of VectorSchemaRoot by @axreldable in #1013
GH-552: [Vector] Add absent methods to the UnionFixedSizeListWriter by @axreldable in #1052

Full Changelog: changelog

Apache Arrow Go 18.5.2 Release

2026-03-04T00:00:00-05:00

The Apache Arrow team is pleased to announce the v18.5.2 release of Apache Arrow Go. This patch release covers 16 commits from 6 distinct contributors.

Contributors

$ git shortlog -sn v18.5.1..v18.5.2
Matt Topol
daniel-adam-tfs
Evan Todd
Rusty Conover
Stas Spiridonov
William

Changelog

What's Changed

chore: bump parquet-testing submodule by @zeroshade in #633
fix(arrow/array): handle empty binary values correctly in BinaryBuilder by @zeroshade in #634
test(arrow/array): add test to binary builder by @zeroshade in #636
fix(parquet): decryption of V2 data pages by @daniel-adam-tfs in #596
perf(arrow): Reduce the amount of allocated objects by @spiridonov in #645
fix(parquet/file): regression with decompressing data by @zeroshade in #652
fix(arrow/compute): take on record/array with nested struct by @zeroshade in #653
fix(parquet/file): write large string values by @zeroshade in #655
ci: ensure extra GC cycle for flaky tests by @zeroshade in #661
fix(arrow/array): handle exponent notation for unmarshal int by @zeroshade in #662
fix(flight/flightsql/driver): fix time.Time params by @etodd in #666
fix(parquet): bss encoding and tests on big endian systems by @daniel-adam-tfs in #663
fix(parquet/pqarrow): selective column reading of complex map column by @zeroshade in #668
feat(arrow/ipc): support custom_metadata on RecordBatch messages by @rustyconover in #669
feat: Support setting IPC options in FlightSQL call options by @peasee in #674
chore(dev/release): embed hash of source tarball into email by @zeroshade in #675
chore(arrow): bump PkgVersion to 18.5.2 by @zeroshade in #676

New Contributors

@spiridonov made their first contribution in #645
@etodd made their first contribution in #666
@rustyconover made their first contribution in #669
@peasee made their first contribution in #674

Full Changelog: https://github.com/apache/arrow-go/compare/v18.5.1...v18.5.2

Apache Arrow nanoarrow 0.8.0 Release

2026-02-24T00:00:00-05:00

The Apache Arrow team is pleased to announce the 0.8.0 release of Apache Arrow nanoarrow. This release consists of 28 resolved GitHub issues from 10 contributors.

Release Highlights

Support for building String View arrays by buffer
LZ4 decompression support in IPC reader
Support for Conan
Support for Homebrew

See the Changelog for a detailed list of contributions to this release.

Features

String Views By Buffer

The C library in general supports two methods for producing or consuming arrays: most users use the builder pattern (e.g., ArrowArrayAppendString()); however, the "build by buffer" pattern can be effective when using nanoarrow with a higher level runtime like C++, Rust, Python, or R, all of which have mechanisms to build buffers already. The C library supports this with ArrowArraySetBuffer(); however, there was no way to reserve and/or set variadic buffers for string view arrays. In nanoarrow 0.8.0, the array builder API fully supports both mechanisms for building string view arrays.
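To show what "building by buffer" involves for string views, here is a minimal sketch in plain Python of the buffers themselves, assuming the layout from the Arrow columnar format: each view is 16 bytes, strings of up to 12 bytes are stored inline, and longer strings store a 4-byte prefix plus a buffer index and offset into one of the variadic data buffers. The function names and the `data_buffer_limit` parameter are hypothetical, and no nanoarrow API is used or implied.

```python
import struct

INLINE_MAX = 12  # strings up to 12 bytes live inside the 16-byte view
VIEW_SIZE = 16


def build_string_views(strings, data_buffer_limit=32):
    """Return (views buffer, list of variadic data buffers)."""
    views = bytearray()
    data_buffers = []
    for s in strings:
        raw = s.encode("utf-8")
        if len(raw) <= INLINE_MAX:
            # "12s" zero-pads the inline bytes to exactly 12 bytes.
            views += struct.pack("<i12s", len(raw), raw)
            continue
        # Start a new data buffer when there is none or the current is full.
        if not data_buffers or (
            data_buffers[-1]
            and len(data_buffers[-1]) + len(raw) > data_buffer_limit
        ):
            data_buffers.append(bytearray())
        buffer_index = len(data_buffers) - 1
        offset = len(data_buffers[-1])
        data_buffers[-1] += raw
        views += struct.pack("<i4sii", len(raw), raw[:4], buffer_index, offset)
    return bytes(views), [bytes(b) for b in data_buffers]


def read_string_view(views, data_buffers, i):
    """Decode element `i` from the buffers produced above."""
    base = VIEW_SIZE * i
    (length,) = struct.unpack_from("<i", views, base)
    if length <= INLINE_MAX:
        return views[base + 4:base + 4 + length].decode("utf-8")
    _prefix, buffer_index, offset = struct.unpack_from("<4sii", views, base + 4)
    return data_buffers[buffer_index][offset:offset + length].decode("utf-8")
```

A higher-level runtime that already holds buffers like these only needs a way to hand them over, including the variable number of data buffers, which is the gap the paragraph above describes.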

LZ4 Decompression Support

The Arrow IPC reader included in the nanoarrow C library supports most features of the Arrow IPC format; however, decompression support for the LZ4 codec was missing which made the library and its bindings unusable for some common use cases. In 0.8.0, decompression for the LZ4 codec was added to the C library.

Users of the C library will need to configure CMake with -DNANOARROW_IPC_WITH_LZ4=ON and -DNANOARROW_IPC=ON to use CMake-resolved LZ4; however, client libraries can also use an existing ZSTD or LZ4 implementation using callbacks just like in 0.7.0.

nanoarrow on Conan

The nanoarrow C library can now be installed using the Conan C/C++ Package Manager! CMake projects can now use find_package(nanoarrow) when using a Conan-enabled toolchain after adding the nanoarrow dependency to conanfile.txt.

Thanks to @wgtmac for contributing the recipe!

nanoarrow on Homebrew

The nanoarrow C library can now be installed using Homebrew!

brew install nanoarrow

CMake projects can then use find_package(nanoarrow) when using Homebrew-provided cmake. This also allows other vcpkg ports to use nanoarrow as a dependency.

Thanks to @ankane for contributing the formula!

Contributors

This release consists of contributions from 10 contributors in addition to the invaluable advice and support of the Apache Arrow community.

$ git shortlog -sn apache-arrow-nanoarrow-0.8.0.dev..apache-arrow-nanoarrow-0.8.0-rc0
Dewey Dunnington
Bryce Mecum
Dirk Eddelbuettel
Even Rouault
Kevin Liu
Michael Chirico
Namit Kewat
Nyall Dawson
Sutou Kouhei
William Ayd

Apache Arrow 23.0.1 Release

2026-02-16T00:00:00-05:00

The Apache Arrow team is pleased to announce the 23.0.1 release. It includes a security fix for the C++ IPC file reader, so be sure to read the relevant details below to see if you are affected.

Apart from that, 23.0.1 is mostly a bugfix release that includes 28 resolved issues on 29 distinct commits from 12 distinct contributors.

See the Install Page to learn how to get the libraries for your platform.

The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog.

C++ notes

Fix possible OOB write in buffered IO (GH-48311).

IPC

CVE-2026-25087: Use After Free vulnerability in IPC file reader

Fix a security issue that can be triggered when reading an Arrow IPC file (but not an IPC stream) with pre-buffering enabled, if the IPC file contains data with variadic buffers (such as Binary View and String View data).

Pre-buffering is disabled by default, so your code is vulnerable only if it enables it explicitly by calling RecordBatchFileReader::PreBufferMetadata. Affected Arrow C++ versions are 15.0.0 through 23.0.0. The fix integrated in 23.0.1 can also be separately viewed at GH-48925.

See our separate announcement for further detail.

Other fixes

Avoid memory blowup with excessive variadic buffer count in IPC (GH-48900).

Gandiva

Fix passing CPU attributes to LLVM (GH-48160).
Detect overflow in repeat() (GH-49159).

Parquet

Avoid re-serializing footer for signature verification (GH-48858).

Python notes

Added missing NOTICE.txt and LICENSE.txt to wheels (GH-48983).
Some fixes for compatibility with newer Cython versions, such as GH-48965, GH-49156 and GH-49138.

Ruby notes

Fix a bug where Arrow::ExecutePlan nodes may be Garbage Collected (GH-48880).

R notes

Bump the R build infrastructure to C++20 (GH-48817) and fix some C++20-related compilation issues (GH-48973).

Other modules and languages

No general changes were made to the other libraries or languages.

Apache Arrow is 10 years old 🎉

2026-02-12T00:00:00-05:00

The Apache Arrow project was officially established and had its first git commit on February 5th 2016, and we are therefore enthusiastic to announce its 10-year anniversary!

Looking back over these 10 years, the project has developed in many unforeseen ways, and we believe we have delivered on our objective of providing agnostic, efficient, durable standards for the exchange of columnar data.

How it started

From the start, Arrow has been a joint effort between practitioners of various horizons looking to build common grounds to efficiently exchange columnar data between different libraries and systems. In this blog post, Julien Le Dem recalls how some of the founders of the Apache Parquet project participated in the early days of the Arrow design phase. The idea of Arrow as an in-memory format was meant to address the other half of the interoperability problem, the natural complement to Parquet as a persistent storage format.

Apache Arrow 0.1.0

The first Arrow release, numbered 0.1.0, was tagged on October 7th 2016. It already featured the main data types that are still the bread-and-butter of most Arrow datasets, as evidenced in this Flatbuffers declaration:

/// ----------------------------------------------------------------------
/// Top-level Type value, enabling extensible type-specific metadata. We can
/// add new logical types to Type without breaking backwards compatibility

union Type {
  Null,
  Int,
  FloatingPoint,
  Binary,
  Utf8,
  Bool,
  Decimal,
  Date,
  Time,
  Timestamp,
  Interval,
  List,
  Struct_,
  Union
}

The release announcement made the bold claim that "the metadata and physical data representation should be fairly stable as we have spent time finalizing the details". Does that promise hold? The short answer is: yes, almost! But let us analyse that in a bit more detail:

the Columnar format, for the most part, has only seen additions of new datatypes since 2016. One single breaking change occurred: Union types cannot have a top-level validity bitmap anymore.
the IPC format has seen several minor evolutions of its framing and metadata format; these evolutions are encoded in the MetadataVersion field which ensures that new readers can read data produced by old writers. The single breaking change is related to the same Union validity change mentioned above.

First cross-language integration tests

Arrow 0.1.0 had two implementations: C++ and Java, with bindings of the former to Python. There were also no integration tests to speak of, that is, no automated assessment that the two implementations were in sync (what could go wrong?).

Integration tests had to wait for November 2016 to be designed, and the first automated CI run probably occurred in December of the same year. Its results cannot be fetched anymore, so we can only assume the tests passed successfully. 🙂

From that moment, integration tests have grown to follow additions to the Arrow format, while ensuring that older data can still be read successfully. For example, the integration tests that are routinely checked against multiple implementations of Arrow have data files generated in 2019 by Arrow 0.14.1.

No breaking changes... almost

As mentioned above, at some point the Union type lost its top-level validity bitmap, breaking compatibility for the workloads that made use of this feature.

This change was proposed back in June 2020 and enacted shortly thereafter. It elicited no controversy and doesn't seem to have caused any significant discontent among users, signaling that the feature was probably not widely used (if at all).

Since then, there have been precisely zero breaking changes in the Arrow Columnar and IPC formats.

Apache Arrow 1.0.0

We have been extremely cautious with version numbering and waited until July 2020 before finally switching away from 0.x version numbers. This was signalling to the world that Arrow had reached its "adult phase" of making formal compatibility promises, and that the Arrow formats were ready for wide consumption amongst the data ecosystem.

Apache Arrow, today

Describing the breadth of the Arrow ecosystem today would take a full-fledged article of its own, or perhaps even multiple Wikipedia pages. Our "powered by" page can give a small taste.

As for the Arrow project, we will merely refer you to our official documentation:

The various specifications that cater to multiple aspects of sharing Arrow data, such as in-process zero-copy sharing between producers and consumers that know nothing about each other, or executing database queries that efficiently return their results in the Arrow format.
The implementation status page that lists the implementations developed officially under the Apache Arrow umbrella (native software libraries for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust). But keep in mind that multiple third-party implementations exist in non-Apache projects, either open source or proprietary.

However, that is only a small part of the landscape. The Arrow project hosts several official subprojects, such as ADBC and nanoarrow. A notable success story is Apache DataFusion, which began as an Arrow subproject and later graduated to become an independent top-level project in the Apache Software Foundation, reflecting the maturity and impact of the technology.

Beyond these subprojects, many third-party efforts have adopted the Arrow formats for efficient interoperability. GeoArrow is an impressive example of how building on top of existing Arrow formats and implementations can enable groundbreaking efficiency improvements in a very non-trivial problem space.

It should also be noted that Arrow, as an in-memory columnar format, is often used hand in hand with Parquet for persistent storage; as a matter of fact, most official Parquet implementations are nowadays being developed within Arrow repositories (C++, Rust, Go).

Tomorrow

The Apache Arrow community is primarily driven by consensus, and the project does not have a formal roadmap. We will continue to welcome everyone who wishes to participate constructively. While the specifications are stable, they still welcome additions to cater for new use cases, as they have done in the past.

The Arrow implementations are actively maintained, gaining new features, bug fixes, and performance improvements. We encourage people to contribute to their implementation of choice, and to engage with us and the community.

Now and going forward, a large amount of Arrow-related progress is happening in the broader ecosystem of third-party tools and libraries. It is no longer possible for us to keep track of all the work being done in those areas, but we are proud to see that they are building on the same stable foundations that were laid 10 years ago.

Introducing a Security Model for Arrow

2026-02-09T00:00:00-05:00

We are thrilled to announce the official publication of a Security Model for Apache Arrow.

The Arrow security model covers a core subset of the Arrow specifications: the Arrow Columnar Format, the Arrow C Data Interface and the Arrow IPC Format. It sets expectations and gives guidelines for handling data coming from untrusted sources.

The specifications covered by the Arrow security model are building blocks for all the other Arrow specifications, such as Flight and ADBC.

The ideas underlying the Arrow security model were informally shared between Arrow maintainers and have informed decisions for years, but they were left undocumented until now.

Implementation-specific security considerations, such as proper API usage and runtime safety guarantees, will later be covered in the documentation of the respective implementations.

Apache Arrow Go 18.5.1 Release

2026-01-26T00:00:00-05:00

The Apache Arrow team is pleased to announce the v18.5.1 release of Apache Arrow Go. This patch release covers 10 commits from 6 distinct contributors.

Contributors

$ git shortlog -sn v18.5.0..v18.5.1
Matt Topol
Alfonso Subiotto Marqués
Arnold Wakim
Bryce Mecum
Rok Mihevc
cai.zhang

Changelog

What's Changed

fix(internal): fix assertion on undefined behavior by @amoeba in #602
ci(benchmark): switch to new conbench instance by @rok in #593
fix(flight): make StreamChunksFromReader ctx aware and cancellation-safe by @arnoldwakim in #615
fix(parquet/variant): fix basic stringify by @zeroshade in #624
fix(parquet/pqarrow): fix partial struct panic by @zeroshade in #630
Flaky test fixes by @zeroshade in #629
ipc: clear variadicCounts in recordEncoder.reset() by @asubiotto in #631
fix(arrow/cdata): Handle errors to prevent panic by @xiaocai2333 in #614

New Contributors

@rok made their first contribution in #593
@asubiotto made their first contribution in #631
@xiaocai2333 made their first contribution in #614

Full Changelog: https://github.com/apache/arrow-go/compare/v18.5.0...v18.5.1