Releases · Blosc/python-blosc2

19 May 17:38

v4.3.1

29a2da1

Release 4.3.1 Latest

Changes from 4.3.0 to 4.3.1

This is a maintenance release focused on CTable nested-column ergonomics,
grouped reductions, and API/documentation polish.

CTable nested columns and grouped reductions

Nested column names in group_by() results: grouped output columns can now
preserve dotted/nested names such as trip.sec instead of requiring valid
Python identifiers.
Column-object selectors: CTable.group_by() and CTable.sort_by() now
accept Column objects as well as string names, enabling idioms such as
t.group_by(t.trip.sec) and t.sort_by(t.trip.sec).
Grouped arg reductions: CTableGroupBy now supports argmin() and
argmax(), plus agg({"col": "argmin"}) / agg({"col": "argmax"}).
Results are logical row positions in the grouped table or view; groups with no
non-null values return -1.
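
The -1 convention for groups without any non-null value can be illustrated with a small pure-Python model. This is only a sketch of the semantics described above, not the CTableGroupBy implementation: the function name `grouped_argmin` is hypothetical, and nulls are modelled as `None` for simplicity.

```python
def grouped_argmin(keys, values):
    """Map each key to the row position of its smallest non-null value.

    Hypothetical helper: nulls are modelled as None, and a group whose
    values are all null keeps the placeholder position -1.
    """
    result = {}
    for pos, (key, value) in enumerate(zip(keys, values)):
        current = result.setdefault(key, -1)  # register the group even if null
        if value is None:
            continue
        if current == -1 or value < values[current]:
            result[key] = pos  # strict "<" keeps the first position on ties
    return result


keys = ["a", "b", "a", "c", "b"]
values = [3.0, None, 1.5, None, 2.0]
positions = grouped_argmin(keys, values)
print(positions)  # {'a': 2, 'b': 4, 'c': -1}
```

Group "c" has only null values, so its position stays at -1 rather than pointing at an arbitrary row.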

NDArray constructor ergonomics

blosc2.array(): added a NumPy-like constructor for NDArrays. It mirrors
blosc2.asarray() but defaults to copy=True, so passing an existing
NDArray creates a copy unless copy=False or copy=None is requested.
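
The difference between the two defaults can be sketched with a stand-in container. Everything below is hypothetical (`Box`, `as_box`, and `box` are illustrative names, not blosc2 functions); the sketch only models the rule that an existing container is copied when `copy=True` and returned as-is when `copy=False` or `copy=None`.

```python
class Box:
    """Hypothetical stand-in for an array container."""

    def __init__(self, data):
        self.data = list(data)


def as_box(obj, copy=None):
    """asarray-like: reuse an existing Box unless a copy is requested."""
    if isinstance(obj, Box):
        return Box(obj.data) if copy else obj
    return Box(obj)  # other inputs always need a new container


def box(obj, copy=True):
    """array-like: same conversion, but copying by default."""
    return as_box(obj, copy=copy)


a = Box([1, 2, 3])
copied = box(a)                 # default copy=True: new object
shared = box(a, copy=False)     # explicit opt-out: same object
also_shared = box(a, copy=None)
```

With this model, `copied` holds equal data in a distinct object, while `shared` and `also_shared` are the original `a`.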

Documentation

Expanded the CTable reference with RowTransformer, Column.row_transformer,
and CTableGroupBy.argmin / argmax documentation.
Added blosc2.ndarray(), blosc2.dictionary(), and related public schema
factory functions to the Schema Specs reference.
Moved blosc2.group_reduce() into the Reduction Functions reference and
updated its example to use Blosc2 NDArrays.

Assets 2

18 May 16:55

FrancescAlted

v4.3.0

a682951

Release 4.3.0

Changes from 4.2.0 to 4.3.0

CTable: N-dimensional (ndarray) columns

Multidimensional columns: CTable columns can now hold NDArray-backed cells, allowing
each row of a column to contain a full n-dimensional compressed array. This enables
use cases such as embedding vectors, image patches, time-series windows, or any other
multidimensional per-row payload.
CSV and DataFrame import/export: Multidimensional column data can be imported and
exported via CSV and pandas DataFrames, with automatic detection of array-valued cells.
Nullable ndarray columns: Multidimensional columns fully support the nullable
semantics (null_count, sentinel handling, null_policy) already available for scalar
columns.
from_pandas() improvements: CTable.from_pandas() now creates the correct
specialized backing storage for DictionarySpec, ListSpec, VLStringSpec,
VLBytesSpec, and other variable-length scalar specifications.
Improved schema coverage: New CTable timestamp schema type and extended
Column.info output with shape, chunks, and blocks descriptors.
Arg reductions: Added argmin() and argmax() for scalar and ndarray
CTable columns, plus row-transformer support for generated columns such as
per-row peak-hour or dominant-embedding-dimension features.
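
As an illustration of the kind of per-row feature mentioned above, a peak-hour value is simply the argmax of each row's array. The sketch below uses plain Python lists in place of ndarray cells; `peak_hour` is a hypothetical helper, not part of the blosc2 API.

```python
def peak_hour(hourly_counts):
    """Return the index of the largest entry (first one wins on ties)."""
    return max(range(len(hourly_counts)), key=hourly_counts.__getitem__)


# Each inner list stands in for one row's array-valued cell.
rows = [
    [0, 3, 9, 4],
    [5, 5, 1, 0],
    [2, 2, 2, 7],
]
peaks = [peak_hour(row) for row in rows]
print(peaks)  # [2, 0, 3]
```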

CTable: Group-by and filtered aggregation

CTable.group_by(): The primary group-by interface. Call
t.group_by("city", sort=True).agg({"qty": "mean"}) to produce a new
CTable with aggregated results. Single-key and multi-key groupings are
supported, along with convenience methods such as .size(), .count(),
.sum(), .mean(), .min() and .max():

```python
# Assumes t is an existing CTable with "city" and "sales" columns.
by_city = t.group_by("city", sort=True)
by_city.size()  # COUNT(*)
by_city.sum("sales")  # SUM(sales) per city
by_city.agg({"sales": ["sum", "mean"]})  # SUM(sales), AVG(sales) per city
```
Performance accelerators: Dedicated Cython fast paths deliver significant speedups:
~25× for float32/64 group-by keys, ~8× for integer and dictionary-code keys, and a
general-purpose hash table for arbitrary float keys.
Filtered aggregate pushdown: The where= parameter is now accepted in aggregation
methods, pushing the filter into the compute engine so that only matching rows are
read and reduced.
Persistent grouped output: Group-by results can be saved directly to persistent
storage via the urlpath= parameter.
blosc2.group_reduce(): New public function that performs group-by reduction over
NDArray instances and CTable columns, with Cython-accelerated backends for common
key/reduction combinations.
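
The idea behind filtered aggregate pushdown can be sketched in pure Python: the filter is evaluated inside the aggregation loop, so rows that do not match are never accumulated. This is a conceptual model only; `group_agg` and its `where` callable are hypothetical and do not reflect the actual `where=` expression syntax or the Cython fast paths.

```python
from collections import defaultdict


def group_agg(rows, key, column, where=None):
    """Single-pass hash aggregation returning sum and mean per group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if where is not None and not where(row):
            continue  # filtered rows never reach the accumulators
        sums[row[key]] += row[column]
        counts[row[key]] += 1
    # Sorted keys mimic the effect of sort=True on the grouped output.
    return {k: {"sum": sums[k], "mean": sums[k] / counts[k]} for k in sorted(sums)}


rows = [
    {"city": "Oslo", "sales": 10.0},
    {"city": "Rome", "sales": 4.0},
    {"city": "Oslo", "sales": 20.0},
    {"city": "Rome", "sales": 8.0},
]
totals = group_agg(rows, "city", "sales")
filtered = group_agg(rows, "city", "sales", where=lambda r: r["sales"] > 5)
```

Here `totals["Rome"]` aggregates both Rome rows, while `filtered["Rome"]` only sees the row with sales above 5.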

CTable: Dictionary / categorical columns

DictionarySpec column type: Introduced a new dictionary-encoded (categorical)
column type that stores string or integer codes mapped to a shared dictionary, providing
compact storage and accelerated equality and membership queries.
Dictionary types in where clauses: Dictionary columns can be queried with the same
where= expression syntax as other column types, including nested dotted-name access.
Improved display: CTable printing now adapts to the terminal width, and dictionary
values are shown in their decoded form. Column.info has been extended with type
details, shape, chunks, and blocks.
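
Dictionary encoding in general works by storing one small integer code per row plus a shared list of distinct values. The following sketch shows why equality queries become cheap: the target is translated to a code once, and the scan then compares integers. The function names are hypothetical and the layout is a simplification, not the DictionarySpec storage format.

```python
def dictionary_encode(values):
    """Return (codes, dictionary), assigning codes in first-seen order."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        code = index.get(value)
        if code is None:
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return codes, dictionary


def rows_equal_to(codes, dictionary, target):
    """Find matching row positions by comparing integer codes only."""
    if target not in dictionary:
        return []  # value never stored: no row can match
    target_code = dictionary.index(target)
    return [pos for pos, code in enumerate(codes) if code == target_code]


cities = ["Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo"]
codes, dictionary = dictionary_encode(cities)
print(codes)       # [0, 1, 0, 2, 1, 0]
print(dictionary)  # ['Oslo', 'Rome', 'Lima']
oslo_rows = rows_equal_to(codes, dictionary, "Oslo")
decoded = [dictionary[code] for code in codes]
```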

CTable: Nested columns and field-name escaping

Dotted nested column access: Columns whose names contain literal .
(e.g., "root.nested") are now fully addressable via the dotted accessor syntax in
where expressions, __getitem__, and the public API.
Hierarchical _cols storage paths: The internal column storage layout now preserves
a hierarchical structure that mirrors the logical nesting, improving introspection
and interop.
Nested-field pipeline: A new flattened-storage pipeline with logical mapping
preserves nested schema structure (field names, types, and hierarchy) through
Arrow and Parquet import/export. For unnamed top-level list<struct<...>> Parquet
files, the logical schema round-trips faithfully, though the original physical row
grouping is intentionally not preserved.
Field-name escaping: Special characters (. and /) in column names are
automatically escaped during schema construction and metadata round-trips.
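
The release notes do not state which escaping scheme is used, so the sketch below assumes a percent-style encoding purely for illustration. It shows the property any such scheme needs: a field name containing `.` or `/` survives a round trip and cannot be confused with a path separator. The names `escape_field` and `unescape_field` are hypothetical.

```python
def escape_field(name):
    """Escape '%', '.' and '/' (assumed percent-style scheme)."""
    # '%' goes first so the escapes added afterwards are not re-escaped.
    return name.replace("%", "%25").replace(".", "%2E").replace("/", "%2F")


def unescape_field(escaped):
    """Invert escape_field; '%25' is restored last."""
    return escaped.replace("%2F", "/").replace("%2E", ".").replace("%25", "%")


# Logical nesting: each element is one field name, some with special characters.
parts = ["trip", "sec.raw", "a/b"]
storage_path = "/".join(escape_field(p) for p in parts)
print(storage_path)  # trip/sec%2Eraw/a%2Fb

recovered = [unescape_field(p) for p in storage_path.split("/")]
```

Because every literal `/` inside a field name is escaped, splitting the stored path on `/` recovers exactly the original three fields.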

Parquet import/export improvements

Arrow serializer by default: CTable.from_parquet() now defaults to the Arrow
serializer, providing better schema fidelity and nested-type support.
Progress reporting: A --progress flag and an ETA estimator have been added to
the parquet-to-blosc2 CLI for long-running imports.
--max-rows parameter: CTable.from_parquet() and the CLI now accept max_rows
to limit the number of imported rows.
--timestamp-unit: New CLI option to control timestamp unit conversion on import.
--float-trunc-prec: New CLI option to truncate floating-point precision on import.
Separated nested columns enabled by default: The separate_nested_cols flag is now
True by default for both the Python API and the CLI, ensuring nested Arrow structs
are always expanded into flat columns.
list_serializer parameter: New option to control how list-type columns are
serialized, with sensible defaults for different list layouts.
Validation optimizations: Arrow datetime values are validated only during import,
reducing runtime overhead on subsequent operations.

TreeStore: Inline CTable support

CTables inside TreeStore: CTable objects can now be stored inline as items
inside a TreeStore, enabling hierarchical storage that mixes arrays and tables in a
single persistent container.
Cache hardening: TreeStore cache assignments now use defensive copies and cache
effective object roots to avoid aliasing and stale-cache errors.
Examples and tutorials: New tutorials and docstring examples demonstrate how to
store, retrieve, and query CTables within a TreeStore.

Performance and usability enhancements

Faster open and import: blosc2.open() and store constructors now assume valid
file extensions and defer column metainfo loading, making CTable.open() and
package import noticeably faster.
CTable.nrows is now lazy: The row count is computed on demand rather than eagerly,
speeding up open and schema-inspection workflows.
Accelerated scalar and small-slice access: The batch/list path for reading scalar
values or small column slices has been overhauled, eliminating internal placeholder
materialization and yielding lower latency.
Late-import optimizations: Heavy optional dependencies are imported lazily at the
blosc2 package level, reducing the baseline import blosc2 overhead.
iter_arrow_batches() optimization: Avoids full Python object materialization of
batches during iteration, reducing memory pressure.
NDArray-to-list conversion: Small optimization when converting NDArray objects
to Python lists.
_last_pos invalidation skipped: Mid-table deletes no longer eagerly invalidate
cached positional state, improving delete latency.

Documentation, examples and benchmarks

API reference expanded: blosc2.group_reduce() has been added to the Sphinx
reference, along with updated CTable, Column, and TreeStore pages.
New tutorials and examples: Added sections on CTable–TreeStore integration,
nested fields, dictionary columns, aggregates, grouping and querying with where=.
New benchmarks: Graph benchmarks for CTable insert time, column count, memory usage,
and where= queries, plus dedicated group-by, nested-filter, and Parquet round-trip
benchmarks.

Fixes and compatibility

Null and NaN handling: NumPy scalar null sentinels are now normalized to plain Python
scalars, and floating-point NaN sentinels are treated consistently with Python
float('nan').
Empty aggregate results: Filtered aggregations that produce no rows now handle the
empty result gracefully.
Generated column safety: Accessing a stalled (unfillable) generated column now raises
a clear exception instead of producing undefined results.
Miniexpr bundling: Miniexpr’s bundled libtcc and related runtime files are now
kept inside the blosc2 package, avoiding conflicts with other TCC installations.
Test improvements: Torch-dependent tests are marked as heavy, PyArrow-optional
tests are skipped when the library is absent, and parametrization matrices have been
trimmed to reduce CI time.
Missing Cython validation: Added validation guards for several Cython extension
functions that previously lacked explicit error checking.
C-Blosc2 update: Bundled C-Blosc2 has been updated to the latest version (3.0.3).
blosc2.open() default mode changed from 'a' to 'r': Removed the FutureWarning that
was added to prepare for this transition.
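
NaN sentinels need special care because NaN never compares equal to itself, so a plain `value == sentinel` test would report zero nulls. A minimal stdlib sketch of a NaN-aware null check follows; `is_null` and `null_count` here are hypothetical helpers, not the library's internals.

```python
import math


def is_null(value, sentinel):
    """True if value matches the sentinel, treating NaN as equal to NaN."""
    if isinstance(sentinel, float) and math.isnan(sentinel):
        return isinstance(value, float) and math.isnan(value)
    return value == sentinel


def null_count(values, sentinel):
    """Count the entries that match the null sentinel."""
    return sum(is_null(v, sentinel) for v in values)


nan = float("nan")
float_nulls = null_count([1.0, nan, 2.0, nan], nan)
int_nulls = null_count([1, -1, 3], -1)
naive_nulls = sum(v == nan for v in [1.0, nan, 2.0, nan])  # misses every NaN
```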

07 May 11:38

FrancescAlted

v4.2.0

81a9c09

Release 4.2.0

Changes from 4.1.2 to 4.2.0

CTable: columnar compressed tables

Introduced blosc2.CTable, a new columnar table container for compressed, typed columns. CTables support dataclass- and schema-based construction, row iteration, column access, table views, head() / tail() / sample(), sorting, selection and compact where expressions.
Added persistent CTables backed by TreeStore, with support for blosc2.open(), CTable.open(), CTable.load(), CTable.save(), CTable.to_b2d() and CTable.to_b2z(). CTable views can be saved too, and .b2z/.b2d path handling has been tightened.
Added mutation operations for CTables, including append(), extend(), delete(), compact(), add_column(), drop_column(), rename_column() and related schema validation.
Added computed columns, including virtual computed columns backed by lazy expressions, materialized computed columns and automatic filling of materialized computed columns during inserts.
Added CTable indexing support, including persistent indexes, direct expression indexes, ordered index reuse, boolean LazyExpr/NDArray masks in CTable.__getitem__, iter_sorted() and indexing support for .b2z tables.
Added nullable schema support and null policies for CTable scalar columns, preserving nullable scalar Parquet round-trips.
Added variable-length CTable column support via ListArray / ObjectArray, including vlstring and vlbytes schema specs, fixed-length string/bytes import support and list/struct Arrow/Parquet round-trips.
Added Arrow, Parquet and CSV interoperability for CTables, including batch-wise Arrow/Parquet import/export, Arrow schema metadata preservation, CTable.from_arrow_batches() improvements and a new parquet-to-blosc2 CLI utility.
Added CTable documentation, tutorials, examples and benchmarks covering schema definition, persistence, querying, indexing, mutations, nullable columns, computed columns and variable-length columns.

Indexing and ordering

Added a new indexing subsystem for NDArrays and CTables, including full, partial/bucket, light/medium and OPSI-style index kinds, out-of-core index builders and sidecar storage.
Added blosc2.Index as the unified public index handle, plus APIs such as create_index(), compact_index(), iter_sorted(), will_use_index() and related query explanation support.
Added materialized expression indexes for NDArrays and direct expression indexes for CTables.
Added persistent query-result caching for indexed lookups, with FIFO pruning and cache accounting.
Added blosc2.argsort() and refactored indexing APIs around explicit index enums and sorting helpers.
Improved indexed query performance with Cython accelerators, threaded chunk batching, zero-copy/cached mmap reads, chunk-aware and reduced-order layouts and faster scattered row gathering.
Reduced memory usage during index creation and lookup by avoiding full sidecar materialization, replacing memmap staging with Blosc2 scratch arrays and adding tmpdir support for full out-of-core indexes.
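
FIFO pruning with cache accounting, as mentioned for the query-result cache, can be sketched as follows. This is a generic in-memory model under stated assumptions: the class name `FifoCache` and its counters are hypothetical, and the real cache is persistent rather than a Python dict.

```python
from collections import OrderedDict


class FifoCache:
    """Bounded cache that evicts the oldest inserted entry first."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self.hits += 1  # lookups do not change eviction order (FIFO, not LRU)
            return self._entries[key]
        self.misses += 1
        return None

    def put(self, key, value):
        if key not in self._entries and len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest insertion
        self._entries[key] = value


cache = FifoCache(max_entries=2)
cache.put("x > 5", [1, 4])
cache.put("x < 0", [7])
first = cache.get("x > 5")   # hit
cache.put("x == 3", [2])     # evicts "x > 5" despite the recent hit
gone = cache.get("x > 5")    # miss
kept = cache.get("x < 0")    # hit
```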

Persistence, stores and serialization

Added structured Blosc2 serialization based on b2object carriers, including persisted C2Array, LazyExpr and DSL LazyUDF objects.
Added blosc2.Ref for serializing external references, plus examples for b2object bundles and persisted expressions/UDFs.
Added blosc2.load() as a convenience loader.
Added vlmeta support to LazyArray objects.
Improved store handling by preserving lazy b2object carriers in DictStore, allowing reopened proxies to refill caches after read-only opens, relaxing DictStore/TreeStore suffix requirements and adding DictStore.to_b2d().
Accelerated blosc2.open() by trying standard opens first and warning on implicit append mode.

Arrays, computation and containers

Added ObjectArray for fully general object data and renamed the earlier VLArray work accordingly; added ListArray docstrings and Arrow integration improvements.
Added schema helpers including numeric specs, blosc2.struct() and blosc2.object() for nested/fully general column declarations.
Improved fromiter() with direct chunked construction and substantially lower peak memory use.
Improved asarray() behavior for NDArray inputs when copy-inducing keyword arguments are supplied.
Added SChunk.reorder_offsets().
Improved BatchArray defaults and documentation; the default compression level is now tuned for faster lookup/scan behavior.
Continued matmul/linalg optimization work and shared-thread-pool integration.

CLI, docs and examples

Added the parquet-to-blosc2 command with options such as --max-rows, --parquet-batch-size, --blosc2-items-per-block and --use-dict.
Added new CTable, ObjectArray, BatchArray, containers, indexing and serialization tutorials and examples.
Reorganized and expanded the API reference for CTable, Column, schema specs, Index, save/load helpers and miscellaneous APIs.
Updated benchmark suites for CTables, indexing, Parquet import/export, BatchArray and NDArray construction/indexing.

Fixes and compatibility

Updated bundled C-Blosc2 to v3.0.2 and require C-Blosc2 >= 3.0.0 when building against a system library.
Updated bundled C-Blosc2 and miniexpr sources multiple times.
Restored compatibility with NumPy < 2.
Fixed Windows and mmap/file-locking issues in index creation, rebuilds and temporary file cleanup.
Fixed full-index query failures for large CTable columns and full out-of-core merge failures on systems with small /tmp.
Fixed stale sidecar/cache reuse and targeted cache invalidation when persistent sidecars are replaced.
Fixed .b2z double-open corruption caused by GC-triggered repacking and made temporary .b2z unpacking default to the source file directory.
Fixed a regression when reopening persisted proxies in read-only mode.
Fixed GC-induced thread hangs on macOS with Python 3.14 and hardened async chunk reading/cache cleanup paths.
Fixed lazy-chunk source-size handling in decode/getitem callers.
Fixed nullable validation, dictionary extend validation, CTable close propagation, print alignment and NumPy mask support.
Fixed arange() regressions and several pre-existing set_slice error-handling issues.
Clamped indexing/thread defaults for wasm32.

03 Mar 11:09

lshaw8317

v4.1.2

0fc782e

Blosc2 v4.1.2

Updated C-Blosc2 to pick up a memory-leak fix and other bug fixes.

02 Mar 15:03

lshaw8317

v4.1.1

58e4515

Blosc2 v4.1.1

Update miniexpr version to fix bug on Ubuntu-arm64.

28 Feb 07:13

lshaw8317

v4.1.0

c275744

Blosc2 v4.1.0

Add DSL kernel functionality for faster, compiled, user-defined functions which broadly respect python syntax and implement the LazyArray interface. See the introductory tutorial at: https://blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf-kernels.html
Add read-only mmap support for store containers:
DictStore, TreeStore, and EmbedStore now accept mmap_mode="r"
when opened with mode="r" (including via blosc2.open for .b2d,
.b2z, and .b2e).
New .meta entry for store containers, allowing better store recognition at blosc2.open() time. Fixes #546.
Add cumulative_sum and cumulative_prod functions for Array API compliance.
Add Unicode string arrays, support comparison operations with them, and add an optimised compression path.
Add endswith and startswith, and extend contains to support strings, offering miniexpr multithreaded computation when possible.
Use DSL kernels to accelerate arange/linspace constructors by 6-10x.
Improve documentation for filters and filters_meta.
Fix edge case issues with resize and constructors so that chunks may be set independently of shape, and arrays may be extended from empty consistently.
Continued work on miniexpr integration, interface, and support.
Ruff fixes and implementation of PEP recommendations.
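
For readers unfamiliar with the cumulative functions named above, their one-dimensional semantics can be sketched with the standard library. The `include_initial` keyword is modelled on the Array API standard's signature; this is a plain-list illustration, not the blosc2 implementation.

```python
import operator
from itertools import accumulate


def cumulative_sum(values, include_initial=False):
    """Running totals; optionally prepend the additive identity 0."""
    out = list(accumulate(values))
    return [0, *out] if include_initial else out


def cumulative_prod(values):
    """Running products of the input sequence."""
    return list(accumulate(values, operator.mul))


sums = cumulative_sum([1, 2, 3, 4])
sums_with_initial = cumulative_sum([1, 2, 3, 4], include_initial=True)
prods = cumulative_prod([1, 2, 3, 4])
print(sums)   # [1, 3, 6, 10]
print(prods)  # [1, 2, 6, 24]
```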

29 Jan 14:18

lshaw8317

v4.0.0

58cce0f

Blosc2 v4.0.0

What's Changed

The main change is hyperfast, fully multithreaded computation with miniexpr (final PR: Miniexpr for Windows by @FrancescAlted in #565).
The internal wheel structure has also been changed to implement PEP 427 (@lshaw8317 in #560). Other changes:

feat: add support for .b2z, .b2d, .b2e files and update related tests by @bossbeagle1509 in #541
Add none indexing for lazyudf/lazyarray by @lshaw8317 in #545
Respect NUMEXPR_MAX_THREADS when setting numexpr thread count by @skmendez in #567
Add openzl_plugin support by @lshaw8317 in #559
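
One plausible reading of respecting NUMEXPR_MAX_THREADS is that the environment variable acts as an upper bound on the requested thread count. The sketch below models that reading only; `effective_threads` is a hypothetical helper, and the exact logic in #567 may differ.

```python
import os


def effective_threads(requested, environ=os.environ):
    """Cap the requested thread count at NUMEXPR_MAX_THREADS, if it is set."""
    raw = environ.get("NUMEXPR_MAX_THREADS")
    if raw is None:
        return requested
    try:
        limit = int(raw)
    except ValueError:
        return requested  # ignore values that are not integers
    return max(1, min(requested, limit))  # never go below one thread


capped = effective_threads(8, {"NUMEXPR_MAX_THREADS": "4"})
unchanged = effective_threads(2, {"NUMEXPR_MAX_THREADS": "4"})
unset = effective_threads(8, {})
```

Passing the environment as a parameter keeps the function easy to test without mutating the real process environment.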

Full Changelog: v3.12.2...v4.0.0

Contributors

FrancescAlted, skmendez, and 2 other contributors

22 Jan 15:43

lshaw8317

v4.0.0-b1

a21b920

Blosc2 v4.0.0-b1 Pre-release

This is a beta version with hyperfast multithreaded expression calculation via the incorporation of miniexpr, as well as better support for plugins (stay tuned for the blosc2_openzl plugin!).

What's Changed

Update pre-commit hooks by @pre-commit-ci[bot] in #537
Fix fancy index item bug by @ykcUconn in #543
feat: add support for .b2z, .b2d, .b2e files and update related tests by @bossbeagle1509 in #541
Add none indexing for lazyudf/lazyarray by @lshaw8317 in #545
Bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #547
Bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #548
Update pre-commit hooks by @pre-commit-ci[bot] in #550
PEP 639 compliance by @DimitriPapadopoulos in #552
Multi-threaded reductions by @FrancescAlted in #549
Implement PEP recommendations by @lshaw8317 in #560
Add openzl_plugin support by @lshaw8317 in #559

New Contributors

@ykcUconn made their first contribution in #543
@bossbeagle1509 made their first contribution in #541

Full Changelog: v3.12.2...v4.0.0-b1

Contributors

FrancescAlted, DimitriPapadopoulos, and 5 other contributors

04 Dec 11:46

lshaw8317

v3.12.2

f855c2f

Blosc2 v3.12.2

What's Changed

Hotfix to change WASM wheel hosting to separate repo

03 Dec 17:10

lshaw8317

v3.12.1

1e38896

Blosc2 v3.12.1

What's Changed

Allow saving of numba-decorated lazyudfs by @lshaw8317 in #538
Automate upload of WASM wheels to GitHub pages

Contributors

lshaw8317
