We've been busy in Q1 2026. 12 releases. 778 PRs. 95 contributors (thank you!). The streaming engine now covers more join types, all major formats have a streaming scan implementation, Delta and Iceberg both have full read/write support, and Polars Cloud gained a query profiler that helped us run a TPC-H benchmark query 54% faster at 64% lower cost. Read all the highlights in the latest Polars in Aggregate: https://lnkd.in/eQFm8D6E
Polars
Data Infrastructure and Analytics
Lightning-fast DataFrame library for Rust and Python
About us
Polars is a lightning-fast DataFrame library and in-memory query engine. Its embarrassingly parallel execution, cache-efficient algorithms, and expressive API make it perfect for efficient data wrangling, data pipelines, snappy APIs, and much more.
- Website: https://pola.rs
- Industry: Data Infrastructure and Analytics
- Company size: 11-50 employees
- Headquarters: Amsterdam
- Type: Privately Held
Locations
- Primary: Amsterdam, NL
Updates
-
Polars loves sorted data! If your data is already sorted, you can get a performance boost up to 18x when joining your datasets. Read all about it in our latest blog post: https://lnkd.in/ezNzYxju
-
Realtime query profiling of Polars
In this post we use the query profiler in Polars Cloud to optimize the infrastructure configuration for a specific query. The result: a 54% faster and 64% cheaper query after only five runs. Read all about it here: https://lnkd.in/eCnuGt2F
-
We've released Polars Cloud client 0.6.0. Some of the highlights:
• Improved UX for query profiling: data skew is now included in the metrics, showing how long workers take to execute a stage and the size of partitions. You can now also see resource metrics per stage.
• Compute Scratchpad (alpha): a new interactive scratchpad for ad-hoc computation that runs on your Polars Cloud cluster.
• Improved distributed query planning: various improvements to stability and performance.
• Breaking: `LazyFrameRemote.execute` is now blocking by default. Previously fire-and-forget, `.execute()` now blocks until the query completes. Passing `blocking=False` restores the old behavior.
-
Quoting NVIDIA's Jensen Huang at GTC 2026: "All of these platforms are processing DataFrames. This is the ground truth of business. This is the ground truth of enterprise computing. Now we will have AI use structured data. And we are going to accelerate the living daylights out of it." Polars DataFrames are at the core of the AI revolution. https://lnkd.in/esMPmcmp
NVIDIA GTC Keynote 2026
https://www.youtube.com/
-
Polars reposted this
NVIDIA's GTC keynote: Polars is part of the $120B structured-data ecosystem powering AI. Quoting Jensen: "All of these platforms are processing DataFrames. This is the ground truth of business. This is the ground truth of enterprise computing. Now we will have AI use structured data. And we are going to accelerate the living daylights out of it." Polars DataFrames are at the core of the AI revolution.
-
We've released Python Polars 1.39. Some of the highlights:
• Streaming as-of join: join_asof() is now supported in the streaming engine, enabling memory-efficient time-series joins.
• sink_iceberg() for writing to Iceberg tables: a new LazyFrame sink that writes directly to Apache Iceberg tables. Combined with the existing scan_iceberg(), Polars now supports full read/write workflows for Iceberg-based data lakehouses.
• Streaming cloud downloads: scan_csv(), scan_ndjson(), and scan_lines() can now stream data directly from cloud storage instead of downloading the full file first.
Link to the complete changelog: https://lnkd.in/d-7rUJJC
-
A one-liner routes every .collect() call through the new streaming engine: pl.Config.set_engine_affinity("streaming") Put it at the top of your script and all subsequent .collect() calls will prefer the streaming engine. You can also pass engine="streaming" directly to a single .collect() call if you want to opt in for just one query. The streaming engine processes data in chunks rather than loading everything into memory at once. It's 3-7x faster than the in-memory engine, and for workloads that exceed available RAM it's the only viable option. We will soon make the streaming engine the default, but this way you can already enjoy its benefits.
-
pl.from_repr() constructs a DataFrame or Series directly from its printed string representation. This can be useful in unit tests: instead of rebuilding expected DataFrames through dictionaries with typecasting, the schema is encoded in the header and the values are right there in the table. You can see at a glance what the test is asserting.
-
Easily scale Polars queries from Apache Airflow. Our latest blog post walks through different patterns to run distributed Polars queries using Airflow: fire-and-forget execution, parallel queries, multi-stage pipelines, and manual cluster shutdowns. Read more here: https://lnkd.in/eU8tV7-3