Minimize concat memory usage #10866

dcherian · 2025-10-18T15:59:50Z

OK we were incredibly wasteful earlier!

| Change   | Before [b5e4b0e0] <main>   | After [c9432cfc] <min-concat-mem>   |   Ratio | Benchmark (Parameter)           |
|----------|----------------------------|-------------------------------------|---------|---------------------------------|
| -        | 4.82G                      | 920M                                |    0.19 | combine.Concat1d.peakmem_concat |
| -        | 574±20ms                   | 54.0±0.6ms                          |    0.09 | combine.Concat1d.time_concat    |

cc @mjwillson

Would be good to add a benchmark for the reindexing case at some point

- Closes #10864: xr.concat has over 3x the peak memory usage and 5x slower than np.concatenate, even with large chunk sizes
- Tests added
- User visible changes (including notable bug fixes) are documented in whats-new.rst


dcherian · 2025-10-18T16:18:12Z

with reduced sizes for CI:

| Change   | Before [b5e4b0e0] <main>   | After [beb45036] <min-concat-mem~1>   |   Ratio | Benchmark (Parameter)           |
|----------|----------------------------|---------------------------------------|---------|---------------------------------|
| -        | 935M                       | 259M                                  |    0.28 | combine.Concat1d.peakmem_concat |
| -        | 91.5±1ms                   | 6.35±0.4ms                            |    0.07 | combine.Concat1d.time_concat    |


kmuehlbauer

@dcherian This is already very far away from my initial mediocre solution. Thanks, this will have extreme impact on our workflows. 🥇

kmuehlbauer · 2025-10-20T13:37:12Z

xarray/structure/concat.py

    file_start_indexes = np.append(0, np.cumsum(concat_dim_lengths))
-    concat_index = np.arange(file_start_indexes[-1])
-    concat_index_size = concat_index.size
+    concat_index_size = np.sum(concat_dim_lengths)


We might squeeze out a bit more if we combine the calculation of the sum with the np.cumsum above.

We might even think about reusing the np.cumsum trick you did further below and pre-allocating file_start_indexes. Not sure how much that gives, though.

file_start_indexes is only ever allocated once, so it doesn't seem worth it. We can use np.cumulative_sum(concat_dim_lengths, include_initial=True) once we require numpy>=2, I believe.
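A minimal sketch of the suggestion in this thread, assuming `concat_dim_lengths` holds the per-file lengths along the concatenation dimension (the values below are made up for illustration). It shows that the total size is already the last element of the cumulative sum, so neither a full `np.arange` nor a separate `np.sum` is needed. As far as I know, `np.cumulative_sum` only exists from NumPy 2.1 onwards, so the sketch falls back to `np.append` plus `np.cumsum` when it is missing.

```python
import numpy as np

# Hypothetical per-file lengths along the concat dimension (illustrative values).
concat_dim_lengths = np.asarray([3, 5, 2])

if hasattr(np, "cumulative_sum"):
    # Newer NumPy: the leading 0 is included in a single call.
    file_start_indexes = np.cumulative_sum(concat_dim_lengths, include_initial=True)
else:
    # Older NumPy: same result via cumsum with a prepended 0.
    file_start_indexes = np.append(0, np.cumsum(concat_dim_lengths))

# The last entry is the total length; no full-length index array is allocated.
concat_index_size = int(file_start_indexes[-1])

print(file_start_indexes.tolist())  # [0, 3, 8, 10]
print(concat_index_size)            # 10
```

Both branches produce the same start offsets, so the choice only affects which NumPy versions are supported.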


dcherian · 2025-10-21T12:21:13Z

> This is already very far away from my initial mediocre solution.

Kai, your solution solved a ~10-year-old bug IIRC! I should've spotted this at review. I think I assumed it scaled with the number of files, ~O(10_000), instead of the dimension size, ~O(10_000_000).
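To make the scaling argument concrete, here is a rough sketch using the orders of magnitude quoted above (10_000 files, 10_000_000 elements along the concatenated dimension). The equal-length split and the int64 dtype are assumptions for illustration only, not taken from the PR.

```python
import numpy as np

n_files = 10_000        # order of magnitude quoted in the comment
dim_size = 10_000_000   # order of magnitude quoted in the comment

# Assumption: every file contributes the same number of elements.
concat_dim_lengths = np.full(n_files, dim_size // n_files, dtype=np.int64)

# This array scales with the number of files.
file_start_indexes = np.append(0, np.cumsum(concat_dim_lengths))

# This array scales with the dimension size; it is the allocation the diff drops.
concat_index = np.arange(file_start_indexes[-1], dtype=np.int64)

print(file_start_indexes.nbytes)  # 80008
print(concat_index.nbytes)        # 80000000
```

Under these assumptions the per-file offsets take about 80 kB while the full index takes 80 MB, roughly a factor of 1000 apart.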

kmuehlbauer · 2025-10-21T12:22:27Z

> I think I assumed it scaled with number of files, ~O(10_000), instead of dimension size O(10_000_000).

At least, someone complained about it now 😀

kmuehlbauer · 2025-10-22T06:24:38Z

Thanks again @dcherian! Concatenators and combiners will have some spare time for doing more science now. 🎉

dcherian requested a review from kmuehlbauer October 18, 2025 15:59

dcherian added the topic-performance label Oct 18, 2025

dcherian added 2 commits October 18, 2025 10:16

reduce bench size

beb4503

try getting asv mamba to work

f1dab89

github-actions bot added CI Continuous Integration tools dependencies Pull requests that update a dependency file labels Oct 18, 2025

dcherian added 3 commits October 18, 2025 10:19

Revert "try getting asv mamba to work"

731c6a1

This reverts commit f1dab89.

use conda for asv

79405b3

Use rattler instead

65ebd48

kmuehlbauer reviewed Oct 20, 2025


dcherian added 3 commits October 21, 2025 08:18

address comments

7b2cf07

Merge branch 'main' into min-concat-mem

954b4ac

* main: Update docs to reflect open_mfdataset default chunk behaviour (pydata#10567)

add whats-new

bb0f183

dcherian added the plan to merge Final call for comments label Oct 21, 2025

Merge branch 'main' into min-concat-mem

66379ad

kmuehlbauer enabled auto-merge (squash) October 22, 2025 06:03

kmuehlbauer merged commit 19f2973 into pydata:main Oct 22, 2025
35 of 36 checks passed

dcherian deleted the min-concat-mem branch October 23, 2025 20:46

dcherian mentioned this pull request Nov 14, 2025

Comprehensive benchmarking suite #4648

Open

19 tasks
