Support compute=False from DataTree.to_netcdf #10625

shoyer · 2025-08-12T01:19:41Z

Split out of #10624

This PR adds support for compute=False in DataTree.to_netcdf and to_zarr. To do so, I refactored the internals of these methods to use Xarray's lower-level data store interface directly, rather than calling Dataset methods.

Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
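
For readers unfamiliar with the pattern, the compute=False behaviour described above can be illustrated with a toy sketch. This is not xarray's implementation: `DeferredWriter` and `write_tree` are hypothetical names, and plain dicts stand in for both the tree and the on-disk store. The only idea it tries to show is that all groups share one writer, so the queued writes can either run immediately or be handed back to the caller to run later.

```python
from __future__ import annotations

from typing import Callable, Optional


class DeferredWriter:
    """Hypothetical stand-in for a writer that queues writes instead of running them."""

    def __init__(self) -> None:
        self._pending: list[tuple[dict, str, list]] = []

    def add(self, target: dict, key: str, values: list) -> None:
        # Only record the write; `target` is not modified yet.
        self._pending.append((target, key, values))

    def sync(self, compute: bool = True) -> Optional[Callable[[], None]]:
        def run() -> None:
            for target, key, values in self._pending:
                target[key] = list(values)
            self._pending.clear()

        if compute:
            run()
            return None
        # With compute=False the caller decides when the writes happen.
        return run


def write_tree(tree: dict, store: dict, compute: bool = True):
    """Queue every variable of every group on one shared writer, then sync once."""
    writer = DeferredWriter()
    for group_path, variables in tree.items():
        # Groups are created eagerly; only the data writes are deferred.
        group = store.setdefault(group_path, {})
        for name, values in variables.items():
            writer.add(group, name, values)
    return writer.sync(compute=compute)


tree = {"/": {"a": [1, 2]}, "/child": {"b": [3]}}
store: dict = {}
pending = write_tree(tree, store, compute=False)
groups_before = {path: dict(group) for path, group in store.items()}
pending()  # the data is written only at this point
```

In this sketch the groups exist in `store` before `pending()` is called, but they stay empty until the deferred writes run.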

xarray/tests/test_backends_datatree.py

xarray/core/datatree_io.py

dcherian · 2025-08-13T01:13:50Z

xarray/core/datatree_io.py

+    writer = ArrayWriter()
+
+    # TODO: figure out how to properly handle unlimited_dims
+    try:


It'd be nice to refactor this to a common function used in both the netCDF and the Zarr writer. Do you see a way to do that? At first glance the "validate region / encoding" bit seems to make this hard.

If there is no easy way to do that, can you please add a comment to both functions to remind future contributors to keep the logic in sync?

Yeah, I think this would be tricky.

I'm not sure a comment makes sense here -- there's no intrinsic reason why the implementations need to match, although hopefully this suggestion would be somewhat obvious? There are also unit tests, of course.

TomNicholas · 2025-08-13T23:20:35Z

xarray/core/datatree_io.py

I think this separate datatree_io.py file has come to the end of its usefulness. In a follow-up I can just merge it into the respective backends.

Definitely agreed! In the long term, we might even implement Dataset IO in terms of DataTree IO. This would let us avoid redundant code paths, similar to how we currently implement many DataArray operations in terms of Dataset.
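
A rough sketch of what that delegation could look like, using plain dicts rather than real xarray objects. Both function names are hypothetical and nothing here reflects actual xarray code; it only shows a single-dataset write expressed as a one-node tree write, so that there is one code path to maintain.

```python
def write_tree(tree: dict, store: dict) -> dict:
    """Write each group's variables into a nested-dict store (toy tree writer)."""
    for group_path, variables in tree.items():
        store.setdefault(group_path, {}).update(variables)
    return store


def write_dataset(dataset: dict, store: dict, group: str = "/") -> dict:
    """Single-dataset write implemented as a tree write with exactly one node."""
    return write_tree({group: dataset}, store)


single = write_dataset({"a": [1, 2]}, {})
nested = write_dataset({"b": [3]}, {}, group="/child")
```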

shoyer · 2025-08-16T01:10:52Z

This is ready for a final review now that tests are passing.

shoyer · 2025-08-18T17:16:18Z

Just a heads up, I am going to submit this shortly so I can start iterating on follow-ups

* main: (46 commits)
  use the new syntax of ignoring bots (pydata#10668)
  modification methods on `Coordinates` (pydata#10318)
  Silence warnings from test_tutorial.py (pydata#10661)
  test: update write_empty test for zarr 3.1.2 (pydata#10665)
  Bump actions/checkout from 4 to 5 in the actions group (pydata#10652)
  Add load_datatree function (pydata#10649)
  Support compute=False from DataTree.to_netcdf (pydata#10625)
  Fix typos (pydata#10655)
  In case of misconfiguration of dataset.encoding `unlimited_dims` warn instead of raise (pydata#10648)
  fix ``auto_complex`` for ``open_datatree`` (pydata#10632)
  Fix bug indexing with boolean scalars (pydata#10635)
  Improve DataTree typing (pydata#10644)
  Update Cartopy and Iris references (pydata#10645)
  Empty release notes (pydata#10642)
  release notes for v2025.08.0 (pydata#10641)
  Fix `ds.merge` to prevent altering original object depending on join value (pydata#10596)
  Add asynchronous load method (pydata#10327)
  Add DataTree.prune() method … (pydata#10598)
  Avoid refining parent dimensions in NetCDF files (pydata#10623)
  clarify lazy behaviour and eager loading chunks=None in open_*-functions (pydata#10627)
  ...

Writing to zarr with `mode='r+'` was broken by pydata#10625, due to a Zarr bug (zarr-developers/zarr-python#3428). This adds a workaround that works on older versions of Zarr. The PR that introduced this bug has not yet appeared in an xarray release, so there's no need for a release note.

Refactor to_netcdf() and to_zarr() internals

cce5477

shoyer requested a review from TomNicholas August 12, 2025 01:19

github-actions bot added the topic-backends, topic-zarr, topic-DataTree, and io labels Aug 12, 2025

Merge branch 'main' into to_netcdf-internals

a3689d4

kmuehlbauer mentioned this pull request Aug 12, 2025

Sanitize unlimited_dims when writing to_netcdf #10608

Merged

3 tasks

Merge branch 'main' into to_netcdf-internals

f5ba356

shoyer mentioned this pull request Aug 12, 2025

DataTree.to_zarr() is very slow writing to high latency store #9455

Open

dcherian reviewed Aug 13, 2025

xarray/tests/test_backends_datatree.py (resolved)

dcherian reviewed Aug 13, 2025

xarray/core/datatree_io.py (outdated, resolved)

dcherian reviewed Aug 13, 2025


shoyer mentioned this pull request Aug 13, 2025

DataTree.to_zarr() performs redundant computations with cross-group dependencies #10637

Closed

5 tasks

shoyer added 3 commits August 13, 2025 13:59

Merge branch 'main' into to_netcdf-internals

c4d57f1

Fixes per review

22d3387

Clean up comments

e68e186

TomNicholas reviewed Aug 13, 2025


shoyer added 5 commits August 14, 2025 21:30

Merge branch 'main' into to_netcdf-internals

b925465

Fix type for to_netcdf()

6d8ae1e

Add test and whats-new for cross-group redundant computation

d9da973

Fix test failure on CI (and add a better test)

205fdbe

grammar

e82c334

TomNicholas approved these changes Aug 18, 2025


shoyer merged commit 89c913a into pydata:main Aug 18, 2025
37 checks passed

shoyer deleted the to_netcdf-internals branch August 18, 2025 21:17

shoyer mentioned this pull request Sep 2, 2025

Fix unreleased bug in DataTree.to_zarr() with mode='r+' #10685

Merged

1 task
