
@shoyer (Member) commented Aug 12, 2025

Split out of #10624

This PR adds support for `compute=False` in `DataTree.to_netcdf` and `DataTree.to_zarr`. To do so, I refactored the internals of these methods to use Xarray's lower-level data store interface directly, rather than calling Dataset methods.
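For readers unfamiliar with `compute=False`: with `Dataset.to_netcdf`, it returns a delayed object instead of writing immediately, and the data is written when that object is computed. The sketch below illustrates that idea for a whole tree in plain Python. Everything in it (`ToyArrayWriter`, `DeferredWrite`, `write_tree`, nested dicts standing in for a DataTree, callables standing in for lazy arrays) is hypothetical and only loosely modeled on the `writer = ArrayWriter()` line quoted in the review below; the real xarray writer hands its queued arrays to dask through a chunk manager rather than running plain callables.

```python
import functools
from typing import Callable, Dict, List, Optional, Tuple

LazySource = Callable[[], list]  # hypothetical: calling it "loads" the data


def _store(source: LazySource, target: dict, key: str) -> None:
    # Materialize the lazy source and write it into the target mapping.
    target[key] = source()


class DeferredWrite:
    """Hypothetical stand-in for the delayed object returned when compute=False."""

    def __init__(self, tasks: List[Callable[[], None]]):
        self._tasks = tasks

    def compute(self) -> None:
        # Run every queued write once, then forget them.
        for task in self._tasks:
            task()
        self._tasks = []


class ToyArrayWriter:
    """Hypothetical writer that queues (source, target, key) triples."""

    def __init__(self) -> None:
        self.pending: List[Tuple[LazySource, dict, str]] = []

    def add(self, source: LazySource, target: dict, key: str) -> None:
        self.pending.append((source, target, key))

    def sync(self, compute: bool = True) -> Optional[DeferredWrite]:
        tasks = [functools.partial(_store, s, t, k) for s, t, k in self.pending]
        self.pending = []
        if compute:
            for task in tasks:
                task()
            return None
        # compute=False: hand back the queued writes instead of running them.
        return DeferredWrite(tasks)


def write_tree(
    tree: Dict[str, Dict[str, LazySource]], store: dict, compute: bool = True
) -> Optional[DeferredWrite]:
    """Queue every variable of every node on ONE writer, then sync once."""
    writer = ToyArrayWriter()
    for path, variables in tree.items():
        prefix = path.rstrip("/")
        for name, source in variables.items():
            writer.add(source, store, f"{prefix}/{name}")
    return writer.sync(compute=compute)
```

Calling `write_tree(tree, store, compute=False)` leaves `store` empty until `.compute()` is called on the returned object. The design point being illustrated is that all nodes share a single writer, so one compute call flushes the whole tree rather than one delayed object per node.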

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@shoyer requested a review from TomNicholas on August 12, 2025 01:19
@github-actions bot added the topic-backends, topic-zarr, topic-DataTree, and io labels Aug 12, 2025
```python
writer = ArrayWriter()

# TODO: figure out how to properly handle unlimited_dims
try:
```
Contributor:

It'd be nice to refactor this to a common function used in both the netCDF and the Zarr writer. Do you see a way to do that? At first glance the "validate region / encoding" bit seems to make this hard.

If there is no easy way to do that, can you please add a comment to both functions to remind future contributors to keep the logic in sync?
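One shape the suggested refactor could take, assuming the backend-specific "validate region / encoding" step can be isolated, is a shared traversal that takes the validation step as a hook. This is a hypothetical sketch: `write_tree_common`, `netcdf_validate`, `zarr_validate`, and the dict-based tree are illustrative names, and the Zarr check below is a placeholder rather than xarray's actual region/encoding logic (which, as the reply below notes, is what makes this tricky in practice).

```python
from typing import Any, Callable, Mapping

Node = Mapping[str, Any]  # hypothetical: variable name -> data
Tree = Mapping[str, Node]  # hypothetical: node path -> variables
Encoding = Mapping[str, Mapping[str, Mapping[str, Any]]]  # path -> var -> options


def write_tree_common(
    tree: Tree,
    encoding: Encoding,
    validate: Callable[[str, Node, Mapping[str, Any]], None],
    write_variable: Callable[[str, str, Any, Mapping[str, Any]], None],
) -> None:
    """Hypothetical shared traversal; only `validate` differs per backend."""
    for path, variables in tree.items():
        node_encoding = encoding.get(path, {})
        # Backend-specific hook, run before anything in this node is written.
        validate(path, variables, node_encoding)
        for name, data in variables.items():
            write_variable(path, name, data, node_encoding.get(name, {}))


def netcdf_validate(path: str, variables: Node, node_encoding: Mapping[str, Any]) -> None:
    # Hypothetical: the netCDF path has no extra region checks.
    return None


def zarr_validate(path: str, variables: Node, node_encoding: Mapping[str, Any]) -> None:
    # Hypothetical placeholder for region/encoding validation:
    # reject encodings that name variables the node does not have.
    unknown = sorted(set(node_encoding) - set(variables))
    if unknown:
        raise ValueError(f"{path}: encoding given for unknown variables {unknown}")
```

With this structure, keeping the two writers "in sync" reduces to keeping one loop correct, at the cost of forcing every backend difference through the hook's signature.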

Member Author:

Yeah, I think this would be tricky.

I'm not sure a comment makes sense here -- there's no intrinsic reason why the implementations need to match, although hopefully this suggestion would be somewhat obvious? There are also unit tests, of course.

Member:

I think this separate datatree_io.py file has come to the end of its usefulness. In a follow-up I can just merge it into the respective backends.

Member Author:

Definitely agreed! In the long term, we might even implement Dataset IO in terms of DataTree IO. This would let us avoid redundant code paths, similar to how we currently implement many DataArray operations in terms of Dataset.
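The delegation idea can be sketched in a few lines: keep a single tree writer as the only real code path, and express dataset writing as "wrap the dataset in a one-node tree, then write the tree." The names below (`write_tree`, `write_dataset`, dicts standing in for datasets and trees) are hypothetical; in real xarray, a Dataset can be lifted into a single-node tree with `xr.DataTree(dataset=ds)`, but whether Dataset IO is ever rewritten this way is only the long-term possibility raised in the comment.

```python
from typing import Any, Dict, Mapping

Dataset = Mapping[str, Any]  # hypothetical: variable name -> data
Tree = Mapping[str, Dataset]  # hypothetical: node path -> dataset


def write_tree(tree: Tree, store: Dict[str, Any]) -> None:
    """The one real write path: every node, every variable."""
    for path, ds in tree.items():
        prefix = path.rstrip("/")
        for name, data in ds.items():
            store[f"{prefix}/{name}"] = data


def write_dataset(ds: Dataset, store: Dict[str, Any]) -> None:
    """Dataset IO as a thin wrapper: lift the dataset into a one-node tree."""
    write_tree({"/": ds}, store)
```

Any fix made in `write_tree` (such as `compute=False` support) is then inherited by `write_dataset` for free, which is the redundancy argument made above.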

@shoyer (Member Author) commented Aug 16, 2025

This is ready for a final review now that tests are passing.

@shoyer (Member Author) commented Aug 18, 2025

Just a heads-up: I am going to submit this shortly so I can start iterating on follow-ups.

@shoyer merged commit 89c913a into pydata:main on Aug 18, 2025
37 checks passed
@shoyer deleted the to_netcdf-internals branch on August 18, 2025 21:17
dcherian added a commit to dhruvak001/xarray that referenced this pull request Aug 24, 2025
* main: (46 commits)
  use the new syntax of ignoring bots (pydata#10668)
  modification methods on `Coordinates` (pydata#10318)
  Silence warnings from test_tutorial.py (pydata#10661)
  test: update write_empty test for zarr 3.1.2 (pydata#10665)
  Bump actions/checkout from 4 to 5 in the actions group (pydata#10652)
  Add load_datatree function (pydata#10649)
  Support compute=False from DataTree.to_netcdf (pydata#10625)
  Fix typos (pydata#10655)
  In case of misconfiguration of dataset.encoding `unlimited_dims` warn instead of raise (pydata#10648)
  fix ``auto_complex`` for ``open_datatree`` (pydata#10632)
  Fix bug indexing with boolean scalars (pydata#10635)
  Improve DataTree typing (pydata#10644)
  Update Cartopy and Iris references (pydata#10645)
  Empty release notes (pydata#10642)
  release notes for v2025.08.0 (pydata#10641)
  Fix `ds.merge` to prevent altering original object depending on join value (pydata#10596)
  Add asynchronous load method (pydata#10327)
  Add DataTree.prune() method … (pydata#10598)
  Avoid refining parent dimensions in NetCDF files (pydata#10623)
  clarify lazy behaviour and eager loading chunks=None in open_*-functions (pydata#10627)
  ...
shoyer added a commit to shoyer/xarray that referenced this pull request Sep 2, 2025
Writing to zarr with `mode='r+'` was broken by pydata#10625, due to a Zarr
bug (zarr-developers/zarr-python#3428). This
adds a workaround that works on older versions of Zarr.

The PR that introduced this bug has not yet appeared in an xarray
release, so there's no need for a release note.

3 participants