Status: Closed
Labels: topic-DataTree (Related to the implementation of a DataTree class), topic-backends, topic-performance
Description
What is your issue?
The implementation of `open_datatree` works, but it is inefficient because it calls `open_dataset` once for every group in the file. We should refactor it to improve performance, which would fix issues like xarray-contrib/datatree#330.
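As a rough illustration of why this scales badly, the toy model below counts file opens when every group is opened independently, which is what happens when `open_dataset` is called once per group. It is plain Python, not xarray code: `FakeGroupedFile`, `open_group` and `naive_open_tree` are hypothetical names. It only models the counting argument and says nothing about how xarray's own file caching behaves in practice.

```python
from dataclasses import dataclass


@dataclass
class FakeGroupedFile:
    """Hypothetical stand-in for a netCDF file: group path -> variables.

    `open_count` records how many times the "file" was opened.
    """
    groups: dict[str, dict[str, list[float]]]
    open_count: int = 0

    def open(self) -> dict[str, dict[str, list[float]]]:
        self.open_count += 1
        return self.groups


def open_group(file: FakeGroupedFile, group: str) -> dict[str, list[float]]:
    """Hypothetical analogue of open_dataset(path, group=...): opens the file anew."""
    handle = file.open()
    return handle[group]


def naive_open_tree(file: FakeGroupedFile) -> dict[str, dict[str, list[float]]]:
    # Open once to discover the group names, then once more for every group.
    group_names = list(file.open())
    return {name: open_group(file, name) for name in group_names}


f = FakeGroupedFile({f"/g{i}": {} for i in range(100)})
tree = naive_open_tree(f)
print(len(tree), f.open_count)  # 100 101
```

The number of opens grows linearly with the number of groups, so a file with no data but many groups is exactly the case where the overhead dominates.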
We discussed this in the datatree meeting, and my understanding is that concretely we need to:
- Create an asv benchmark for `open_datatree`, probably by first writing a special netCDF file that has no data but lots of groups and then benchmarking how long it takes to open (tracked in "Add benchmark test for open_datatree" #9100).
- Refactor the `NetCDFDatastore` class to create only one `CachingFileManager` object per file, not one per group; see `xarray/backends/netCDF4_.py`, line 406 at commit 748bb3a (`manager = CachingFileManager(`).
- Refactor `NetCDF4BackendEntrypoint.open_datatree` to use an implementation that goes through `NetCDFDatastore` without calling the top-level `xr.open_dataset` again.
- Check that the performance of calling `xr.open_datatree` on a netCDF file has actually improved.
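The first two refactoring steps boil down to opening the file once and handing the same handle to every per-group store. Below is a minimal stdlib sketch of that idea, assuming a hypothetical `SharedFileManager` in the role a single `CachingFileManager` would play and a hypothetical `GroupStore` in place of the real datastore class. None of these names are xarray's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeGroupedFile:
    """Hypothetical stand-in for a netCDF file: group path -> variables."""
    groups: dict[str, dict[str, list[float]]]
    open_count: int = 0

    def open(self) -> dict[str, dict[str, list[float]]]:
        self.open_count += 1
        return self.groups


class SharedFileManager:
    """Hypothetical manager that opens the underlying file at most once."""

    def __init__(self, file: FakeGroupedFile) -> None:
        self._file = file
        self._handle: Optional[dict[str, dict[str, list[float]]]] = None

    def acquire(self) -> dict[str, dict[str, list[float]]]:
        # Open lazily on first use, then return the same handle every time.
        if self._handle is None:
            self._handle = self._file.open()
        return self._handle


class GroupStore:
    """Hypothetical per-group store that borrows the shared manager."""

    def __init__(self, manager: SharedFileManager, group: str) -> None:
        self._manager = manager
        self.group = group

    def load(self) -> dict[str, list[float]]:
        return self._manager.acquire()[self.group]


def shared_open_tree(file: FakeGroupedFile) -> dict[str, dict[str, list[float]]]:
    manager = SharedFileManager(file)  # one manager for the whole file
    stores = [GroupStore(manager, name) for name in manager.acquire()]
    return {store.group: store.load() for store in stores}


f = FakeGroupedFile({f"/g{i}": {} for i in range(100)})
tree = shared_open_tree(f)
print(len(tree), f.open_count)  # 100 1
```

Every `GroupStore` reads from the single handle, so the file is opened once regardless of how many groups it contains. A real implementation would also need to deal with closing, locking and thread safety, which this sketch ignores.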
It would be great to get this done soon as part of the datatree integration project. @kmuehlbauer I know you were interested - are you willing / do you have time to take this task on?
Project status: Done