Description
What happened?
Calling ds.drop_encoding() on a dataset raises a MemoryError with the following stack trace:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
Cell In[18], line 5
2 for v in ds:
3 ds[v].encoding = {}
----> 5 ds = ds.drop_encoding()
7 # NOTE: We want to initialize the dataset with a reasonable chunking scheme.
8 #
9 # I have chosen to keep the full spatial and forecast duration in a
(...) 20 # plots? Analyze forecasts across runs?
21 # - How much S will downstream users be interested in?
22 ds.chunk(chunks={'L': 44, 'Y': 181, 'X': 360, 'M': 1, 'S': 1})
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/dataset.py:464, in Dataset.drop_encoding(self)
461 def drop_encoding(self) -> Self:
462 """Return a new Dataset without encoding on the dataset or any of its
463 variables/coords."""
--> 464 variables = {k: v.drop_encoding() for k, v in self.variables.items()}
465 return self._replace(variables=variables, encoding={})
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/variable.py:929, in Variable.drop_encoding(self)
927 def drop_encoding(self) -> Self:
928 """Return a new Variable without encoding."""
--> 929 return self._replace(encoding={})
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/variable.py:975, in Variable._replace(self, dims, data, attrs, encoding)
973 dims = copy.copy(self._dims)
974 if data is _default:
--> 975 data = copy.copy(self.data)
976 if attrs is _default:
977 attrs = copy.copy(self._attrs)
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/variable.py:456, in Variable.data(self)
454 duck_array = self._data.array
455 elif isinstance(self._data, indexing.ExplicitlyIndexed):
--> 456 duck_array = self._data.get_duck_array()
457 elif is_duck_array(self._data):
458 duck_array = self._data
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:943, in MemoryCachedArray.get_duck_array(self)
942 def get_duck_array(self):
--> 943 duck_array = self.array.get_duck_array()
944 # ensure the array object is cached in-memory
945 self.array = as_indexable(duck_array)
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:897, in CopyOnWriteArray.get_duck_array(self)
896 def get_duck_array(self):
--> 897 return self.array.get_duck_array()
File /opt/coiled/env/lib/python3.12/site-packages/xarray/coding/common.py:80, in _ElementwiseFunctionArray.get_duck_array(self)
79 def get_duck_array(self):
---> 80 return self.func(self.array.get_duck_array())
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:737, in LazilyIndexedArray.get_duck_array(self)
734 from xarray.backends.common import BackendArray
736 if isinstance(self.array, BackendArray):
--> 737 array = self.array[self.key]
738 else:
739 array = apply_indexer(self.array, self.key)
File /opt/coiled/env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:108, in NetCDF4ArrayWrapper.__getitem__(self, key)
107 def __getitem__(self, key):
--> 108 return indexing.explicit_indexing_adapter(
109 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
110 )
File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:1129, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
1107 """Support explicit indexing by delegating to a raw indexing method.
1108
1109 Outer and/or vectorized indexers are supported by indexing a second time
(...) 1126 Indexing result, in the form of a duck numpy-array.
1127 """
1128 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1129 result = raw_indexing_method(raw_key.tuple)
1130 if numpy_indices.tuple:
1131 # index the loaded duck array
1132 indexable = as_indexable(result)
File /opt/coiled/env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:121, in NetCDF4ArrayWrapper._getitem(self, key)
119 with self.datastore.lock:
120 original_array = self.get_array(needs_lock=False)
--> 121 array = getitem(original_array, key)
122 except IndexError as err:
123 # Catch IndexError in netCDF4 and return a more informative
124 # error message. This is most often called when an unsorted
125 # indexer is used before the data is loaded from disk.
126 msg = (
127 "The indexing operation you are attempting to perform "
128 "is not valid on netCDF4.Variable object. Try loading "
129 "your data into memory first by calling .load()."
130 )
File /opt/coiled/env/lib/python3.12/site-packages/xarray/backends/common.py:292, in robust_getitem(array, key, catch, max_retries, initial_delay)
290 for n in range(max_retries + 1):
291 try:
--> 292 return array[key]
293 except catch:
294 if n == max_retries:
File src/netCDF4/_netCDF4.pyx:4953, in netCDF4._netCDF4.Variable.__getitem__()
MemoryError: Unable to allocate 505. GiB for an array with shape (11829, 4, 44, 181, 360) and data type float32
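As a sanity check on the figure in the error message, the allocation size can be recomputed from the reported shape and dtype. This is only an arithmetic sketch; the 4-byte item size is the standard width of float32, and the variable names are illustrative.

```python
import math

# Shape and dtype taken from the MemoryError message above.
shape = (11829, 4, 44, 181, 360)
itemsize = 4  # bytes per float32 element

n_elements = math.prod(shape)
n_bytes = n_elements * itemsize
size_gib = n_bytes / 2**30  # GiB = 2**30 bytes

print(f"{n_elements} elements, {size_gib:.1f} GiB")
```

The result is consistent with the roughly 505 GiB reported, which suggests the traceback is an attempt to read one full variable into memory rather than a partial or chunked read.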
What did you expect to happen?
I expected this to be a metadata-only operation that copies no data.
I thought it would be functionally identical to this code:
for v in ds:
ds[v].encoding = {}
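A possible workaround along the same lines, written as a small helper. This is a sketch, not xarray API: `clear_encoding_inplace` is a hypothetical name, and it assumes the object exposes a `.variables` mapping whose values carry a writable `.encoding` attribute (as `xarray.Dataset` does). Unlike the loop above, it also covers coordinate variables, and unlike `drop_encoding()` it mutates the dataset in place instead of returning a new one.

```python
def clear_encoding_inplace(ds):
    """Reset encoding on a dataset-like object and on each of its variables.

    Hypothetical helper: only the ``encoding`` attributes are reassigned.
    ``.data`` is never accessed here, so nothing in this function asks a
    lazily indexed backend array to load its values.
    """
    for var in ds.variables.values():
        var.encoding = {}
    ds.encoding = {}
    return ds
```

With the dataset from the example in this report, usage would be `ds = clear_encoding_inplace(ds)`.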
Minimal Complete Verifiable Example
import xarray as xr
xr.show_versions()
url = "https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubC/.NCEP/.CFSv2/.forecast/dods"
ds = xr.open_dataset(url, engine='netcdf4', decode_timedelta=True)
ds = ds.drop_encoding()
Steps to reproduce
Call ds.drop_encoding() on a sufficiently large dataset.
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.14.0-1016-aws
machine: x86_64
processor:
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2025.9.0
pandas: 2.3.2
numpy: 2.2.6
scipy: 1.16.1
netCDF4: 1.6.5
pydap: 3.5.6
h5netcdf: 1.6.4
h5py: 3.13.0
zarr: 3.1.2
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: None
bottleneck: None
dask: 2025.7.0
distributed: 2025.7.0
matplotlib: 3.10.6
cartopy: None
seaborn: 0.13.2
numbagg: 0.9.2
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: 0.10.6
numpy_groupies: 0.11.3
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None