Dropping encoding attempts to load the full dataset into memory #10930

@monodeldiablo

Description

What happened?

Calling ds.drop_encoding() on a large, lazily opened dataset raises a MemoryError with the following traceback:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
Cell In[18], line 5
      2 for v in ds:
      3     ds[v].encoding = {}
----> 5 ds = ds.drop_encoding()
      7 # NOTE: We want to initialize the dataset with a reasonable chunking scheme.
      8 #
      9 #       I have chosen to keep the full spatial and forecast duration in a
   (...)     20 #         plots? Analyze forecasts across runs?
     21 #       - How much S will downstream users be interested in?
     22 ds.chunk(chunks={'L': 44, 'Y': 181, 'X': 360, 'M': 1, 'S': 1})

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/dataset.py:464, in Dataset.drop_encoding(self)
    461 def drop_encoding(self) -> Self:
    462     """Return a new Dataset without encoding on the dataset or any of its
    463     variables/coords."""
--> 464     variables = {k: v.drop_encoding() for k, v in self.variables.items()}
    465     return self._replace(variables=variables, encoding={})

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/variable.py:929, in Variable.drop_encoding(self)
    927 def drop_encoding(self) -> Self:
    928     """Return a new Variable without encoding."""
--> 929     return self._replace(encoding={})

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/variable.py:975, in Variable._replace(self, dims, data, attrs, encoding)
    973     dims = copy.copy(self._dims)
    974 if data is _default:
--> 975     data = copy.copy(self.data)
    976 if attrs is _default:
    977     attrs = copy.copy(self._attrs)

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/variable.py:456, in Variable.data(self)
    454     duck_array = self._data.array
    455 elif isinstance(self._data, indexing.ExplicitlyIndexed):
--> 456     duck_array = self._data.get_duck_array()
    457 elif is_duck_array(self._data):
    458     duck_array = self._data

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:943, in MemoryCachedArray.get_duck_array(self)
    942 def get_duck_array(self):
--> 943     duck_array = self.array.get_duck_array()
    944     # ensure the array object is cached in-memory
    945     self.array = as_indexable(duck_array)

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:897, in CopyOnWriteArray.get_duck_array(self)
    896 def get_duck_array(self):
--> 897     return self.array.get_duck_array()

File /opt/coiled/env/lib/python3.12/site-packages/xarray/coding/common.py:80, in _ElementwiseFunctionArray.get_duck_array(self)
     79 def get_duck_array(self):
---> 80     return self.func(self.array.get_duck_array())

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:737, in LazilyIndexedArray.get_duck_array(self)
    734 from xarray.backends.common import BackendArray
    736 if isinstance(self.array, BackendArray):
--> 737     array = self.array[self.key]
    738 else:
    739     array = apply_indexer(self.array, self.key)

File /opt/coiled/env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:108, in NetCDF4ArrayWrapper.__getitem__(self, key)
    107 def __getitem__(self, key):
--> 108     return indexing.explicit_indexing_adapter(
    109         key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
    110     )

File /opt/coiled/env/lib/python3.12/site-packages/xarray/core/indexing.py:1129, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
   1107 """Support explicit indexing by delegating to a raw indexing method.
   1108 
   1109 Outer and/or vectorized indexers are supported by indexing a second time
   (...)   1126 Indexing result, in the form of a duck numpy-array.
   1127 """
   1128 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1129 result = raw_indexing_method(raw_key.tuple)
   1130 if numpy_indices.tuple:
   1131     # index the loaded duck array
   1132     indexable = as_indexable(result)

File /opt/coiled/env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:121, in NetCDF4ArrayWrapper._getitem(self, key)
    119     with self.datastore.lock:
    120         original_array = self.get_array(needs_lock=False)
--> 121         array = getitem(original_array, key)
    122 except IndexError as err:
    123     # Catch IndexError in netCDF4 and return a more informative
    124     # error message.  This is most often called when an unsorted
    125     # indexer is used before the data is loaded from disk.
    126     msg = (
    127         "The indexing operation you are attempting to perform "
    128         "is not valid on netCDF4.Variable object. Try loading "
    129         "your data into memory first by calling .load()."
    130     )

File /opt/coiled/env/lib/python3.12/site-packages/xarray/backends/common.py:292, in robust_getitem(array, key, catch, max_retries, initial_delay)
    290 for n in range(max_retries + 1):
    291     try:
--> 292         return array[key]
    293     except catch:
    294         if n == max_retries:

File src/netCDF4/_netCDF4.pyx:4953, in netCDF4._netCDF4.Variable.__getitem__()

MemoryError: Unable to allocate 505. GiB for an array with shape (11829, 4, 44, 181, 360) and data type float32
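For scale, the failed allocation is exactly one full data variable: the product of the dimension sizes in the error message times 4 bytes per float32 value. A quick check in plain Python, with the shape and dtype copied from the MemoryError above:

```python
import math

# Shape and dtype reported in the MemoryError above.
shape = (11829, 4, 44, 181, 360)
itemsize = 4  # bytes per float32 element

n_elements = math.prod(shape)
n_bytes = n_elements * itemsize
gib = n_bytes / 2**30  # numpy reports sizes in GiB (powers of 1024)

print(f"{n_elements:,} elements -> {n_bytes:,} bytes = {gib:.1f} GiB")
# 135,656,864,640 elements -> 542,627,458,560 bytes = 505.4 GiB
```

So the call was not failing on metadata handling; it was trying to pull roughly half a terabyte for a single variable into local memory.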

What did you expect to happen?

I expected this to be a metadata-only operation that does not read or copy any variable data.

I thought it would be functionally identical to this code:

for v in ds:  # iterates over data variables only; coordinates keep their encoding
    ds[v].encoding = {}
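In the installed version the two are not equivalent. Per the traceback, Variable.drop_encoding() calls self._replace(encoding={}), and _replace falls back to copy.copy(self.data) when no new data is passed; for a lazily indexed netCDF variable, the data property calls get_duck_array(), which reads the whole array. The loop above only assigns a new encoding dict and never touches the data. The toy model below illustrates that difference; the class and method names are hypothetical and this is not xarray's code, just a simplified sketch of the two copying strategies.

```python
import copy


class LazyBackendArray:
    """Hypothetical stand-in for a lazily indexed on-disk/remote array."""

    def __init__(self, shape):
        self.shape = shape
        self.reads = 0  # how many times the full array has been materialized

    def size(self):
        n = 1
        for dim in self.shape:
            n *= dim
        return n

    def get_duck_array(self):
        # A real backend would fetch every element here; we only count the call.
        self.reads += 1
        return [0.0] * self.size()


class ToyVariable:
    """Hypothetical, heavily simplified analogue of a variable with lazy data."""

    def __init__(self, data, encoding=None):
        self._data = data
        self.encoding = dict(encoding or {})

    @property
    def data(self):
        # Like the path in the traceback: lazy wrappers get materialized.
        if hasattr(self._data, "get_duck_array"):
            return self._data.get_duck_array()
        return self._data

    def replace_copying_data(self, encoding):
        # Analogous to copy.copy(self.data): forces a full read first.
        return ToyVariable(copy.copy(self.data), encoding)

    def replace_copying_wrapper(self, encoding):
        # Copies only the lazy wrapper object; nothing is read.
        return ToyVariable(copy.copy(self._data), encoding)


lazy = LazyBackendArray(shape=(4, 3))
var = ToyVariable(lazy, encoding={"dtype": "float32"})

var.replace_copying_wrapper(encoding={})
print(lazy.reads)  # 0: metadata-only, as this issue expected

var.replace_copying_data(encoding={})
print(lazy.reads)  # 1: the whole array was materialized
```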

Minimal Complete Verifiable Example

import xarray as xr
xr.show_versions()

url = "https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubC/.NCEP/.CFSv2/.forecast/dods"
ds = xr.open_dataset(url, engine='netcdf4', decode_timedelta=True)
ds = ds.drop_encoding()  # raises MemoryError while reading a full variable into memory
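Until drop_encoding() avoids the read, one possible workaround in the spirit of the loop above is to clear encoding on a shallow copy. This is a sketch, not an xarray API: drop_encoding_lazily is a hypothetical helper name, and it assumes that Dataset.copy(deep=False) reuses the existing lazy arrays without loading them and that assigning to a variable's encoding only replaces its metadata dict (the loop in the notebook cell ran without error before drop_encoding() was reached, which is consistent with the second assumption). Unlike `for v in ds`, iterating over ds.variables also covers coordinates.

```python
import numpy as np
import xarray as xr


def drop_encoding_lazily(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: return a copy of ``ds`` with every encoding dict
    cleared, without accessing ``Variable.data``."""
    # Shallow copy: new Variable objects wrapping the same (lazy) arrays, so
    # clearing encodings below should not affect the original dataset.
    out = ds.copy(deep=False)
    # ``ds.variables`` includes coordinates; ``for v in ds`` only visits data vars.
    for var in out.variables.values():
        var.encoding = {}
    out.encoding = {}
    return out


# Small in-memory example (no remote data needed).
ds = xr.Dataset({"t": ("x", np.arange(3.0))}, coords={"x": [10, 20, 30]})
ds["t"].encoding = {"dtype": "int16", "scale_factor": 0.1}
clean = drop_encoding_lazily(ds)
print(clean["t"].encoding)  # {}
print(ds["t"].encoding)     # {'dtype': 'int16', 'scale_factor': 0.1}
```

Mutating encodings on a copy rather than on ds itself keeps the original's encoding available, which matters if ds is later written back out with to_netcdf or to_zarr.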

Steps to reproduce

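Open a sufficiently large dataset lazily (here over OPeNDAP with the netcdf4 engine and no dask chunks) and call ds.drop_encoding(). Instead of only dropping metadata, the call reads each variable in full.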

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

No response

Environment


INSTALLED VERSIONS

commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.14.0-1016-aws
machine: x86_64
processor:
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2025.9.0
pandas: 2.3.2
numpy: 2.2.6
scipy: 1.16.1
netCDF4: 1.6.5
pydap: 3.5.6
h5netcdf: 1.6.4
h5py: 3.13.0
zarr: 3.1.2
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: None
bottleneck: None
dask: 2025.7.0
distributed: 2025.7.0
matplotlib: 3.10.6
cartopy: None
seaborn: 0.13.2
numbagg: 0.9.2
fsspec: 2025.9.0
cupy: None
pint: None
sparse: None
flox: 0.10.6
numpy_groupies: 0.11.3
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: None
mypy: None
IPython: 9.5.0
sphinx: None

Metadata

Assignees: No one assigned
Labels: bug, needs triage (issue that has not been reviewed by an xarray team member)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
