Xarray Indexes Gallery
Background
Xarray’s data model was initially heavily inspired by the NetCDF file format, making it well suited for working with n-dimensional, rectilinear gridded datasets commonly found in scientific data analysis, especially in the geosciences. In fact, Xarray has used many versions of the schematic below to convey a “canonical” data structure that is ubiquitous in the geosciences (3D datasets with coordinates that are either 2D or 1D).
An illustration of the traditional Xarray data model.
Over the years, Xarray has evolved and has been adopted in an increasing number of domains as a convenient, general-purpose Python library for handling n-dimensional labeled arrays. Xarray’s data structures are now used to represent a wide range of datasets including sparse data, curvilinear or irregular grids, staggered grids, discrete global grids, image stacks and vector data cubes. Consequently, we’ll expand our minds here to consider data structures that are much more versatile 🤯.
A better illustration of the variety of Xarray datasets in the wild.
What is an Xarray index?
To analyze these increasingly complex data structures in Xarray, we need a flexible indexing system.
What is an index?
This is a common concept in database systems and data-frame libraries. Generally speaking:
An index is a data structure that permits fast data lookup and retrieval.
For example, a pandas.Index object can be used to efficiently select
rows of a pandas.DataFrame by one or more labels. Two different
DataFrame objects may also be combined by aligning them on their indexes.
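As a minimal sketch of both uses, the snippet below selects a row by label and then joins two frames on their shared index. The column names, city labels and values are made up for illustration.

```python
import pandas as pd

# Illustrative data: one row per city, labeled by a pandas.Index.
df = pd.DataFrame(
    {"temp": [11.5, 14.2, 9.8]},
    index=pd.Index(["oslo", "paris", "reykjavik"], name="city"),
)

# Label-based lookup goes through the index.
row = df.loc["paris"]

# A second frame with the same kind of labels, in a different order
# and with one city missing.
other = pd.DataFrame(
    {"rain": [3.1, 0.4]},
    index=pd.Index(["paris", "oslo"], name="city"),
)

# join() lines up rows by index label, not by position;
# labels missing from `other` get NaN.
combined = df.join(other)
```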
What about Xarray?
Until recently, Xarray relied exclusively on pandas.Index to allow
fast label-based selection and alignment of n-dimensional data via the concept
of “dimension” coordinates. This approach
worked very well for rectilinear gridded datasets but quickly reached its limits
when considering other kinds of datasets.
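The default behavior can be sketched as follows, using a small made-up array: 1D coordinates named after their dimension get a pandas.Index, which then drives label-based selection and alignment.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative 2D array with one "dimension" coordinate per dimension.
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("y", "x"),
    coords={"x": [10, 20, 30], "y": [0.5, 1.5]},
)

# Each dimension coordinate is backed by a pandas.Index.
x_is_pandas = isinstance(da.indexes["x"], pd.Index)

# Label-based selection uses those indexes.
picked = da.sel(x=20, y=1.5)

# Arithmetic aligns operands on their indexes first
# (inner join by default), so only x=20 survives here.
a = da.isel(x=slice(0, 2))  # x = [10, 20]
b = da.isel(x=slice(1, 3))  # x = [20, 30]
total = a + b
```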
While Xarray still follows this approach by default, it has
also become much more flexible: an xarray.Dataset or
xarray.DataArray may now have one or more custom
xarray.Index objects, each associated with its own coordinates of
arbitrary dimension(s). Goodbye “dimension” coordinate vs. “non-dimension” coordinate and welcome “index” coordinate
vs. “non-index” coordinate!
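The distinction can be illustrated with a short sketch. Here a coordinate named "station" (an illustrative name) lives on dimension "x", so it is not a dimension coordinate and has no index at first. Calling set_xindex turns it into an index coordinate, using a pandas-backed index by default.

```python
import xarray as xr

# "station" is a coordinate along "x" but is not named after the
# dimension, so it starts out as a non-index coordinate.
ds = xr.Dataset(
    {"v": ("x", [1.0, 2.0, 3.0])},
    coords={"station": ("x", ["a", "b", "c"])},
)
n_indexes_before = len(ds.xindexes)

# Build an index for it; without an explicit index class,
# Xarray uses its default pandas-backed index.
indexed = ds.set_xindex("station")

# Label-based selection now works on "station".
value = float(indexed.sel(station="b")["v"])
```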
What is an Xarray index?
xarray.Index serves a broader purpose than a database index. It
provides an API that allows dealing with coordinate data and metadata in a
highly customizable way for the most common Xarray operations such as isel,
sel, align, concat, stack… Xarray indexes usually hold, track and
propagate additional information wrapped in arbitrary Python objects, along with
coordinate labels and attributes. In many cases the propagation of information
via custom indexes is much more efficient and/or reliable than via coordinate
labels and attributes. xarray.Index thus represents a powerful
extension mechanism that nicely complements accessors and IO backends.
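To give a feel for the extension mechanism, here is a minimal sketch of a custom index. LowercaseIndex is a hypothetical class written for this illustration and is not part of Xarray; it matches string labels case-insensitively by delegating lookups to a wrapped PandasIndex built from lowercased labels. It implements only from_variables and sel, so the index is dropped after selection. A real index would usually implement more of the xarray.Index API (isel, equals, and so on).

```python
import xarray as xr
from xarray.indexes import PandasIndex


class LowercaseIndex(xr.Index):
    """Hypothetical index: case-insensitive lookup of string labels."""

    def __init__(self, pd_index):
        # Wrapped PandasIndex built from lowercased labels.
        self._pd_index = pd_index

    @classmethod
    def from_variables(cls, variables, *, options):
        # This sketch supports exactly one 1D coordinate.
        ((name, var),) = variables.items()
        lowered = xr.Variable(var.dims, [str(v).lower() for v in var.values])
        return cls(PandasIndex.from_variables({name: lowered}, options={}))

    def sel(self, labels, **kwargs):
        # Normalize the query (scalar labels only in this sketch),
        # then let the wrapped index do the lookup.
        lowered = {k: str(v).lower() for k, v in labels.items()}
        return self._pd_index.sel(lowered, **kwargs)


ds = xr.Dataset(
    {"v": ("x", [1.0, 2.0, 3.0])},
    coords={"name": ("x", ["Alpha", "Beta", "Gamma"])},
)
ds = ds.set_xindex("name", LowercaseIndex)

# The query matches regardless of case.
value = float(ds.sel(name="BETA")["v"])
```

The coordinate itself keeps its original mixed-case labels; only the lookup structure held by the index is normalized, which is one example of an index carrying information beyond the coordinate values.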
What is this website?
Xarray flexible indexes unlock a lot of possibilities. We hope that this gallery of built-in and third-party Xarray indexes illustrates the potential of this feature and serves as a good reference for implementing custom indexes (or simply for finding existing ones that fulfill your needs).
Contribution
Your additions to this gallery are very welcome, particularly for fields outside the Earth Sciences! Please open a pull request on our GitHub repository.