Quick overview#

Here are some quick examples of what you can do with xarray.DataArray objects. Everything is explained in much more detail in the rest of the documentation.

To begin, import numpy, pandas and xarray using their customary abbreviations:

import numpy as np
import pandas as pd
import xarray as xr

Create a DataArray#

You can make a DataArray from scratch by supplying data in the form of a numpy array or list, with optional dimensions and coordinates:

data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
data

<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[-1.36328649,  0.87445112, -2.57000833],
       [-0.34724935,  0.38542157, -0.3047831 ]])
Coordinates:
  * x        (x) int64 16B 10 20
Dimensions without coordinates: y

In this case, we have generated a 2D array, assigned the names x and y to the two dimensions respectively, and associated the two integer coordinate labels 10 and 20 with the two locations along the x dimension. If you supply a pandas Series or DataFrame, metadata is copied directly:

xr.DataArray(pd.Series(range(3), index=list("abc"), name="foo"))

<xarray.DataArray 'foo' (dim_0: 3)> Size: 24B
array([0, 1, 2])
Coordinates:
  * dim_0    (dim_0) object 24B 'a' 'b' 'c'

Here are the key properties for a DataArray:

# like in pandas, values is a numpy array that you can modify in-place
data.values
data.dims
data.coords
# you can use this dictionary to store arbitrary metadata
data.attrs

{}
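To make these properties concrete, here is a small sketch on a separate, illustrative array (`arr` is not part of the examples above, so `data` stays untouched). It assumes a numpy-backed DataArray, where writing through values changes the array itself:

```python
import numpy as np
import xarray as xr

# Illustrative array, separate from `data` above
arr = xr.DataArray(np.zeros((2, 3)), dims=("x", "y"), coords={"x": [10, 20]})

arr.values[0, 0] = 1.0     # write through the underlying numpy array
arr.attrs["units"] = "m"   # attrs starts out as an empty dict

print(arr.dims)            # tuple of dimension names
print(list(arr.coords))    # names of the coordinates
print(float(arr[0, 0]))    # the in-place write is visible through the DataArray
```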

Indexing#

Xarray supports four kinds of indexing. Since we have assigned coordinate labels to the x dimension we can use label-based indexing along that dimension just like pandas. The four examples below all yield the same result (the value at x=10) but at varying levels of convenience and intuitiveness.

# positional and by integer label, like numpy
data[0, :]

# loc or "location": positional and coordinate label, like pandas
data.loc[10]

# isel or "integer select":  by dimension name and integer label
data.isel(x=0)

# sel or "select": by dimension name and coordinate label
data.sel(x=10)

<xarray.DataArray (y: 3)> Size: 24B
array([-1.36328649,  0.87445112, -2.57000833])
Coordinates:
    x        int64 8B 10
Dimensions without coordinates: y

Unlike positional indexing, label-based indexing frees us from having to know how our array is organized. All we need to know are the dimension name and the label we wish to index; that is, data.sel(x=10) works regardless of whether x is the first or second dimension of the array and regardless of whether 10 is the first or second element of x. We already told xarray that x is the first dimension when we created data, and xarray keeps track of this so we don’t have to. For more, see Indexing and selecting data.
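A short sketch of that independence from layout, using an illustrative array `arr` rather than the random `data` above: selecting by label gives the same answer before and after transposing, while positional indexing does not.

```python
import numpy as np
import xarray as xr

# Illustrative array with deterministic values
arr = xr.DataArray(
    np.arange(6).reshape(2, 3), dims=("x", "y"), coords={"x": [10, 20]}
)
flipped = arr.transpose("y", "x")  # same data, x is now the second dimension

by_label = arr.sel(x=10)
by_label_flipped = flipped.sel(x=10)
print(by_label.equals(by_label_flipped))  # label-based selection ignores layout

# Positional indexing depends on the layout: these select different things
print(arr[0, :].shape, flipped[0, :].shape)
```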

Attributes#

While you’re setting up your DataArray, it’s often a good idea to set metadata attributes. A useful choice is to set data.attrs['long_name'] and data.attrs['units'] since xarray will use these, if present, to automatically label your plots. These special names were chosen following the NetCDF Climate and Forecast (CF) Metadata Conventions. attrs is just a Python dictionary, so you can assign anything you wish.

data.attrs["long_name"] = "random velocity"
data.attrs["units"] = "metres/sec"
data.attrs["description"] = "A random variable created as an example."
data.attrs["random_attribute"] = 123
data.attrs
# you can add metadata to coordinates too
data.x.attrs["units"] = "x units"
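Attributes can also be supplied when the array is created, through the attrs argument of the DataArray constructor. The sketch below uses an illustrative array `arr` and the same attribute names as above:

```python
import numpy as np
import xarray as xr

# Illustrative array; metadata is passed at construction time
arr = xr.DataArray(
    np.zeros((2, 3)),
    dims=("x", "y"),
    coords={"x": [10, 20]},
    attrs={"long_name": "random velocity", "units": "metres/sec"},
)

# Coordinates carry their own attrs dictionary
arr.x.attrs["units"] = "x units"

print(arr.attrs["units"], "|", arr.x.attrs["units"])
```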

GroupBy#

Xarray supports grouped operations using a very similar API to pandas (see GroupBy: Group and Bin Data):

labels = xr.DataArray(["E", "F", "E"], [data.coords["y"]], name="labels")
labels
data.groupby(labels).mean("y")
data.groupby(labels).map(lambda x: x - x.min())

<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[1.20672184, 0.48902955, 0.        ],
       [2.22275898, 0.        , 2.26522523]])
Coordinates:
  * x        (x) int64 16B 10 20
Dimensions without coordinates: y
Attributes:
    long_name:         random velocity
    units:             metres/sec
    description:       A random variable created as an example.
    random_attribute:  123
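Because `data` holds random numbers, the grouped results above are hard to check by eye. As a sketch with deterministic values (the arrays `arr` and `groups` are illustrative), the grouped mean averages the columns labelled "E" and "F" separately within each row:

```python
import numpy as np
import xarray as xr

# Illustrative array with explicit y coordinates
arr = xr.DataArray(
    np.array([[1.0, 2.0, 5.0], [4.0, 5.0, 10.0]]),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [0, 1, 2]},
)
# Columns 0 and 2 belong to group "E", column 1 to group "F"
groups = xr.DataArray(
    ["E", "F", "E"], dims="y", coords={"y": [0, 1, 2]}, name="labels"
)

means = arr.groupby(groups).mean("y")
print(means.sel(labels="E").values)  # per-row mean of columns 0 and 2
print(means.sel(labels="F").values)  # per-row value of column 1
```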

Plotting#

Visualizing your datasets is quick and convenient:

data.plot()

<matplotlib.collections.QuadMesh at 0x76338b462ba0>

Note the automatic labeling with names and units. Our effort in adding metadata attributes has paid off! Many aspects of these figures are customizable: see Plotting.

pandas#

Xarray objects can be easily converted to and from pandas objects using the to_series(), to_dataframe() and to_xarray() methods:

series = data.to_series()
series

# convert back
series.to_xarray()

<xarray.DataArray (x: 2, y: 3)> Size: 48B
array([[-1.36328649,  0.87445112, -2.57000833],
       [-0.34724935,  0.38542157, -0.3047831 ]])
Coordinates:
  * x        (x) int64 16B 10 20
  * y        (y) int64 24B 0 1 2
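The DataFrame round trip works similarly, with one difference worth noting: to_dataframe() needs a column name, so an unnamed DataArray must be given one through the name argument, and to_xarray() on a DataFrame returns a Dataset from which the variable is then picked out. A minimal sketch, using an illustrative array `arr` and the hypothetical column name "velocity":

```python
import numpy as np
import xarray as xr

# Illustrative array with coordinates on both dimensions
arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": [0, 1, 2]},
)

df = arr.to_dataframe(name="velocity")  # one row per (x, y) pair
restored = df.to_xarray()["velocity"]   # Dataset -> DataArray

print(df.shape)
print(np.array_equal(restored.values, arr.values))
```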
