## [Official documentation](http://xarray.pydata.org/en/stable/index.html)

In [1]:
import numpy as np
import pandas as pd
import xarray as xr

# Creating a DataArray
You can create a DataArray from scratch by supplying data as a `numpy` array or a list, with optional dimensions and coordinates:

In [2]:
xr.DataArray(np.random.randn(2, 3))

<xarray.DataArray (dim_0: 2, dim_1: 3)>
array([[-1.638625,  1.579078,  1.363426],
       [-0.206467, -0.023546, -0.48682 ]])
Dimensions without coordinates: dim_0, dim_1

In [3]:
data = xr.DataArray(
    np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('x', 'y'))

data

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.499477,  1.428605,  1.395588],
       [-0.409997, -0.446366,  0.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y
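The constructor also accepts coordinates for every dimension, a `name` and an `attrs` dictionary. Below is a minimal sketch that uses fixed values instead of random numbers so the result is reproducible. The name `'temperature'`, the `units` attribute and the coordinate values are hypothetical.

```python
import numpy as np
import xarray as xr

# Fixed values (instead of np.random.randn) so the output is reproducible
values = np.arange(6.0).reshape(2, 3)

da = xr.DataArray(
    values,
    coords={"x": ["a", "b"], "y": [10, 20, 30]},  # one coordinate per dimension
    dims=("x", "y"),
    name="temperature",       # hypothetical variable name
    attrs={"units": "degC"},  # hypothetical metadata
)

print(da.dims)         # ('x', 'y')
print(da["y"].values)  # [10 20 30]
print(da.attrs)
```

Because `y` now has coordinates, labels such as `da.sel(x="b", y=20)` work along both dimensions.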

If you supply a pandas `pd.Series` or `pd.DataFrame`, its metadata (index, column labels, name) is copied directly:

In [4]:
xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))

<xarray.DataArray 'foo' (dim_0: 3)>
array([0, 1, 2], dtype=int64)
Coordinates:
  * dim_0    (dim_0) object 'a' 'b' 'c'
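A `DataFrame` works the same way. Its index and columns become the two coordinates, and if they carry names, those names become the dimension names. Unnamed axes fall back to `dim_0`, `dim_1`, as in the `Series` example above. Here is a small sketch with made-up labels (`row`, `col`, `r1`... are hypothetical):

```python
import pandas as pd
import xarray as xr

# Hypothetical DataFrame whose index and columns carry names
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    index=pd.Index(["r1", "r2", "r3"], name="row"),
    columns=pd.Index(["p", "q"], name="col"),
)

da = xr.DataArray(df)

# The index/column names become dimension names, their labels become coordinates
print(da.dims)                            # ('row', 'col')
print(da.sel(row="r2", col="q").item())  # 4.0
```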

# `DataArray` properties

In [5]:
data.values

array([[ 0.49947696,  1.42860511,  1.39558837],
       [-0.4099968 , -0.44636605,  0.15156504]])

In [6]:
data.dims

('x', 'y')

In [7]:
data.coords

Coordinates:
  * x        (x) <U1 'a' 'b'

In [9]:
data.attrs

OrderedDict()
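`attrs` starts out empty, but it is an ordinary dict-like mapping that you can fill with arbitrary metadata. Other commonly used properties are `name`, `shape` and `sizes`. In this sketch, the `description` attribute and the name `'example'` are hypothetical. A separate `arr` is used so the tutorial's `data` is left untouched.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((2, 3)), coords={"x": ["a", "b"]}, dims=("x", "y"))

# attrs behaves like a dict: entries can be added, read and removed freely
arr.attrs["description"] = "hypothetical example metadata"
print(arr.attrs["description"])

arr.name = "example"     # optional name, shown in the repr
print(arr.shape)         # (2, 3)
print(dict(arr.sizes))   # {'x': 2, 'y': 3}
```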

# Indexing

`xarray` supports four kinds of indexing. These operations are just as fast as in pandas, because xarray borrows pandas' indexing machinery.

In [10]:
# positional and by integer label, like numpy
data[[0, 1]]

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.499477,  1.428605,  1.395588],
       [-0.409997, -0.446366,  0.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [11]:
# positional and by coordinate label, like pandas
data.loc['a':'b']

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.499477,  1.428605,  1.395588],
       [-0.409997, -0.446366,  0.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [12]:
# by dimension name and integer label
data.isel(x=slice(2))

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.499477,  1.428605,  1.395588],
       [-0.409997, -0.446366,  0.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [13]:
# by dimension name and coordinate label
data.sel(x=['a', 'b'])

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.499477,  1.428605,  1.395588],
       [-0.409997, -0.446366,  0.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y
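With reproducible values, the four indexing styles can be checked against each other. This sketch (with made-up values) selects the row labelled `'b'` in four equivalent ways:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3), coords={"x": ["a", "b"]}, dims=("x", "y")
)

by_position = arr[1]       # positional, like numpy
by_loc = arr.loc["b"]      # by coordinate label, like pandas
by_isel = arr.isel(x=1)    # by dimension name and integer position
by_sel = arr.sel(x="b")    # by dimension name and coordinate label

for result in (by_loc, by_isel, by_sel):
    # identical() compares coordinates and attributes too, not just values
    assert result.identical(by_position)

print(by_sel.values)  # [3 4 5]
```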

# Computation
Data arrays work very similarly to numpy ndarrays:

In [14]:
data + 10

<xarray.DataArray (x: 2, y: 3)>
array([[ 10.499477,  11.428605,  11.395588],
       [  9.590003,   9.553634,  10.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [15]:
np.sin(data)

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.478966,  0.989908,  0.98469 ],
       [-0.398606, -0.43169 ,  0.150985]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [16]:
data.T

<xarray.DataArray (y: 3, x: 2)>
array([[ 0.499477, -0.409997],
       [ 1.428605, -0.446366],
       [ 1.395588,  0.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [17]:
data.sum()

<xarray.DataArray ()>
array(2.6188726204965733)

Aggregation operations, however, can take dimension names instead of axis numbers:

In [18]:
data.mean(dim='x')

<xarray.DataArray (y: 3)>
array([ 0.04474 ,  0.49112 ,  0.773577])
Dimensions without coordinates: y
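Reducing over a named dimension gives the same numbers as reducing over the corresponding numpy axis. The name just saves you from tracking axis positions. NumPy ufuncs such as `np.sin` also keep dims and coordinates. Here is a small check on fixed, made-up values:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]),
    coords={"x": ["a", "b"]},
    dims=("x", "y"),
)

# Reducing over dimension "x" equals reducing over axis 0 for this layout
by_name = arr.mean(dim="x")
by_axis = arr.values.mean(axis=0)
print(by_name.values)  # [3. 4. 5.]
assert np.allclose(by_name.values, by_axis)

# Ufuncs return a DataArray with the same dims and coordinates
sines = np.sin(arr)
assert sines.dims == ("x", "y")
assert list(sines["x"].values) == ["a", "b"]
```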

Arithmetic operations broadcast based on dimension names. This means you don't need to insert dummy dimensions to line arrays up:

In [19]:
a = xr.DataArray(np.random.randn(3), [data.coords['y']])

In [20]:
b = xr.DataArray(np.random.randn(4), dims='z')

In [21]:
a

<xarray.DataArray (y: 3)>
array([-0.280933,  0.257208,  0.082594])
Coordinates:
  * y        (y) int64 0 1 2

In [22]:
b

<xarray.DataArray (z: 4)>
array([ 1.636682,  1.759435, -2.114956, -0.062329])
Dimensions without coordinates: z

In [23]:
a + b

<xarray.DataArray (y: 3, z: 4)>
array([[ 1.355748,  1.478502, -2.395889, -0.343263],
       [ 1.893889,  2.016643, -1.857748,  0.194879],
       [ 1.719276,  1.842029, -2.032361,  0.020265]])
Coordinates:
  * y        (y) int64 0 1 2
Dimensions without coordinates: z
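With fixed values it is easier to see what the broadcast computes. Every element along `y` is added to every element along `z`, which is an outer sum. Here is a minimal sketch with made-up values, using new names so the tutorial's `a` and `b` are not overwritten:

```python
import numpy as np
import xarray as xr

a_y = xr.DataArray([0.0, 10.0, 20.0], dims="y")
b_z = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="z")

total = a_y + b_z  # dims are matched by name, no reshaping needed

print(total.dims)   # ('y', 'z')
print(total.shape)  # (3, 4)

# Plain numpy needs the dummy axes inserted by hand to get the same result
expected = a_y.values[:, np.newaxis] + b_z.values[np.newaxis, :]
assert np.array_equal(total.values, expected)
```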

This also means that in most cases you don't need to worry about the order of dimensions:

In [24]:
data - data.T

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

Operations also align on index labels; by default only the labels present in both operands are kept:

In [25]:
data[:-1] - data[:1]

<xarray.DataArray (x: 1, y: 3)>
array([[ 0.,  0.,  0.]])
Coordinates:
  * x        (x) <U1 'a'
Dimensions without coordinates: y
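By default, arithmetic performs an inner join on the labels. This sketch uses two hypothetical arrays whose `x` labels only partly overlap. `xr.align` with `join="outer"` shows the alternative, where every label is kept and missing positions are filled with NaN:

```python
import numpy as np
import xarray as xr

left = xr.DataArray([1.0, 2.0, 3.0], coords={"x": ["a", "b", "c"]}, dims="x")
right = xr.DataArray([10.0, 20.0, 30.0], coords={"x": ["b", "c", "d"]}, dims="x")

# Default arithmetic aligns on labels and keeps only the shared ones
diff = right - left
print(diff["x"].values.tolist())  # ['b', 'c']
assert diff.values.tolist() == [8.0, 17.0]

# An outer join keeps every label and fills the gaps with NaN
left_outer, right_outer = xr.align(left, right, join="outer")
print(left_outer["x"].values.tolist())  # ['a', 'b', 'c', 'd']
```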

# `GroupBy`
xarray supports grouped operations using an API very similar to pandas':

In [26]:
labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')

In [27]:
labels

<xarray.DataArray 'labels' (y: 3)>
array(['E', 'F', 'E'],
      dtype='<U1')
Coordinates:
  * y        (y) int64 0 1 2

In [28]:
data.groupby(labels).mean('y')

<xarray.DataArray (x: 2, labels: 2)>
array([[ 0.947533,  1.428605],
       [-0.129216, -0.446366]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * labels   (labels) object 'E' 'F'

In [29]:
data.groupby(labels).map(lambda x: x - x.min())  # older xarray versions call this .apply()

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.909474,  1.874971,  1.805585],
       [ 0.      ,  0.      ,  0.561562]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 0 1 2
    labels   (y) <U1 'E' 'F' 'E'
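Conceptually, `groupby(labels)` splits the array along the dimension that `labels` is defined on (`y` here). It then applies the operation to each group and combines the results along a new `labels` dimension. This sketch uses fixed, made-up values so the group means can be checked by hand:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    coords={"x": ["a", "b"], "y": [0, 1, 2]},
    dims=("x", "y"),
)
group_labels = xr.DataArray(
    ["E", "F", "E"], coords={"y": [0, 1, 2]}, dims="y", name="labels"
)

# Group the columns by label and average within each group
means = arr.groupby(group_labels).mean("y")
print(means.sel(x="a", labels="E").item())  # 2.0 (mean of 1.0 and 3.0)
```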

# Converting to and from `pandas`

In [30]:
series = data.to_series()
series

x  y
a  0    0.499477
   1    1.428605
   2    1.395588
b  0   -0.409997
   1   -0.446366
   2    0.151565
dtype: float64

## convert back

In [31]:
series.to_xarray()

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.499477,  1.428605,  1.395588],
       [-0.409997, -0.446366,  0.151565]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 0 1 2
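For 1-D and 2-D arrays, `to_pandas()` is an alternative to `to_series()`. It returns a `Series` or `DataFrame` whose index (and columns) come from the coordinates. This sketch uses fixed, made-up values and checks that the round trip through a `Series` preserves the numbers. The name `'value'` is hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    coords={"x": ["a", "b"], "y": [0, 1, 2]},
    dims=("x", "y"),
    name="value",  # hypothetical; to_series() uses it as the Series name
)

# A 2-D DataArray becomes a DataFrame: first dim -> index, second dim -> columns
frame = arr.to_pandas()
print(type(frame).__name__)  # DataFrame

# to_series() flattens to a Series with a MultiIndex over (x, y) ...
series_flat = arr.to_series()
# ... and to_xarray() unstacks it back into a 2-D DataArray
back = series_flat.to_xarray()
assert np.array_equal(back.values, arr.values)
```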

# `Datasets`
`xarray.Dataset` is a dict-like container of aligned DataArray objects. You can think of it as a multi-dimensional generalization of the pandas.DataFrame:

In [32]:
ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})
ds

<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.4995 1.429 1.396 -0.41 -0.4464 0.1516
    bar      (x) int32 1 2
    baz      float64 3.142

In [34]:
ds.to_dict()

{'coords': {'x': {'data': ['a', 'b'], 'dims': ('x',), 'attrs': {}}},
 'attrs': {},
 'dims': {'x': 2, 'y': 3},
 'data_vars': {'foo': {'data': [[0.49947696171981487,
     1.4286051113848126,
     1.3955883684242683],
    [-0.4099968026900433, -0.4463660535228815, 0.151565035180602]],
   'dims': ('x', 'y'),
   'attrs': {}},
  'bar': {'data': [1, 2], 'dims': ('x',), 'attrs': {}},
  'baz': {'data': 3.141592653589793, 'dims': (), 'attrs': {}}}}
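`to_dict()` produces plain Python lists, numbers, strings and dicts, which makes it handy for JSON-style serialization. `xr.Dataset.from_dict()` rebuilds a Dataset from such a dictionary. Here is a round-trip sketch on a small, made-up Dataset:

```python
import json

import xarray as xr

ds_small = xr.Dataset(
    {"bar": ("x", [1, 2]), "baz": 3.5},
    coords={"x": ["a", "b"]},
)

as_dict = ds_small.to_dict()
# Only plain Python types remain, so the dictionary is JSON-serializable
text = json.dumps(as_dict)

restored = xr.Dataset.from_dict(json.loads(text))
assert restored.equals(ds_small)
```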

## Extracting Dataset variables as DataArray objects with dictionary-style indexing

In [35]:
ds['foo']

<xarray.DataArray 'foo' (x: 2, y: 3)>
array([[ 0.499477,  1.428605,  1.395588],
       [-0.409997, -0.446366,  0.151565]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

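Variables in a Dataset can have different `dtype`s and even different dimensions, but all dimensions are assumed to refer to points in the same shared coordinate system.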
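For instance, a Dataset can hold a 2-D float variable next to a 1-D integer variable and a scalar. Any dimension they share must have the same size and coordinate, and conflicting sizes are rejected. This sketch uses hypothetical variable names and values:

```python
import numpy as np
import xarray as xr

ds_mixed = xr.Dataset(
    {
        "grid": (("x", "y"), np.zeros((2, 3))),           # 2-D float64
        "count": ("x", np.array([1, 2], dtype="int32")),  # 1-D int32 on the shared "x"
        "flag": True,                                      # 0-D boolean scalar
    },
    coords={"x": ["a", "b"]},
)

for var_name, var in ds_mixed.data_vars.items():
    print(var_name, var.dims, var.dtype)

# Selecting one x label applies consistently to every variable that has "x"
row = ds_mixed.sel(x="b")
print(row["count"].item())  # 2

# A dimension with two different sizes cannot coexist in one Dataset
try:
    xr.Dataset({"p": ("x", [1, 2]), "q": ("x", [1, 2, 3])})
except ValueError as err:
    print("rejected:", err)
```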

# `NetCDF`

NetCDF is the recommended binary serialization format for xarray objects. Users from the geosciences will recognize that the `Dataset` data model looks very similar to a netCDF file (which, in fact, inspired it).

You can read and write `xarray` objects directly to and from disk with `to_netcdf()`, `open_dataset()` and `open_dataarray()`:

In [36]:
ds.to_netcdf('example.nc')

In [37]:
xr.open_dataset('example.nc')

<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) object 'a' 'b'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 ...
    bar      (x) int32 ...
    baz      float64 ...
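`open_dataset()` reads data lazily (hence the `...` above), so the file stays open until the Dataset is closed. This sketch shows a write/read round trip that calls `.load()` inside a `with` block, which pulls the values into memory before the file is closed. It assumes a netCDF backend (`netCDF4` or `scipy`) is installed, and the temporary file name is arbitrary.

```python
import os
import tempfile

import numpy as np
import xarray as xr

ds_out = xr.Dataset(
    {"foo": (("x", "y"), np.arange(6.0).reshape(2, 3)), "bar": ("x", [1, 2])},
    coords={"x": ["a", "b"]},
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.nc")
    ds_out.to_netcdf(path)

    # The context manager closes the file; .load() reads the data before that
    with xr.open_dataset(path) as opened:
        ds_in = opened.load()

assert np.array_equal(ds_in["foo"].values, ds_out["foo"].values)
```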