# Notes - Xarray

## Terminology

- DataArray: A multi-dimensional array with named dimensions. On top of the raw values, a DataArray carries metadata: dimension names, coordinates, and attributes. For example, a variable var(time, level, lat, lon).
- Dataset: A dict-like collection of DataArray objects with aligned dimensions. For example, a dataset may contain temperature(time, level, lat, lon) and precipitation(time, lat, lon).
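
To make the two terms concrete, here is a minimal sketch that builds a Dataset holding the two example variables. The coordinate values and the random data are made up for illustration only.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinate values, chosen only for illustration
time = np.arange(2)
level = np.array([850.0, 500.0])
lat = np.array([-30.0, 0.0, 30.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])

rng = np.random.default_rng(0)
temperature = 283 + 5 * rng.standard_normal((2, 2, 3, 4))
precipitation = rng.random((2, 3, 4))

# Each data variable is given as (dimension names, values)
ds = xr.Dataset(
    data_vars={
        "temperature": (("time", "level", "lat", "lon"), temperature),
        "precipitation": (("time", "lat", "lon"), precipitation),
    },
    coords={"time": time, "level": level, "lat": lat, "lon": lon},
)

# Indexing a Dataset by name returns a DataArray
print(ds["temperature"].dims)
print(ds["precipitation"].dims)
```

Both variables share the time, lat, and lon coordinates; only temperature uses level.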

## References

- Unidata Xarray Introduction, https://unidata.github.io/python-training/workshop/XArray/xarray-introduction/
- Xarray quick overview, https://docs.xarray.dev/en/stable/getting-started-guide/quick-overview.html
- Xarray computation, https://docs.xarray.dev/en/stable/user-guide/computation.html


In [1]:
import numpy as np
import xarray as xr
import io, os, sys, types
import yhc_module as yhc  # local helper module (provides printv, used below)

## Create a DataArray

Xarray - https://docs.xarray.dev/en/stable/getting-started-guide/quick-overview.html#create-a-dataarray

Unidata - https://unidata.github.io/python-training/workshop/XArray/xarray-introduction/#DataArray

In [2]:
#--- Create some sample "temperature" data
data = 283 + 5 * np.random.randn(5, 3, 4)

time = np.arange(0,5)
lat = np.linspace(-120., 60., 3)
lon = np.linspace(25.,55.,4)

#--- create a DataArray & set attributes
temp = xr.DataArray(data, dims=['time', 'lat', 'lon'], coords=[time, lat, lon])

temp.attrs['units'] = "K"
temp.attrs['long_name'] = "Temperature"

with xr.set_options(keep_attrs=True):  # keep attributes after operation
  temp_degC = temp - 273.15
temp_degC.attrs['units'] = "C"
temp_degC

#--- create data, dimensions, and coordinates in a single call
arr = xr.DataArray(np.random.RandomState(0).randn(2, 3), [("x", ["a", "b"]), ("y", [10, 20, 30])])
arr

## Selection

Unidata - https://unidata.github.io/python-training/workshop/XArray/xarray-introduction/#Selection

### Selection Method 1: use indexing

In [3]:
#--- Method 1: use indexing
var = temp[0, 1:2, :]
var
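
Positional indexing depends on remembering the dimension order. As a sketch of an alternative, `isel()` takes the same integer positions but by dimension name. The array below is rebuilt so the snippet runs on its own.

```python
import numpy as np
import xarray as xr

# Same layout as the sample "temperature" array: temp(time, lat, lon)
temp = xr.DataArray(
    283 + 5 * np.random.randn(5, 3, 4),
    dims=["time", "lat", "lon"],
    coords=[np.arange(0, 5), np.linspace(-120.0, 60.0, 3), np.linspace(25.0, 55.0, 4)],
)

by_position = temp[0, 1:2, :]                 # relies on dimension order
by_name = temp.isel(time=0, lat=slice(1, 2))  # integer positions, given by name

print(by_position.equals(by_name))
print(by_name.dims)
```

Dimensions not mentioned in `isel()` (here lon) are kept whole.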

### Selection Method 2: use named dimensions & slicing

In [4]:
#--- Method 2: use named dimensions
print(temp.coords)  # inspect the coordinates of each dimension

#--- select specific values in coordinates
var = temp.sel(time=1, lat=-30., lon=25)
print('------------')
print(var)

var = temp.sel(time=1, lon=25)
print('------------')
print(var)

#--- Slicing with Selection
var = temp.sel(time=slice(0,2), lat=-30., lon=slice(-1000.,1000.))
print('------------')
print(var)

Coordinates:
  * time     (time) int64 0 1 2 3 4
  * lat      (lat) float64 -120.0 -30.0 60.0
  * lon      (lon) float64 25.0 35.0 45.0 55.0
------------
<xarray.DataArray ()>
array(282.30234947)
Coordinates:
    time     int64 1
    lat      float64 -30.0
    lon      float64 25.0
Attributes:
    units:      K
    long_name:  Temperature
------------
<xarray.DataArray (lat: 3)>
array([283.12922851, 282.30234947, 285.97236531])
Coordinates:
    time     int64 1
  * lat      (lat) float64 -120.0 -30.0 60.0
    lon      float64 25.0
Attributes:
    units:      K
    long_name:  Temperature
------------
<xarray.DataArray (time: 3, lon: 4)>
array([[282.63051074, 289.41457191, 281.17764545, 286.93160183],
       [282.30234947, 285.70192815, 270.1234263 , 285.88475911],
       [273.29764188, 275.43518926, 285.54716922, 286.84422006]])
Coordinates:
  * time     (time) int64 0 1 2
    lat      float64 -30.0
  * lon      (lon) float64 25.0 35.0 45.0 55.0
Attributes:
    units:      K
    long_name:  Temperature
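
`sel()` with exact values only works when the requested value exists in the coordinate. When it may not, `method="nearest"` picks the closest label instead. A minimal sketch, with the array rebuilt so it runs on its own:

```python
import numpy as np
import xarray as xr

# Same layout as the sample "temperature" array: temp(time, lat, lon)
temp = xr.DataArray(
    283 + 5 * np.random.randn(5, 3, 4),
    dims=["time", "lat", "lon"],
    coords=[np.arange(0, 5), np.linspace(-120.0, 60.0, 3), np.linspace(25.0, 55.0, 4)],
)

# -25 and 33 are not coordinate values; the nearest labels are used instead
var_nearest = temp.sel(lat=-25.0, lon=33.0, method="nearest")

print(var_nearest.lat.values, var_nearest.lon.values)
print(var_nearest.dims)
```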

### Selection Method 3: use .loc

In [5]:
#*** Useful when the range along each coordinate is already known

# temp has dimensions (time, lat, lon); .loc slices by coordinate label, in that order
var = temp.loc[0:4, -120:30, :]
print(var)

<xarray.DataArray (time: 5, lat: 2, lon: 4)>
array([[[281.49560645, 282.85253842, 285.59470479, 277.21813573],
        [282.63051074, 289.41457191, 281.17764545, 286.93160183]],

       [[283.12922851, 273.90833491, 289.41210319, 290.29549097],
        [282.30234947, 285.70192815, 270.1234263 , 285.88475911]],

       [[287.02102403, 278.72605071, 287.81945257, 284.57465355],
        [273.29764188, 275.43518926, 285.54716922, 286.84422006]],

       [[291.0884297 , 279.67420666, 285.20788471, 278.51981325],
        [280.22120182, 280.69601914, 289.2153916 , 280.33967512]],

       [[287.10607979, 280.19488944, 282.58437143, 281.10215596],
        [287.76960016, 278.90237705, 276.49566771, 286.88507031]]])
Coordinates:
  * time     (time) int64 0 1 2 3 4
  * lat      (lat) float64 -120.0 -30.0
  * lon      (lon) float64 25.0 35.0 45.0 55.0
Attributes:
    units:      K
    long_name:  Temperature
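
The `.loc` form needs the dimension order. As a sketch, the same label-based selection can be written with `sel()` and `slice()`, naming each dimension. Label slices include both end points.

```python
import numpy as np
import xarray as xr

# Same layout as the sample "temperature" array: temp(time, lat, lon)
temp = xr.DataArray(
    283 + 5 * np.random.randn(5, 3, 4),
    dims=["time", "lat", "lon"],
    coords=[np.arange(0, 5), np.linspace(-120.0, 60.0, 3), np.linspace(25.0, 55.0, 4)],
)

by_loc = temp.loc[0:4, -120:30, :]
by_sel = temp.sel(time=slice(0, 4), lat=slice(-120, 30))

print(by_loc.equals(by_sel))
print(by_sel.lat.values)
```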


### Selection Method 4: where() to conditionally switch between values

In [6]:
#--- use where()
var = temp.where(temp > 280.)  # values where temp <= 280. become NaN
var

#--- tried to create a mask with string labels, but it failed
#xr.where(temp > 280., "positive", "negative")
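
A sketch of two variants that avoid string labels: `xr.where()` with numeric values to build a 0/1 mask, and the two-argument form of `DataArray.where()` to substitute a value instead of NaN. The data here is random and only illustrative.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    dims=["time", "lat", "lon"],
)

# 1 where the condition holds, 0 elsewhere
mask = xr.where(temp > 280.0, 1, 0)

# keep values above 280, replace the rest with 280 instead of NaN
floored = temp.where(temp > 280.0, 280.0)

print(int(mask.sum()), "points above 280 K")
print(float(floored.min()))
```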

## Computation

### Basic numerical operation

In [7]:
#--- arithmetic operation
xr.set_options(keep_attrs=True)  # global setting: keep attributes after operations
temp_degC = temp - 273.15
temp_degC.attrs['units'] = "C"
temp_degC

#--- mean and other reductions
#    others include: min, max, std, sum; weighted reductions go through .weighted()
temp_tavg = temp.mean("time")
temp_tavg

temp_tiavg = temp.mean(["time","lon"])
temp_tiavg
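
Weighted reductions are not a method like `mean()`; they go through `DataArray.weighted(weights)`. A minimal sketch with made-up values and cos(lat) weights, which is one common choice for latitude averages:

```python
import numpy as np
import xarray as xr

# Hypothetical field(time, lat) with easy-to-check numbers
lat = np.array([-60.0, 0.0, 60.0])
field = xr.DataArray(
    [[1.0, 2.0, 6.0], [2.0, 4.0, 12.0]],
    dims=["time", "lat"],
    coords={"lat": lat},
)

# cos(lat) weights: 0.5, 1.0, 0.5
weights = xr.DataArray(np.cos(np.deg2rad(lat)), dims=["lat"], coords={"lat": lat})

plain_mean = field.mean("lat")
weighted_mean = field.weighted(weights).mean("lat")

print(plain_mean.values)
print(weighted_mean.values)
```

The weights are normalized internally, so they do not need to sum to 1.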

### Rolling average

https://xarray.pydata.org/en/stable/user-guide/computation.html#rolling-window-operations

https://xarray.pydata.org/en/stable/generated/xarray.DataArray.rolling.html#xarray.DataArray.rolling

In [17]:
#--- pk values, copied directly from AM4 files
pk_list = [100, 400, 818.6021, 1378.886, 2091.795, 2983.641, 4121.79, 5579.222,
  6907.19, 7735.787, 8197.665, 8377.955, 8331.696, 8094.722, 7690.857,
  7139.018, 6464.803, 5712.357, 4940.054, 4198.604, 3516.633, 2905.199,
  2366.737, 1899.195, 1497.781, 1156.253, 867.792, 625.5933, 426.2132,
  264.7661, 145.0665, 60, 15, 0]

#--- bk values, copied directly from AM4 files
bk_list = [0, 0, 0, 0, 0, 0, 0, 0, 0.00513, 0.01969, 0.04299, 0.07477, 0.11508,
  0.16408, 0.22198, 0.28865, 0.36281, 0.44112, 0.51882, 0.59185, 0.6581,
  0.71694, 0.76843, 0.81293, 0.851, 0.88331, 0.91055, 0.93331, 0.95214,
  0.9675, 0.97968, 0.98908, 0.99575, 1]

#--- make DataArrays
ps = xr.DataArray([102000.], dims=['time'])
pk = xr.DataArray(pk_list, dims=['plev'])
bk = xr.DataArray(bk_list, dims=['plev'])

#--- p at half levels: p = ps*bk + pk. ps comes first so the broadcast result has dims (time, plev).
phalf = ps*bk + pk

#--- p at full levels: mean of adjacent half levels via .rolling(). The resulting NaN is removed by dropna()
pfull = phalf.rolling(plev=2, center=True).mean().dropna("plev")
yhc.printv(phalf, "phalf")
yhc.printv(pfull, "pfull")

--------------
phalf
<xarray.DataArray (time: 1, plev: 34)>
array([[1.00000000e+02, 4.00000000e+02, 8.18602100e+02, 1.37888600e+03,
        2.09179500e+03, 2.98364100e+03, 4.12179000e+03, 5.57922200e+03,
        7.43045000e+03, 9.74416700e+03, 1.25826450e+04, 1.60044950e+04,
        2.00698560e+04, 2.48308820e+04, 3.03328170e+04, 3.65813180e+04,
        4.34714230e+04, 5.07065970e+04, 5.78596940e+04, 6.45673040e+04,
        7.06428330e+04, 7.60330790e+04, 8.07465970e+04, 8.48180550e+04,
        8.82997810e+04, 9.12538730e+04, 9.37438920e+04, 9.58232133e+04,
        9.75444932e+04, 9.89497661e+04, 1.00072427e+05, 1.00946160e+05,
        1.01581500e+05, 1.02000000e+05]])
Dimensions without coordinates: time, plev

--------------
pfull
<xarray.DataArray (time: 1, plev: 33)>
array([[   250.     ,    609.30105,   1098.74405,   1735.3405 ,
          2537.718  ,   3552.7155 ,   4850.506  ,   6504.836  ,
          8587.3085 ,  11163.406  ,  14293.57   ,  18037.1755 ,
         22450.369  ,  275
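
As a sanity check on the rolling approach, this sketch compares a window-2 rolling mean against the midpoints computed by plain slicing. The half-level values are made up and much shorter than the AM4 list.

```python
import numpy as np
import xarray as xr

# Hypothetical half-level pressures (4 half levels give 3 full levels)
phalf = xr.DataArray([100.0, 400.0, 800.0, 1400.0], dims=["plev"])

# window of 2 along plev; the one incomplete window yields NaN and is dropped
via_rolling = phalf.rolling(plev=2, center=True).mean().dropna("plev")

# same midpoints with NumPy slicing
values = phalf.values
via_slicing = 0.5 * (values[:-1] + values[1:])

print(via_rolling.values)
print(via_slicing)
```

The result has one element fewer than the input, matching the 34 half levels and 33 full levels in the output above.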

### Broadcasting by dimension name

https://docs.xarray.dev/en/stable/user-guide/computation.html#broadcasting-by-dimension-name

In [8]:
a = xr.DataArray([1, 2], [("x", ["a", "b"])])
b = xr.DataArray([-1, -2, -3], [("y", [10, 20, 30])])
c = xr.DataArray(np.arange(6).reshape(3, 2), [b["y"], a["x"]])

#print(a)
#print(b)
#print(c)

#--- xarray broadcasts by dimension name, so dimensions are expanded automatically:
print(a*b)  # 2x3 array
print(b*a)  # 3x2 array

#--- explicitly broadcast xarray objects against each other with xr.broadcast()
a2, b2 = xr.broadcast(a, b)
print(a2)
print(b2)


<xarray.DataArray (x: 2, y: 3)>
array([[-1, -2, -3],
       [-2, -4, -6]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30
<xarray.DataArray (y: 3, x: 2)>
array([[-1, -2],
       [-2, -4],
       [-3, -6]])
Coordinates:
  * y        (y) int64 10 20 30
  * x        (x) <U1 'a' 'b'
<xarray.DataArray (x: 2, y: 3)>
array([[1, 1, 1],
       [2, 2, 2]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 10 20 30
<xarray.DataArray (x: 2, y: 3)>
array([[-1, -2, -3],
       [-1, -2, -3]])
Coordinates:
  * y        (y) int64 10 20 30
  * x        (x) <U1 'a' 'b'
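
Since broadcasting goes by name, `a*b` and `b*a` hold the same data and differ only in dimension order. A short sketch that checks this with `transpose()`:

```python
import numpy as np
import xarray as xr

a = xr.DataArray([1, 2], [("x", ["a", "b"])])
b = xr.DataArray([-1, -2, -3], [("y", [10, 20, 30])])

ab = a * b  # dims (x, y)
ba = b * a  # dims (y, x)

# reorder ab to (y, x) and compare with ba
print(ab.transpose("y", "x").equals(ba))

# arithmetic between the two matches elements by name, not by position
diff = ab - ba
print(diff.dims, int(abs(diff).sum()))
```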


### Automatic alignment

https://docs.xarray.dev/en/stable/user-guide/computation.html#automatic-alignment

In [9]:
arr = xr.DataArray(np.arange(3), [("x", range(3))])

#--- arithmetic only uses labels present in both operands (inner join on x)
print(arr)
print(arr[:-1])
print(arr + arr[:-1])

<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
  * x        (x) int64 0 1 2
<xarray.DataArray (x: 2)>
array([0, 1])
Coordinates:
  * x        (x) int64 0 1
<xarray.DataArray (x: 2)>
array([0, 2])
Coordinates:
  * x        (x) int64 0 1
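
The default join for arithmetic is "inner", which is why x=2 disappears above. A sketch of switching to an outer join with `xr.set_options(arithmetic_join="outer")`, so missing labels are kept and filled with NaN:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(3), [("x", range(3))])

# default: inner join, only x = 0, 1 survive
inner_sum = arr + arr[:-1]

# outer join: x = 2 is kept, but has no partner, so the result there is NaN
with xr.set_options(arithmetic_join="outer"):
    outer_sum = arr + arr[:-1]

print(inner_sum.values)
print(outer_sum.values)
```

The outer result is float because integer arrays cannot hold NaN.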


## Plotting