# Working With netCDF files - xarray

* Alternative to plain netCDF4 access from Python.

* Brings the power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures:

Series and DataFrame <--> DataArray and Dataset.

The approach adopts the Common Data Model for self-describing scientific data, which is in widespread use in the Earth sciences (e.g., netCDF and OPeNDAP): xarray.Dataset is an in-memory representation of a netCDF file.
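To make the Dataset/DataArray relationship concrete, here is a minimal sketch that builds a small Dataset in memory instead of reading it from a file. The coordinate values, variable contents and the `synthetic example` title are invented for illustration and are not taken from the GETM file used in this notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coordinates (not the GETM grid)
time = pd.date_range("1996-02-01", periods=3, freq="D")
lat = np.array([45.0, 45.5, 46.0])
lon = np.array([-2.0, -1.5, -1.0, -0.5])

rng = np.random.default_rng(0)

# A Dataset is a dict-like container of named, labelled arrays that share coordinates
ds = xr.Dataset(
    data_vars={
        # each entry: (dimension names, data, optional attributes)
        "temp": (("time", "lat", "lon"), 10 + rng.random((3, 3, 4)),
                 {"units": "degC", "long_name": "temperature"}),
        "bathymetry": (("lat", "lon"), 100 * rng.random((3, 4))),
    },
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"title": "synthetic example"},
)

# Indexing a Dataset by variable name returns a DataArray (the analogue of a pandas Series)
da = ds["temp"]
print(ds)
print(da.dims, da.attrs["units"])
```

Variables, coordinates and attributes all travel together, which is what makes the object self-describing in the same way as a netCDF file.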

    HTML documentation: http://xray.readthedocs.org

The main advantages of using xarray versus plain netCDF4 are:

* intelligent selection along labelled dimensions (and also indexes)
* groupby operations
* data alignment
* I/O (netCDF)
* conversion to and from pandas DataFrames
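Several of the features listed above can be sketched on a tiny synthetic series. The dates and values below are made up for illustration; the calls used (`groupby`, label-aligned arithmetic, `to_series`, `DataArray.from_series`) are standard xarray methods.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Four daily values spanning the end of January and start of February (hypothetical data)
time = pd.date_range("1996-01-30", periods=4, freq="D")
da = xr.DataArray(
    np.array([1.0, 2.0, 3.0, 4.0]),
    coords={"time": time},
    dims=["time"],
    name="temp",
)

# groupby: mean per calendar month, using the virtual coordinate "time.month"
monthly = da.groupby("time.month").mean()

# data alignment: arithmetic matches on labels and keeps only the shared ones
a = da.isel(time=slice(0, 3))   # first three days
b = da.isel(time=slice(1, 4))   # last three days
total = a + b                   # only the two overlapping days remain

# conversion to pandas and back
series = da.to_series()
back = xr.DataArray.from_series(series)

print(monthly.values)
print(total.values)
```

Note that `a + b` never compares array positions: the two middle days are matched by their time labels, and the non-overlapping days are dropped.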


In [38]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap as bm

In [67]:
GETM = xr.open_dataset('../data/cefas_GETM_nwes.nc4')
GETM

<xarray.Dataset>
Dimensions:     (latc: 360, level: 5, lonc: 396, time: 6)
Coordinates:
  * latc        (latc) float64 45.4 45.45 45.5 45.55 45.6 45.65 45.7 45.75 ...
  * level       (level) float64 1.0 6.0 11.0 16.0 21.0
  * lonc        (lonc) float64 -17.5 -17.42 -17.34 -17.26 -17.18 -17.1 ...
  * time        (time) datetime64[ns] 1996-02-01T01:00:00 ...
Data variables:
    bathymetry  (latc, lonc) float64 nan nan nan nan nan nan nan nan nan nan ...
    h           (time, level, latc, lonc) float64 nan nan nan nan nan nan ...
    temp        (time, level, latc, lonc) float64 nan nan nan nan nan nan ...
Attributes:
    title: North West European Shelf 3nm
    history: Sun Nov 20 00:54:48 2016: ncks -v time,latc,lonc,level,temp,bathymetry,h -d time,0,120,24 -d level,1,25,5 /gpfs/afmcefas/share/nwes/36x36_old/timebase/nwes-3d-199602.nc cefas_GETM_nwes.nc
Mon May  9 18:50:36 2016: ncatted -a units,time,o,c,seconds since 1996-01-01 00:00:00 /gpfs/afmcefas/share/nwes/36x36_old//timebase//n

In [17]:
print(GETM.dims.keys())


[u'latc', u'level', u'lonc', u'time']


In [18]:
GETM.attrs.keys()

[u'title', u'history', u'NCO', u'nco_openmp_thread_number']

## Accessing variables

In [23]:
bathymetry = GETM['bathymetry']
temp=GETM['temp']
temp

<xarray.DataArray 'temp' (time: 6, level: 5, latc: 360, lonc: 396)>
[4276800 values with dtype=float64]
Coordinates:
  * latc     (latc) float64 45.4 45.45 45.5 45.55 45.6 45.65 45.7 45.75 45.8 ...
  * level    (level) float64 1.0 6.0 11.0 16.0 21.0
  * lonc     (lonc) float64 -17.5 -17.42 -17.34 -17.26 -17.18 -17.1 -17.02 ...
  * time     (time) datetime64[ns] 1996-02-01T01:00:00 1996-02-02T01:00:00 ...
Attributes:
    units: degC
    long_name: temperature
    valid_range: [ -2.  40.]

### Define selection

In [66]:
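# level=1 is an exact label in the 'level' coordinate, so this lookup succeeds
GETM.sel(level=1)['temp']
# lonc=50 is not a label in the 'lonc' coordinate, so an exact-match lookup raises KeyError
section=GETM.sel(lonc=50)['temp']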

KeyError: 50.0
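The `KeyError` arises because `sel` looks for an exact coordinate label by default. Two common ways around this are `method="nearest"` and a label-based `slice`. The sketch below uses a small synthetic array with a hypothetical `lonc` grid rather than the GETM file, so the numbers are illustrative only.

```python
import numpy as np
import xarray as xr

# Hypothetical coarse grid; the real GETM lonc coordinate is much finer
lonc = np.array([-17.5, -17.0, -16.5, -16.0])
level = np.array([1.0, 6.0])
temp = xr.DataArray(
    np.arange(8.0).reshape(2, 4),
    coords={"level": level, "lonc": lonc},
    dims=["level", "lonc"],
    name="temp",
)

# Exact-match lookup fails when the requested label is not on the grid
raised = False
try:
    temp.sel(lonc=-16.8)
except KeyError:
    raised = True

# Nearest-neighbour lookup picks the closest grid point instead
nearest = temp.sel(lonc=-16.8, method="nearest")

# A label-based slice selects a range; both end points are included
band = temp.sel(lonc=slice(-17.0, -16.0))

print(raised, float(nearest["lonc"]), band.sizes["lonc"])
```

Be aware that `method="nearest"` also returns a result for values far outside the grid (it simply picks the edge point), so it is worth checking the selected coordinate value afterwards.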

### Calculate average along a dimension

In [54]:
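# Average over the time dimension; the double brackets keep the result as a Dataset
tave=GETM[['temp']].mean('time')

# Plot the first level (index 0) of the time-averaged temperature with 50 contour levels
plt.contourf(tave['temp'][0,:,:],50)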
plt.colorbar()
plt.show()
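Because the GETM fields contain `nan` over land, it helps to know how reductions treat missing values. The following sketch uses a tiny synthetic array (values invented for illustration) to show that `mean` skips `nan` by default for floating-point data and that `skipna=False` propagates it instead.

```python
import numpy as np
import xarray as xr

# Synthetic (time, latc, lonc) data with one missing value at the first time step
data = np.array([
    [[1.0, np.nan], [3.0, 4.0]],
    [[3.0, 7.0],    [5.0, 8.0]],
])
temp = xr.DataArray(data, dims=["time", "latc", "lonc"], name="temp")

# Reduce by dimension name; NaNs are skipped by default for float data
tave = temp.mean("time")

# Propagate NaNs instead of skipping them
tave_strict = temp.mean("time", skipna=False)

# Reducing over several dimensions at once gives one value per time step
area_mean = temp.mean(["latc", "lonc"])

print(tave.values)
print(tave_strict.values)
print(area_mean.values)
```

Naming the dimension (`"time"`) rather than passing an axis number keeps the code correct even if the dimension order of the variable changes.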

| Dimension lookup | Index lookup | DataArray syntax | Dataset syntax |
|---|---|---|---|
| Positional | By integer | `arr[:, 0]` | not available |
| Positional | By label | `arr.loc[:, 'IA']` | not available |
| By name | By integer | `arr.isel(space=0)` or `arr[dict(space=0)]` | `ds.isel(space=0)` or `ds[dict(space=0)]` |
| By name | By label | `arr.sel(space='IA')` or `arr.loc[dict(space='IA')]` | `ds.sel(space='IA')` or `ds.loc[dict(space='IA')]` |
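The four indexing styles summarised above can be compared side by side on a small array. This is a sketch on synthetic data: the `space` labels follow the ones used in the summary, while the array values and the `time` coordinate are arbitrary.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=["time", "space"],
    coords={"time": [0, 1], "space": ["IA", "IL", "IN"]},
    name="x",
)
ds = arr.to_dataset()  # single-variable Dataset holding "x"

# Positional dimension lookup (DataArray only)
by_pos_int = arr[:, 0]             # by integer
by_pos_label = arr.loc[:, "IA"]    # by label

# Dimension lookup by name (DataArray and Dataset)
by_name_int = arr.isel(space=0)    # by integer
by_name_label = arr.sel(space="IA")  # by label

ds_int = ds.isel(space=0)
ds_label = ds.sel(space="IA")

# All six expressions select the same column of data
print(by_pos_int.values, ds_label["x"].values)
```

The name-based forms (`isel`, `sel`) are usually the more robust choice, since they do not depend on the order in which the dimensions happen to be stored.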