# The netCDF file format

* netCDF is a collection of formats for storing arrays:
  netCDF (classic) and netCDF4 (built on HDF5)
* Developed by Unidata-UCAR
* Auxiliary information about each variable can be added
* Human-readable text equivalent called CDL (`ncdump` writes CDL from a file, `ncgen` builds a file from CDL)

## Data model:

* Variables: N-dimensional arrays of data.
* Dimensions:
    describe the axes of the data arrays.
* Attributes:
    annotate variables or files with small notes or supplementary metadata.
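
The three parts of the data model map directly onto an xarray `Dataset`. The following is a minimal sketch on made-up data: the variable name `temp`, the coordinate values and the attribute contents are all assumptions chosen for illustration, not taken from any real file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 2 time steps on a 3 x 4 latitude/longitude grid.
temp_values = np.arange(24, dtype="float64").reshape(2, 3, 4)

toy = xr.Dataset(
    # Variable: an N-dimensional array tied to named dimensions,
    # with its own attributes (here only the units).
    data_vars={
        "temp": (("time", "lat", "lon"), temp_values, {"units": "degC"}),
    },
    # Dimensions: coordinate values label each axis of the array.
    coords={
        "time": np.array(["2000-01-01", "2000-01-02"], dtype="datetime64[ns]"),
        "lat": [50.0, 50.5, 51.0],
        "lon": [0.0, 0.5, 1.0, 1.5],
    },
    # Attributes at file level (global attributes).
    attrs={"title": "toy example"},
)

print(toy)
```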

# Working with netCDF files using xarray

* Alternative to plain netCDF4 access from Python.

* Brings the power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures:

Series and DataFrame <--> DataArray and Dataset.

    HTML documentation: http://xray.readthedocs.org


### The main advantages of using xarray versus plain netCDF4 are:

* intelligent selection along labelled dimensions (as well as by integer index)
* groupby operations
* data alignment
* I/O (netCDF)
* conversion to and from pandas DataFrames
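
A few of these features can be tried without any netCDF file. This is a small sketch on a synthetic daily series; the array name `x` and the dates are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series: 60 days starting 2000-01-01, values 0..59.
time = pd.date_range("2000-01-01", periods=60, freq="D")
series = xr.DataArray(np.arange(60.0), coords={"time": time}, dims="time", name="x")

# groupby: mean per calendar month, using the datetime accessor "time.month".
monthly = series.groupby("time.month").mean()

# Alignment: arithmetic matches on coordinate labels, not on positions.
# Only the time stamps present in both operands survive (inner join).
overlap = series[:10] + series[5:15]

# Conversion to a pandas DataFrame and back to xarray.
df = series.to_dataframe()
back = df.to_xarray()["x"]

print(monthly)
print(overlap.sizes)
print(df.head())
```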

In [5]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap as bm

In [30]:
GETM = xr.open_dataset('../data/cefas_GETM_nwes.nc4')
GETM

<xarray.Dataset>
Dimensions:     (latc: 360, level: 5, lonc: 396, time: 6)
Coordinates:
  * latc        (latc) float64 45.4 45.45 45.5 45.55 45.6 45.65 45.7 45.75 ...
  * level       (level) float64 1.0 6.0 11.0 16.0 21.0
  * lonc        (lonc) float64 -17.5 -17.42 -17.34 -17.26 -17.18 -17.1 ...
  * time        (time) datetime64[ns] 1996-02-01T01:00:00 ...
Data variables:
    bathymetry  (latc, lonc) float64 nan nan nan nan nan nan nan nan nan nan ...
    h           (time, level, latc, lonc) float64 nan nan nan nan nan nan ...
    temp        (time, level, latc, lonc) float64 nan nan nan nan nan nan ...
Attributes:
    title: North West European Shelf 3nm
    history: Sun Nov 20 00:54:48 2016: ncks -v time,latc,lonc,level,temp,bathymetry,h -d time,0,120,24 -d level,1,25,5 /gpfs/afmcefas/share/nwes/36x36_old/timebase/nwes-3d-199602.nc cefas_GETM_nwes.nc
Mon May  9 18:50:36 2016: ncatted -a units,time,o,c,seconds since 1996-01-01 00:00:00 /gpfs/afmcefas/share/nwes/36x36_old//timebase//n

In [41]:
GETM.dims

Frozen(SortedKeysDict({'level': 5, 'latc': 360, 'lonc': 396, 'time': 6}))

In [56]:
GETM.coords['latc'].shape

(360,)

In [57]:
GETM.attrs.keys()

odict_keys(['title', 'history', 'NCO', 'nco_openmp_thread_number'])

### Extracting a DataArray from a Dataset

In [79]:
temp=GETM['temp']
print(type( temp ))
temp.shape

<class 'xarray.core.dataarray.DataArray'>


(6, 5, 360, 396)

## Accessing data values

In [87]:
temp[0,0,90,100]

<xarray.DataArray 'temp' ()>
array(11.532867431640625)
Coordinates:
    latc     float64 49.9
    level    float64 1.0
    lonc     float64 -9.5
    time     datetime64[ns] 1996-02-01T01:00:00
Attributes:
    units: degC
    long_name: temperature
    valid_range: [ -2.  40.]
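
Indexing down to a single element still returns a `DataArray` that carries its coordinates and attributes. To get the bare number, `.values` or `.item()` can be used. The sketch below uses a small made-up array in place of `temp`.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for `temp`: a small 2 x 3 array.
arr = xr.DataArray(np.array([[1.5, 2.5, 3.5], [4.5, 5.5, 6.5]]), dims=("y", "x"))

cell = arr[1, 2]          # still a 0-dimensional DataArray
as_numpy = cell.values    # 0-dimensional numpy array
as_float = cell.item()    # plain Python float

print(type(cell).__name__, as_numpy.shape, as_float)
```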

## Indexing and selecting data


<br>
<img src="../figures/xarray_indexing_table.png">



In [138]:
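# Positional, by integer: first time step, third level
print( temp[0,2,:,:].shape )

# Positional, by label: one time stamp, all levels
print( temp.loc['1996-02-02T01:00:00',:,:,:].shape )

# By name, by integer: a time series at one grid cell
print( temp.isel(level=1,latc=90,lonc=100).shape )

# By name, by label: same selection as the .loc call above
print( temp.sel(time='1996-02-02T01:00:00').shape )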

(360, 396)
(5, 360, 396)
(6,)
(5, 360, 396)


### Define a selection by label

In [147]:
#GETM.sel(level=1)['temp']
GETM['temp'].sel(level=1,lonc=0.0,latc=50., method='nearest')

<xarray.DataArray 'temp' (time: 6)>
array([ 8.44515419,  8.41235542,  8.44612503,  8.43091679,  8.42808151,
        8.3733902 ])
Coordinates:
    latc     float64 50.0
    level    float64 1.0
    lonc     float64 0.02
  * time     (time) datetime64[ns] 1996-02-01T01:00:00 1996-02-02T01:00:00 ...
Attributes:
    units: degC
    long_name: temperature
    valid_range: [ -2.  40.]
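
The `method='nearest'` argument matters because requested labels rarely fall exactly on a grid point. A minimal sketch of that behaviour on a made-up one-dimensional grid follows; the coordinate values are assumptions, and `tolerance` is an optional `sel` argument given in the units of the coordinate.

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D grid with 0.5 degree spacing.
lat = np.array([49.0, 49.5, 50.0, 50.5])
arr = xr.DataArray(lat * 10.0, coords={"latc": lat}, dims="latc")

# Exact label lookup fails when the label is not on the grid ...
try:
    arr.sel(latc=50.2)
except KeyError:
    print("50.2 is not a grid point")

# ... while method='nearest' snaps to the closest coordinate value.
nearest = arr.sel(latc=50.2, method="nearest")
print(float(nearest["latc"]))

# tolerance rejects matches that are further away than the given distance.
try:
    arr.sel(latc=52.0, method="nearest", tolerance=0.5)
except KeyError:
    print("no grid point within 0.5 degrees of 52.0")
```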

### Calculate average along a dimension

In [54]:
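# Time-average the temperature; the double brackets keep the result a Dataset
tave = GETM[['temp']].mean('time')

# Filled contour plot of the first level (index 0) with 50 contour levels
plt.contourf(tave['temp'][0, :, :], 50)
plt.colorbar()
plt.show()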
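
What `mean('time')` does to the shape and to missing values is easier to see on a tiny array. The sketch below uses made-up numbers with the same dimension names as the model output; it relies on xarray skipping NaNs by default when reducing floating-point data.

```python
import numpy as np
import xarray as xr

# Hypothetical array: 3 time steps on a 2 x 2 grid, with one missing value.
data = np.array(
    [[[1.0, 2.0], [3.0, 4.0]],
     [[3.0, 4.0], [5.0, np.nan]],
     [[5.0, 6.0], [7.0, 8.0]]]
)
arr = xr.DataArray(data, dims=("time", "latc", "lonc"))

# Reduce over the named dimension; the remaining dimensions are kept.
tmean = arr.mean("time")

# NaNs are skipped by default for float data, so the last cell
# averages only its two valid values, 4.0 and 8.0.
print(tmean.dims)
print(tmean.values)
```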

| Dimension lookup | Index lookup | DataArray syntax | Dataset syntax |
|---|---|---|---|
| Positional | By integer | `arr[:, 0]` | not available |
| Positional | By label | `arr.loc[:, 'IA']` | not available |
| By name | By integer | `arr.isel(space=0)` or `arr[dict(space=0)]` | `ds.isel(space=0)` or `ds[dict(space=0)]` |
| By name | By label | `arr.sel(space='IA')` or `arr.loc[dict(space='IA')]` | `ds.sel(space='IA')` or `ds.loc[dict(space='IA')]` |
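
The four lookup styles in the table can be checked against each other on a toy array. This sketch assumes a small array with dimensions `time` and `space` and the labels `'IA'`, `'IL'`, `'IN'`, matching the names used in the table; the data values and the variable name `x` are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 4 x 3 array with a labelled "space" dimension.
arr = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("time", "space"),
    coords={
        "time": pd.date_range("2000-01-01", periods=4),
        "space": ["IA", "IL", "IN"],
    },
)

a = arr[:, 0]             # positional, by integer
b = arr.loc[:, "IA"]      # positional, by label
c = arr.isel(space=0)     # by name, by integer
d = arr.sel(space="IA")   # by name, by label

# All four select the same column, including coordinates.
print(a.identical(b), a.identical(c), a.identical(d))

# A Dataset only supports the by-name forms.
ds = arr.to_dataset(name="x")
print(ds.isel(space=0).identical(ds.sel(space="IA")))
```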