# Working with NetCDF data

NetCDF is shorthand for Network Common Data Form and is frequently used to distribute large amounts of array-like data. This notebook explains the basics of the netCDF4 package in Python and shows how an online NetCDF dataset can be downloaded and visualised.

Let's import the required packages first.


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import urllib.request

import netCDF4 as nc

Let's look at the basics of the netCDF4 package by revisiting the dam bathymetry example once more. We first load the saved surface height data. Note that the `X` and `Y` arrays are two-dimensional, but for a NetCDF file, only one-dimensional arrays of unique x and y values are required. We therefore select one row from `X` and one column from `Y`. The surface heights are stored in a two-dimensional array.

In [None]:
xyz = np.loadtxt("data/bathymetry.txt")

X = xyz[0, :].reshape((98, 135))
Y = xyz[1, :].reshape((98, 135))

xi = X[0, :]
yi = Y[:, 0]
zi = xyz[2, :].reshape((98, 135))
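
The relation between the two-dimensional coordinate arrays and their one-dimensional axes can be illustrated with a small, made-up grid. The sketch below assumes the grid is laid out the way `np.meshgrid` lays it out (x varying along the columns, y along the rows); the names `x_axis`, `y_axis`, `X_demo` and `Y_demo` are hypothetical and not part of the bathymetry example.

```python
import numpy as np

# Hypothetical axes standing in for the unique x and y values of a grid
x_axis = np.array([0.0, 10.0, 20.0, 30.0])  # 4 columns
y_axis = np.array([100.0, 110.0, 120.0])    # 3 rows

# meshgrid repeats the axes into two 2D arrays of shape (ny, nx)
X_demo, Y_demo = np.meshgrid(x_axis, y_axis)

# One row of X and one column of Y recover the 1D axes
x_back = X_demo[0, :]
y_back = Y_demo[:, 0]

print(X_demo.shape)                    # (3, 4)
print(np.array_equal(x_back, x_axis))  # True
print(np.array_equal(y_back, y_axis))  # True
```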


The basis for creating a NetCDF file is a structure known as a Dataset. It defines the dimensions and stores the data in variables. The dimensions are typically time and space, the latter in the form of x and y coordinates like easting and northing (for UTM), or longitude and latitude. After creation of the Dataset, information about the dimensions is added using the `createDimension` method.

In [None]:
fn = 'data/bathymetry.nc'
ds = nc.Dataset(fn, 'w', format='NETCDF4')

time = ds.createDimension(dimname='time', size=None)
northing = ds.createDimension(dimname='northing', size=len(yi))
easting = ds.createDimension(dimname='easting', size=len(xi))

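After adding the dimensions, variables are added using `createVariable`. Note that time, northing and easting also appear as variables (`times`, `northings` and `eastings`), which are intended to contain the numerical coordinate values. The units of the surface heights are set to meters by assigning the `units` attribute after the variable is created. Because `time` is an unlimited dimension, the shape of `surface_heights` grows when the first data are written. When the Dataset is closed, the NetCDF file is saved to disk.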

In [None]:
times = ds.createVariable(varname='times', datatype='f4', dimensions=('time',))
northings = ds.createVariable(varname='northings', datatype='f4', dimensions=('northing',))
eastings = ds.createVariable(varname='eastings', datatype='f4', dimensions=('easting',))
value = ds.createVariable(varname='surface_heights', datatype='f4', dimensions=('time', 'northing', 'easting',))
value.units = 'm'

northings[:] = yi
eastings[:] = xi

print('var size before adding data', value.shape)

value[0, :, :] = zi

print('var size after adding first data', value.shape)

ds.close()

Let's load the saved file and inspect the variable names. The variables are stored in a dictionary, so the variable names can be displayed using the `keys()` method.

In [None]:
ds = nc.Dataset(fn)
ds.variables.keys()

Finally, we plot the data to check that they were saved correctly.

In [None]:
# Slicing with [:] reads the values from the file into arrays
xi = ds.variables['eastings'][:]
yi = ds.variables['northings'][:]
zi = ds.variables['surface_heights']
zi = zi[0, :, :]

fig, ax = plt.subplots()
pc = ax.pcolor(xi, yi, zi)
plt.colorbar(pc, ax=ax)
ax.set_xlabel("Easting")
ax.set_ylabel("Northing")
ax.set_title("Surface height");

## Retrieving online NetCDF data

Now let's see if we can download a NetCDF file from an online server and plot the data. The details of how to do this may vary depending on the data repository, so the example below serves only as a general guide. The example data can be found <A href="https://www.climatechangeinaustralia.gov.au/en/obtain-data/download-datasets/#Change">here</A> and represent climate change projections released in 2015. They are likely to be superseded by newer projections, but the key point is to demonstrate the Python code.

The first step is to figure out the URL to use. In this case, it can be inferred from the dataset catalogue, which is accessible via <A href="https://dap.nci.org.au/thredds/remoteCatalogService?catalog=https://dapds00.nci.org.au/thredds/catalog/ua6_4/CMIP5/derived/Collections/Projected_Change_Data/catalog.xml">https://dap.nci.org.au/thredds/remoteCatalogService?catalog=https://dapds00.nci.org.au/thredds/catalog/ua6_4/CMIP5/derived/Collections/Projected_Change_Data/catalog.xml</A>.

With the right URL, the data can be downloaded to a local file using `urlretrieve` from the `urllib.request` module of Python's standard library. This stores the NetCDF file locally (unlike the `get` method from the `requests` library that we used in earlier sessions).

In [None]:
url = "https://dapds00.nci.org.au/thredds/fileServer/ua6_4/CMIP5/derived/Collections/Projected_Change_Data/Maximum_Temperature/2020-2039/tasmax_Amon_ACCESS1-0_rcp45_r1i1p1_2020-2039-abs-change-wrt-1986-2005-seasavg-clim_native.nc"
local_filename = "tasmax_Amon_ACCESS1-0_rcp45_r1i1p1_2020-2039-abs-change-wrt-1986-2005-seasavg-clim_native.nc"
urllib.request.urlretrieve(url, local_filename);
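
Because the download is repeated every time the cell is run, it can be worth checking whether the file is already present. The helper below is a minimal sketch of that idea; `download_if_missing` is a hypothetical name and not part of `urllib`.

```python
from pathlib import Path
import urllib.request


def download_if_missing(url, local_filename):
    """Download url to local_filename unless that file already exists.

    Returns the local path and a flag telling whether a download happened.
    """
    path = Path(local_filename)
    if path.exists():
        return path, False  # reuse the file from an earlier run
    urllib.request.urlretrieve(url, str(path))
    return path, True
```

It would be called as `download_if_missing(url, local_filename)` with the two variables defined above.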

Just like before, the NetCDF file can be opened using `Dataset` from netCDF4 and the variable names can be displayed as keys.

In [None]:
ds = nc.Dataset(local_filename)
ds.variables.keys()

To get an idea of the data, let's plot one of the variables, in this case `tasmax_annual`.

In [None]:
lat = ds.variables['lat'][:]
lon = ds.variables['lon'][:]
z = ds.variables['tasmax_annual']
z = z[0, :, :]

# pcolor accepts the 1D lon and lat arrays directly, so no meshgrid is needed
fig, ax = plt.subplots()

pc = ax.pcolor(lon, lat, z)
plt.colorbar(pc, ax=ax)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Projected change in max temperature, 2020-2039");

***Homework exercise***: Create a 4 row by 3 column figure that displays the projected maximum temperature for each month of the year. *Hint: create a list with the names of the months and use a `for` loop to step over each of the variables. Look up the documentation of the Matplotlib `subplots` function to understand how to access the subplots in a 4x3 figure.*

In [None]:
# Type your code here