# Exercise 7: NetCDF4 Basics

## Aim: Introduce the Python netCDF4 library for reading and creating NetCDF4 files.

Find the teaching material here: https://unidata.github.io/netcdf4-python/

### Issues covered:
- Importing netCDF4
- Groups, dimensions, variables and attributes
- Writing data and retrieving it from variables

## Creating/opening/closing netCDF files

Q1.
- Import the `netCDF4` library
- Let's create a new NetCDF file called `test.nc` in the `data` directory, opened in append mode (`a`) with the `NETCDF4` format. This mode allows us to edit the dataset later. Save the dataset to a variable called `new_file`.
- Inspect the new file to see what its `data_model` is.

In [1]:
# Step 1: Import netCDF4 library
import netCDF4
# Step 2: Create the new file
new_file = netCDF4.Dataset("data/test.nc", "a", format="NETCDF4")
# Step 3: Check the new file out
new_file.data_model

'NETCDF4'

## Groups, dimensions, variables and attributes

### Groups

Q2. Groups act as containers for variables, dimensions and attributes.
- Add a new group to the dataset we just made called "forecasts".
- Create a new group within forecasts called `model1`.
- List the groups of your dataset using `.groups`
- What happens if you do `group3 = new_file.createGroup("/analyses/model2")`?
- What happens if you do `group4 = new_file.createGroup("analyses")`?

In [2]:
# Step 1: Create a new group called forecasts
group1 = new_file.createGroup("forecasts")
# Step 2: Create a group within forecasts called model1
group2 = new_file.createGroup("/forecasts/model1")
# Step 3: Print out the groups
new_file.groups
# Step 4: Try creating model2 within the analyses group which doesn't exist yet
# It creates the 'analyses' group then adds the 'model2' group to it.
group3 = new_file.createGroup("/analyses/model2")
new_file.groups
# Step 5: Try creating the existing group analyses.
# Nothing - it returns the existing group.
group4 = new_file.createGroup("analyses")
new_file.groups

{'forecasts': <class 'netCDF4._netCDF4.Group'>
 group /forecasts:
     dimensions(sizes): 
     variables(dimensions): 
     groups: model1,
 'analyses': <class 'netCDF4._netCDF4.Group'>
 group /analyses:
     dimensions(sizes): 
     variables(dimensions): 
     groups: model2}

### Dimensions

Q3.
- Create some dimensions for the `new_file` dataset:
    - `time` dimension with unlimited size
    - `level` dimension with unlimited size
    - `lat` dimension with unlimited size
    - `lon` dimension with unlimited size
- Print out the dimensions you just created.
- Check the length of the latitude dimension to make sure it is 0.
- Check that the level dimension is unlimited.
- Let's take a look at an overview using 
```
for dim in new_file.dimensions.values():
    print(dim)
```

In [3]:
# Step 1: Create the new dimensions
time = new_file.createDimension('time', None)
level = new_file.createDimension('level', None)
lat = new_file.createDimension('lat', None)
lon = new_file.createDimension('lon', None)
# Step 2: Print out the dimensions
new_file.dimensions
# Step 3: Check the length of the latitude dimension - it should be 0!
print(len(lat))
# Step 4: Check that the level dimension is unlimited - should be True!
print(level.isunlimited())
# Step 5: Take a look at an overview of our dimensions values.
for dim in new_file.dimensions.values():
    print(dim)

0
True
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lat', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lon', size = 0


### Variables

Remember that the data types are as follows:
- `f4`: 32-bit floating point
- `f8`: 64-bit floating point 
- `i4`: 32-bit signed integer 
- `i2`: 16-bit signed integer
- `i8`: 64-bit signed integer
- `i1`: 8-bit signed integer
- `u1`: 8-bit unsigned integer
- `u2`: 16-bit unsigned integer
- `u4`: 32-bit unsigned integer
- `u8`: 64-bit unsigned integer
- `S1`: single-character string
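
These codes follow NumPy's dtype string convention, so they can be checked with NumPy alone. The short sketch below prints the kind (`f` float, `i` signed integer, `u` unsigned integer, `S` byte string) and the size in bytes of each code.

```python
import numpy as np

codes = ["f4", "f8", "i1", "i2", "i4", "i8", "u1", "u2", "u4", "u8", "S1"]

for code in codes:
    dt = np.dtype(code)
    # kind: 'f' = float, 'i' = signed int, 'u' = unsigned int, 'S' = byte string
    print(f"{code}: kind={dt.kind!r}, bytes={dt.itemsize}")

# A type code and the matching NumPy type describe the same dtype.
same_type = np.dtype("f8") == np.float64
```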

Q4.
- Create a scalar variable called `times` with the type set to `f8`.
- Create a scalar variable called `levels` but this time set the type to `np.float64`. (You'll need to import numpy as np)
- Print out the variables using `new_file.variables`. What do you notice about the types?
- Create a variable in the `model2` group we made earlier called `temp`, with the `float64` type and this time give it dimensions: (`time`, `level`, `lat`, `lon`). Print it out.
- Create two variables:
    - `longitudes` with the name `lon`, type `float64` and dimension `lon`
    - `latitudes` with the name `lat`, type `float64` and dimension `lat`

In [4]:
# Step 1: Create the times variable
times = new_file.createVariable('times', 'f8')
# Step 2: Create the levels variable
import numpy as np
levels = new_file.createVariable('levels', np.float64)
# Step 3: Print out the variables
# The types are the same - both float64. Some people prefer np.float64 because it is clearer than 'f8'.
print(new_file.variables)
# Step 4: Create the temp variable within the model2 group.
temp = new_file.createVariable("/analyses/model2/temp", np.float64, ("time", "level", "lat", "lon",))
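# Note: new_file.variables lists only the root-level variables, so temp (which lives in
# /analyses/model2) does not appear in this output. Use print(temp) to inspect it directly.
print(new_file.variables)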
# Step 5: Create latitudes and longitudes
longitudes = new_file.createVariable("lon", np.float64, ("lon",))
latitudes = new_file.createVariable("lat", np.float64, ("lat",))

{'times': <class 'netCDF4._netCDF4.Variable'>
float64 times()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used, 'levels': <class 'netCDF4._netCDF4.Variable'>
float64 levels()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used}
{'times': <class 'netCDF4._netCDF4.Variable'>
float64 times()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used, 'levels': <class 'netCDF4._netCDF4.Variable'>
float64 levels()
unlimited dimensions: 
current shape = ()
filling on, default _FillValue of 9.969209968386869e+36 used}


### Attributes

Q5.
- Let's create a global attribute. Create an attribute on the `new_file` object called `.description` with the value `This is a test description.`.
- Let's create a variable attribute. Create an attribute on the `times` variable called `units` and put `hours`.
- Take a look at the attributes on `new_file` using `new_file.ncattrs()`. What does this show?
- To get the name AND description, use the following loop:
```
for name in new_file.ncattrs():
    print(name, ":", getattr(new_file, name))
```
- There is an easier way of doing this - using `new_file.__dict__`. Try it out!

In [5]:
# Step 1: Create the .description attribute
new_file.description = "This is a test description."
# Step 2: Create the units attribute
times.units = "hours"
# Step 3: Look at the new attributes we just made
# This shows only the names of the global attributes. It does not show attributes set on variables, such as times.units.
new_file.ncattrs()
# Step 4: Get the name and description using the loop
for name in new_file.ncattrs():
    print(name, ":", getattr(new_file, name))
# Step 5: Get the name and description as a dict
new_file.__dict__

description : This is a test description.


{'description': 'This is a test description.'}

## Writing data to and retrieving data from netCDF variables

Q6. 
- Create an array called `lats` using `lats = np.arange(-90, 91, 5)` and an array called `lons` using `lons = np.arange(-180, 180, 5)`. We will use these to populate the `lat` and `lon` variables.
- Print out the `latitudes` and `longitudes` variables we created earlier to see what they look like before we populate them.
- Populate the two variables with our data using `latitudes[:] = lats` and the same for longitudes.
- Print the data out and take a look.

In [6]:
# Step 1: Create the lats and lons arrays
lats = np.arange(-90, 91, 5)
lons = np.arange(-180, 180, 5)
# Step 2: Print the latitudes and longitudes variables
print(latitudes)
print(longitudes)
# Step 3: Populate the latitudes and longitudes variables
latitudes[:] = lats
longitudes[:] = lons
# Step 4: Print the data
print("latitudes =\n{}".format(latitudes[:]))
print("longitudes =\n{}".format(longitudes[:]))

<class 'netCDF4._netCDF4.Variable'>
float64 lat(lat)
unlimited dimensions: lat
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
latitudes =
[-90. -85. -80. -75. -70. -65. -60. -55. -50. -45. -40. -35. -30. -25.
 -20. -15. -10.  -5.   0.   5.  10.  15.  20.  25.  30.  35.  40.  45.
  50.  55.  60.  65.  70.  75.  80.  85.  90.]
longitudes =
[-180. -175. -170. -165. -160. -155. -150. -145. -140. -135. -130. -125.
 -120. -115. -110. -105. -100.  -95.  -90.  -85.  -80.  -75.  -70.  -65.
  -60.  -55.  -50.  -45.  -40.  -35.  -30.  -25.  -20.  -15.  -10.   -5.
    0.    5.   10.   15.   20.   25.   30.   35.   40.   45.   50.   55.
   60.   65.   70.   75.   80.   85.   90.   95.  100.  105.  110.  115.
  120.  125.  130.  135.  140.  145.  150.  155.  160.  165.  170.  175.]
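
The number of values written follows from how `np.arange` treats its stop value, which is always excluded. A short NumPy-only check of the two arrays used above:

```python
import numpy as np

# The stop value is excluded, so 91 is needed to include 90 degrees north.
lats = np.arange(-90, 91, 5)
# Stopping at 180 leaves out 180 itself, which is the same meridian as -180.
lons = np.arange(-180, 180, 5)

print(lats.size, lats[0], lats[-1])
print(lons.size, lons[0], lons[-1])
```

These sizes become the lengths of the unlimited `lat` and `lon` dimensions once the data is assigned.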


Q7.
- Extend `new_file` to have the dimension `pressure` with size 10.
- Define a 4D variable `temperature` with dimensions (`time`, `pressure`, `lat`, `lon`). Print the shape of the temperature variable to look at the size before populating with data.
- Generate random temperature data for a subset of time and pressure values - start by creating `nlats` and `nlons` equal to the length of the `lat` and `lon` dimensions. Assign random data to `temperature[0:10, 0:3, :, :]` using `np.random.uniform(size=(10,3, nlats, nlons))`.
- After assigning the data, print the shape of the `temperature` variable. Take a look at the size of it now.

In [7]:
import numpy as np

# Step 1: Add the pressure dimension
new_file.createDimension("pressure", 10)

# Step 2: Define the temperature variable
temperature = new_file.createVariable("temperature", "f4", ("time", "pressure", "lat", "lon",))
print("temp shape before adding data = {}".format(temperature.shape))

# Step 3: Set nlats and nlons to the size of the lat and lon dimensions, then assign data to the temperature variable
nlats = len(new_file.dimensions["lat"])
nlons = len(new_file.dimensions["lon"])
temperature[0:10, 0:3, :, :] = np.random.uniform(size=(10, 3, nlats, nlons))

# Step 4: Print out the shape of the temperature variable
print("temp shape after adding data = {}".format(temperature.shape))

temp shape before adding data = (0, 10, 37, 72)


temp shape after adding data = (10, 10, 37, 72)


Q8. 
- Define the `pressure` variable with type `f4` and the `pressure` dimension.
- Populate the `pressure` variable with the values [1000, 850, 700, 500, 300, 250, 200, 150, 100, 50].
- Extract the temperature variable using `temperature = new_file.variables["temperature"]`, the latitudes using `latitudes = new_file.variables["lat"][:]` and the longitudes using `longitudes = new_file.variables["lon"][:]`.
- Use fancy indexing to slice the temperature variable: select every second time step (times 0, 2, 4, 6 and 8) using `::2`. Index the 2nd, 4th and 7th values of the pressures and select only positive latitudes and longitudes.
- Print the shape of the resulting subset array.

In [8]:
# Step 1: Define the pressure variable
pressure = new_file.createVariable("pressure", "f4", ("pressure",))

# Step 2: Populate the pressure variable
pressure[:] = [1000., 850., 700., 500., 300., 250., 200., 150., 100., 50.]

# Step 3:  Extract temperature, latitudes and longitudes
temperature = new_file.variables["temperature"]
latitudes = new_file.variables["lat"][:]
longitudes = new_file.variables["lon"][:]

# Step 4: Use fancy indexing to slice the temperature variable.
tempdat = temperature[::2, [1, 3, 6], latitudes > 0, longitudes > 0]

# Step 5: Print the shape of the subset array.
print("shape of fancy temp slice = {}".format(tempdat.shape))

shape of fancy temp slice = (5, 3, 18, 35)
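
netCDF4 variables apply each index to its own dimension independently, which differs from how NumPy combines several index arrays. The closest NumPy equivalent is `np.ix_`, sketched below with NumPy only; the `data` array is random stand-in data with the same shape as `temperature`, not the values from the file.

```python
import numpy as np

lats = np.arange(-90, 91, 5)
lons = np.arange(-180, 180, 5)

# Stand-in for the temperature data: (time, pressure, lat, lon).
data = np.random.uniform(size=(10, 10, lats.size, lons.size))

# np.ix_ builds an open mesh so that each index applies to its own axis,
# mirroring temperature[::2, [1, 3, 6], latitudes > 0, longitudes > 0].
index = np.ix_(
    np.arange(0, 10, 2),        # every second time step
    [1, 3, 6],                  # 2nd, 4th and 7th pressure levels
    np.flatnonzero(lats > 0),   # positions of positive latitudes
    np.flatnonzero(lons > 0),   # positions of positive longitudes
)
subset = data[index]

print(subset.shape)
```

Passing the same four indices directly to the NumPy array would make NumPy try to broadcast the index arrays against each other, which fails here because their lengths (3, 18 and 35) differ.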
