## Interpolating real data

##### Go through the [Interpolation Guide](https://github.com/tww-carleton/geodac-2022/blob/main/notebooks/InterpolationGuide.ipynb) before this challenge!

A common problem when working with satellite data is that the resolution of a data product may not match that of the other data or models you want to compare it with. Interpolation, or re-gridding, takes data specified on one set of geospatial coordinates and converts it onto another. This is a common operation, but we want it to introduce as little error as possible.

In [None]:
# Import libraries

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Load the first time step of the sample file (you can substitute your own data file)

ds = xr.load_dataset('air.sig995.1948.nc').isel(time=0)
ds

Here we show an alternative way to work with NetCDF files: xarray. An xarray Dataset behaves much like a pandas DataFrame, with labeled dimensions and coordinates, and can be plotted easily.
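To make the pandas comparison concrete, here is a minimal sketch on a small synthetic dataset. The name `demo`, the coordinate values, and the numbers in `air` are made up for illustration and are not read from the file above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real file: a 3 x 4 grid of made-up "air" values
lat = np.array([10.0, 20.0, 30.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
demo = xr.Dataset(
    {"air": (("lat", "lon"), np.arange(12.0).reshape(3, 4))},
    coords={"lat": lat, "lon": lon},
)

# Label-based selection, similar in spirit to DataFrame.loc
point = demo.air.sel(lat=20.0, lon=90.0)

# Reductions take dimension names instead of axis numbers
zonal_mean = demo.air.mean(dim="lon")

# Convert to a pandas DataFrame indexed by (lat, lon)
df = demo.to_dataframe()
print(df.head())
```

Selecting and averaging by dimension name, rather than by array position, is what makes later steps such as interp(lat=..., lon=...) easy to read.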

In [None]:
# Raw data

fig, axes = plt.subplots(ncols=2,figsize=(25, 10))

ds.air.plot(ax=axes[0],robust=True)

axes[0].set_title("Raw data")

# Define a new longitude and latitude grid with five times as many points as the original
new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.sizes["lon"] * 5)
new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.sizes["lat"] * 5)

print(new_lat)

# The interp() command interpolates the data onto the new grid
dsi = ds.interp(lat=new_lat, lon=new_lon)

dsi.air.plot(ax=axes[1],robust=True)

axes[1].set_title("Interpolated data")

Spatial interpolation has reduced the grid spacing (that is, increased the number of grid points) and estimated the air temperature at a finer resolution.

<div class="alert alert-block alert-warning">
    <b>WARNING!</b> This does not mean that we are creating new information. Although the interpolated array has more elements, each new value is only an estimate computed from the surrounding original values. By default, interp() uses linear interpolation, which assumes that the field varies smoothly between adjacent points.
</div>
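The warning can be checked with a small one-dimensional sketch. The function `truth` below is a hypothetical field invented for this illustration: a smooth trend plus a fine-scale wiggle that happens to vanish at every coarse sample point.

```python
import numpy as np
import xarray as xr


def truth(x):
    # Hypothetical "true" field: a smooth trend plus a fine-scale wiggle
    return 0.1 * x + np.sin(2 * np.pi * x)


# Coarse sampling at integer x only: the wiggle is zero at every sample point
coarse_x = np.arange(0.0, 11.0)
coarse = xr.DataArray(truth(coarse_x), dims="x", coords={"x": coarse_x})

# Interpolate onto a grid with ten times finer spacing
fine_x = np.linspace(0.0, 10.0, 101)
fine = coarse.interp(x=fine_x)

# The interpolated array has more elements...
print(coarse.size, "->", fine.size)

# ...but it only recovers the trend; the wiggle is missing entirely
max_error = float(np.abs(fine - truth(fine_x)).max())
print("largest interpolation error:", max_error)
```

The interpolated values follow the straight-line trend between samples, so any structure smaller than the original grid spacing is simply absent from the result.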

Usually, when comparing two datasets, we resample the higher-resolution product down to the coarser grid.
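As a rough sketch of going from a fine grid to a coarse one, the snippet below compares two options on a synthetic field: block averaging with coarsen() and point sampling with interp(). The grids and values are hypothetical. The two results agree here only because the synthetic field is exactly linear; for noisy real data, averaging and sampling will generally differ.

```python
import numpy as np
import xarray as xr

# Hypothetical fine product on a 0.5-degree grid (8 x 12 cells)
fine = xr.DataArray(
    np.arange(8 * 12, dtype=float).reshape(8, 12),
    dims=("lat", "lon"),
    coords={
        "lat": np.arange(0.25, 4.0, 0.5),
        "lon": np.arange(0.25, 6.0, 0.5),
    },
    name="field",
)

# Option 1: average each 2 x 2 block of fine cells into one coarse cell
block_mean = fine.coarsen(lat=2, lon=2).mean()

# Option 2: sample the fine field at the coarse cell centres
sampled = fine.interp(lat=block_mean.lat, lon=block_mean.lon)

print(fine.shape, "->", block_mean.shape)
print("max difference between the two options:",
      float(np.abs(block_mean - sampled).max()))
```

Block averaging uses every fine value inside a coarse cell, which is often closer to what a coarse-resolution instrument or model cell represents than a single interpolated point.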

Consider the following example:

In [None]:
# New coordinate system (x, z)

x = np.linspace(240, 300, 100)

z = np.linspace(20, 70, 100)

# Relation between the original (lat, lon) and the new (x, z) coordinates
lat = xr.DataArray(z, dims=["z"], coords={"z": z})

lon = xr.DataArray(
    (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
    dims=["x", "z"],
    coords={"x": x, "z": z},
)
fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

ds.air.plot(ax=axes[0])


# Draw lines of constant x and constant z (the new coordinates) on the original map
for idx in [0, 33, 66, 99]:
    axes[0].plot(lon.isel(x=idx), lat, "--k")


for idx in [0, 33, 66, 99]:
    axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")


axes[0].set_title("Raw data")

dsi = ds.interp(lon=lon, lat=lat)

dsi.air.plot(ax=axes[1])

axes[1].set_title("Remapped data")

### Saving your work 

#### Saving Datasets and DataArrays to NetCDF

Saving Dataset and DataArray objects to NetCDF files is straightforward. The xarray module that we have been using to load NetCDF files also provides methods for writing them.

Here is the manual page on the subject: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html

The method .to_netcdf() is available on both Dataset and DataArray objects.

#### Syntax

*your_dataset.to_netcdf('/your_filepath/your_netcdf_filename.nc')*

In [None]:
# Raw string, so the backslashes in the Windows path are not read as escape sequences
dsi.to_netcdf(path=r'C:\(add your path)\challenge3_output.nc')
print('finished saving')
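A quick way to confirm that a file was written correctly is to read it back and compare it with what you saved. The sketch below does this with a tiny synthetic dataset and a temporary directory. The names `demo` and `demo_output.nc` are placeholders, and it assumes a NetCDF backend (such as netCDF4, h5netcdf, or scipy) is installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Tiny synthetic dataset standing in for the interpolated result
demo = xr.Dataset(
    {"air": (("lat", "lon"), np.array([[280.0, 281.0], [282.0, 283.0]]))},
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0]},
)

with tempfile.TemporaryDirectory() as tmpdir:
    # pathlib builds the path portably, with no manual backslash escaping
    out_path = Path(tmpdir) / "demo_output.nc"
    demo.to_netcdf(out_path)

    # load_dataset reads everything into memory and closes the file
    reloaded = xr.load_dataset(out_path)

round_trip_ok = demo.equals(reloaded)
print("round trip ok:", round_trip_ok)
```

Building the path with pathlib avoids the escaping issues that come with hand-written Windows paths.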

## Try it yourself!

#### 1. Use the data files we used in the last two tasks. Read in the latitude and longitude for each data file, and the nitrogendioxide_tropospheric_column from the TROPOMI file.

    Use interp to put the nitrogendioxide_tropospheric_column TROPOMI data onto the grid from the VIIRS file. Create maps showing the data before and after the interpolation. Save the result to a single NetCDF file, along with both coordinate grids.
    
#### 2. This interp() function is great... if its assumptions are met. What if the pixels in one of your grids aren't all the same size? Or, what if the uncertainty on some of the measurements is much greater than others? You may want to do a weighted average of some kind.

    Have a close look at the boundaries of the TROPOMI data (look at /PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds and longitude_bounds). Are all the TROPOMI pixels the same size? How might you construct a weighted average so that equal areas get equal weight in the re-gridded product?
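As a starting hint for the weighting question, here is a simplified sketch that computes areas for regular latitude-longitude boxes on a sphere and uses them as weights. The helper `cell_area_km2`, the radius constant, and the two example cells are assumptions for illustration. Real TROPOMI pixels are irregular quadrilaterals defined by their corner coordinates, so this is not a drop-in solution.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius


def cell_area_km2(lat_south, lat_north, lon_west, lon_east):
    """Area of a latitude-longitude box on a sphere, in km^2 (inputs in degrees)."""
    dlon = np.deg2rad(lon_east - lon_west)
    return EARTH_RADIUS_KM**2 * dlon * (
        np.sin(np.deg2rad(lat_north)) - np.sin(np.deg2rad(lat_south))
    )


# Two 1 x 1 degree cells: one at the equator, one at 60 N
lat_south = np.array([0.0, 60.0])
lat_north = np.array([1.0, 61.0])
areas = cell_area_km2(lat_south, lat_north, 0.0, 1.0)

values = np.array([1.0, 3.0])  # hypothetical measurements, one per cell

simple_mean = values.mean()
area_weighted_mean = np.average(values, weights=areas)

print("cell areas (km^2):", areas)
print("simple mean:", simple_mean)
print("area-weighted mean:", area_weighted_mean)
```

The cell at 60 N covers roughly half the area of the equatorial cell, so the area-weighted mean is pulled toward the equatorial value. The same idea extends to uncertainty weighting by replacing the areas with weights based on the measurement errors.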
