# Week 06: Raster and DEM Intro, rasterio and richdem

This reading material is inspired by a similar course offered by David Shean at UW:
https://github.com/UW-GDA/gda_course_2021

## Overview
This week, we are going to cover raster basics.  We will introduce and use `rasterio` and `richdem` to process, analyze and visualize DEMs of Moscow and the Palouse.

## Reading and Tutorials
Please review the following material (especially if you have limited GIS or remote sensing experience), and come to lecture/lab with questions on topics that are unclear, so we can discuss together.  There is some overlap in content, but different presentation of the essential material, so hopefully one or more will work for you:

### Raster basics
* ESRI Documentation (~15 min)
    * https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/what-is-raster-data.htm
    * https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/cell-size-of-raster-data.htm
    * https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/raster-bands.htm
* Data Carpentry Introduction to Raster Data (~15 min)
    * https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

### Rasterio
* https://rasterio.readthedocs.io/en/latest/quickstart.html
* Automating GIS Processes
    * [Reading raster files with Rasterio](https://automating-gis-processes.github.io/site/notebooks/Raster/reading-raster.html)
    * [Visualizing raster layers](https://automating-gis-processes.github.io/site/notebooks/Raster/plotting-raster.html)
    * [Masking / clipping raster](https://automating-gis-processes.github.io/site/notebooks/Raster/clipping-raster.html)
    * [Raster map algebra](https://automating-gis-processes.github.io/site/notebooks/Raster/raster-map-algebra.html)


##### As you have time...  if not this week, then next...
### Multispectral Image and Landsat background 
* EarthLab Section 5: https://www.earthdatascience.org/courses/use-data-open-source-python/multispectral-remote-sensing/
    * Suggested (can skim/read, no need for interactive):
        * Chapter 7: Introduction to Multispectral Remote Sensing Data in Python
        * Chapter 9: Work with Landsat Remote Sensing Data in Python
        * Chapter 11: Calculate Vegetation Indices in Python
    * Optional:
        * Chapter 8
        * Chapter 10
* https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-overview


### Optional only: GDAL
* Parts 1, 2 and 4 of Rob Simmons' "A Gentle Introduction to GDAL":
    * https://medium.com/planet-stories/a-gentle-introduction-to-gdal-part-1-a3253eb96082
    * https://medium.com/planet-stories/a-gentle-introduction-to-gdal-part-2-map-projections-gdalwarp-e05173bd710a
    * https://medium.com/planet-stories/a-gentle-introduction-to-gdal-part-4-working-with-satellite-data-d3835b5e2971
* Optional optional
    * https://live.osgeo.org/en/quickstart/gdal_quickstart.html
    * https://trac.osgeo.org/gdal/wiki/UserDocs/RasterProcTutorial


In [None]:

import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

import rasterio as rio
import rasterio.plot # Necessary to use rio.plot.show()
import richdem as rd

In [None]:
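print(Path.cwd())  # datasets_dir below is relative to this working directory

datasets_dir = Path('../../datasets')
if not datasets_dir.exists():
    print(f'Warning: {datasets_dir.resolve()} not found; adjust datasets_dir to your setup')
print(datasets_dir / 'moscow')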


In [None]:
elev_rio = rio.open(datasets_dir / 'moscow' / 'moscow_lidar_elevation' / 'City_of_Moscow_LiDAR.vrt')
# elev_rio = rio.open(datasets_dir / 'moscow' / 'palouse' / 'palouse_hills.tif')

# # Another option, for reading right into richdem
# elev_rd = rd.LoadGDAL('../../datasets/moscow/moscow_lidar_elevation/City_of_Moscow_LiDAR.vrt')

In [None]:
elev_rio

Let's get some basic metadata about this dataset.

In [None]:
elev_rio.meta # We could also have used: elev_rio.profile

We use the `read` method to read in the data contained within our raster.
We also need to give the band number of the data, since many datasets contain multiple bands.
The number of bands in a raster is given by `count` in the metadata above.

<div class="alert alert-block alert-warning">

Do you know an example of a raster dataset with multiple bands?  Google for it if necessary.
    </div>

Because rasterio is built on the open source geospatial library GDAL (https://gdal.org/),
band numbers for reading raster data start at 1, not at 0 as Python indexing would suggest.
GDAL underlies many different tools in many different programming languages, not just Python,
so this part of rasterio feels pretty odd from the perspective of a Pythonist(a/o).
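To make the 1-based convention concrete: in rasterio, `read()` with no band argument returns a 3D array shaped `(count, height, width)`, so band `1` lives at index `0` of that array, and `read(1)` matches `read()[0]`. The sketch below mimics this with a small synthetic NumPy stack instead of a real file; the helper `get_band` is hypothetical and not part of rasterio.

```python
import numpy as np

# Synthetic stand-in for what a 3-band dataset's read() returns:
# shape is (count, height, width), like rasterio's read() with no arguments.
count, height, width = 3, 2, 4
stack = np.arange(count * height * width).reshape(count, height, width)

def get_band(stack, band_number):
    """Hypothetical helper: fetch a band using GDAL/rasterio-style 1-based numbering."""
    if not 1 <= band_number <= stack.shape[0]:
        raise IndexError(f"band numbers run from 1 to {stack.shape[0]}")
    return stack[band_number - 1]  # convert the 1-based band number to a 0-based index

first = get_band(stack, 1)
print(first.shape)
```

Asking for band `0` raises an error here, just as it would in rasterio, because there is no band 0.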

In [None]:
elev = elev_rio.read(1)

The elevation data form a rectangular grid, which `read` returns as a NumPy array.

In [None]:
print(type(elev))
print()
elev

Elevations below -1000 m are not physically meaningful here (they are no-data fill values), so set them to `np.nan`.

In [None]:
elev[elev<-1000] = np.nan
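Two cautions about this step. First, assigning `np.nan` only works on a floating-point array; on an integer array NumPy raises a `ValueError`. Second, the dataset may declare its own no-data value in `elev_rio.nodata`, and rasterio can also hand you a masked array directly with `elev_rio.read(1, masked=True)`. The sketch below is a hypothetical helper, `mask_nodata`, that covers both cases on a plain NumPy array; the `-1000` floor mirrors the threshold used above.

```python
import numpy as np

def mask_nodata(arr, nodata=None, floor=-1000.0):
    """Hypothetical helper: return a float copy of `arr` with no-data cells set to NaN.

    nodata: the dataset's declared fill value (e.g. from elev_rio.nodata), or None.
    floor:  values below this are also treated as fill (mirrors the -1000 m cut above).
    """
    out = arr.astype(np.float64)      # float copy, so NaN can be stored even if arr is int
    if nodata is not None:
        out[arr == nodata] = np.nan   # the declared fill value
    out[out < floor] = np.nan         # implausibly low values
    return out

raw = np.array([[250, 260], [-9999, 270]], dtype=np.int16)
clean = mask_nodata(raw, nodata=-9999)
print(clean)
```

Working on a copy leaves the original array untouched, which makes it easy to compare before and after.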

In [None]:
elev # Confirm that the replacements took place


<div class="alert alert-block alert-warning">

What are the units of these elevations?  You'll have to refer to the pdf documentation that came with this elevation dataset.

What is the size/shape of this elevation dataset?
</div>

Read the rasterio manual section on Georeferencing: https://rasterio.readthedocs.io/en/latest/topics/georeferencing.html

<div class="alert alert-block alert-warning">

What are the two metadata components that allow us to place a grid of data in space?
Describe in your own language what their purpose is and how they are represented.
</div>

In [None]:
# dir(elev_rio)
# elev_rio.units
print(elev_rio.crs)

# The `transform` maps pixel (row, col) positions to coordinates in the CRS, including where the grid's upper-left corner sits.
# The CRS defines the coordinate system itself: its datum, projection, and units.
elev_rio.transform

We see above that this dataset's Coordinate Reference System (CRS) has the EPSG code 6340.
That code is enough information to transform different kinds of data into the same coordinate
reference system, so that they can be plotted and analyzed together.  You can look up
an EPSG code for more details, e.g., at https://epsg.org/.

We can also get more information about the CRS from its Well-Known Text representation, or `wkt`.

In [None]:
elev_rio.crs

In [None]:
elev_rio.crs.wkt

<div class="alert alert-block alert-warning">

What reference frame and what units are the horizontal coordinates in?
</div>

In [None]:
# Let's use one of rasterio's convenient functions to look at the data
rio.plot.show(elev_rio)

We can use the `transform` to compute the x and y coordinates of every column and row of the raster.
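Under the hood, rasterio's `transform` is an affine transform with six coefficients `(a, b, c, d, e, f)`, which is why `transform[0]`, `transform[2]`, `transform[4]`, and `transform[5]` appear in the cell that computes `rast_x` and `rast_y`. For pixel column `col` and row `row`, the upper-left corner of that pixel lies at `x = a*col + b*row + c` and `y = d*col + e*row + f`; adding 0.5 to both indices gives the pixel center. The sketch below applies this formula by hand to a hypothetical 1 m grid (the numbers are made up). On a real dataset, rasterio's `elev_rio.xy(row, col)` returns pixel-center coordinates by default.

```python
import numpy as np

def pixel_to_xy(transform, row, col, center=True):
    """Map (row, col) pixel indices to (x, y) map coordinates.

    transform: the six affine coefficients (a, b, c, d, e, f) in rasterio's order.
    center:    if True, return the pixel center; otherwise its upper-left corner.
    """
    a, b, c, d, e, f = transform[:6]
    offset = 0.5 if center else 0.0
    col = np.asarray(col, dtype=float) + offset
    row = np.asarray(row, dtype=float) + offset
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Hypothetical north-up grid: 1 m pixels, upper-left corner at (500000, 5180000).
# e is negative because row numbers increase southward.
t = (1.0, 0.0, 500000.0, 0.0, -1.0, 5180000.0)
print(pixel_to_xy(t, 0, 0))                # center of the upper-left pixel
print(pixel_to_xy(t, 0, 0, center=False))  # its upper-left corner
```

Because the inputs are converted with `np.asarray`, `row` and `col` can also be whole arrays of indices, e.g. `np.arange(width)`.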


In [None]:
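# Assumes a north-up grid with no rotation (transform[1] == transform[3] == 0)
dx = elev_rio.transform[0] # grid resolution in Easting (x)
rast_x = elev_rio.transform[2] + dx * (np.arange(elev_rio.width) + 0.5)  # pixel-center x coordinates
dy = elev_rio.transform[4] # grid resolution in Northing (y); usually negative, since rows run north to south
rast_y = elev_rio.transform[5] + dy * (np.arange(elev_rio.height) + 0.5)  # pixel-center y coordinates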


In [None]:
dx # grid resolution in Easting (x)

Plot the data using basic matplotlib functions, which allow more customization,
such as adding a colorbar.

<div class="alert alert-block alert-warning">
How many data points are in this dataset?

A dataset this large can bog down even a fast computer when plotting.
In the cell below, what is the purpose of the `stride` variable?
What happens as you decrease `stride` towards `1` and keep re-running the cell?  Why?
</div>

In [None]:
stride = 50

fig, ax = plt.subplots()
elev_ax = ax.pcolormesh(rast_x[::stride], rast_y[::stride], elev[::stride, ::stride], shading='auto')
plt.colorbar(elev_ax, ax=ax)
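`elev[::stride, ::stride]` is one way to thin a large grid before plotting. Another option for quick-look plots is to average each `stride` x `stride` block, so that every cell contributes to the picture. A minimal sketch, assuming it is acceptable to trim leftover rows and columns at the bottom and right edges; `block_mean` is a hypothetical helper, and `np.nanmean` skips NaN no-data cells (an all-NaN block stays NaN, with a RuntimeWarning).

```python
import numpy as np

def block_mean(arr, block):
    """Hypothetical helper: average non-overlapping block x block windows, ignoring NaN."""
    h = (arr.shape[0] // block) * block   # trim rows that don't fill a whole block
    w = (arr.shape[1] // block) * block   # trim columns likewise
    blocks = arr[:h, :w].reshape(h // block, block, w // block, block)
    return np.nanmean(blocks, axis=(1, 3))  # mean within each block

demo = np.arange(16, dtype=float).reshape(4, 4)
print(block_mean(demo, 2))  # 2x2 array of block means: 2.5, 4.5, 10.5, 12.5
```

To plot the result with `pcolormesh`, the coordinate vectors need matching treatment, e.g. averaging `rast_x` and `rast_y` over the same blocks.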

### Prep the data for richdem, to analyze the slopes of the DEM

In [None]:
elev_rd = rd.rdarray(elev, no_data=np.nan) # Need to turn the elevations from a numpy array to an rdarray to handle the np.nan values
elev_rd.projection = 'EPSG:6340'
# Be careful!  richdem uses GDAL's geotransform order (c, a, b, f, d, e), which differs
# from the order of rasterio's Affine transform (a, b, c, d, e, f).  It must also be a tuple.
elev_rd.geotransform = (elev_rio.transform[2],
                        elev_rio.transform[0],
                        elev_rio.transform[1],
                        elev_rio.transform[5],
                        elev_rio.transform[3],
                        elev_rio.transform[4])
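The reordering above is easy to get wrong, so here it is as a pair of reusable functions. GDAL (and therefore richdem) orders the coefficients as `(c, a, b, f, d, e)`: x origin, pixel width, row rotation, y origin, column rotation, pixel height. rasterio's `Affine` orders them `(a, b, c, d, e, f)`. The function names below are hypothetical. If you prefer a library call, the `affine` package that rasterio uses provides `Affine.to_gdal()` and `Affine.from_gdal()`, so `elev_rio.transform.to_gdal()` should produce the same tuple as the cell above.

```python
def affine_to_gdal(t):
    """rasterio/affine order (a, b, c, d, e, f) -> GDAL/richdem order (c, a, b, f, d, e)."""
    a, b, c, d, e, f = t[:6]
    return (c, a, b, f, d, e)

def gdal_to_affine(gt):
    """GDAL/richdem order (c, a, b, f, d, e) -> rasterio/affine order (a, b, c, d, e, f)."""
    c, a, b, f, d, e = gt
    return (a, b, c, d, e, f)

t = (1.0, 0.0, 500000.0, 0.0, -1.0, 5180000.0)  # hypothetical north-up 1 m grid
gt = affine_to_gdal(t)
print(gt)  # (500000.0, 1.0, 0.0, 5180000.0, 0.0, -1.0)
```

Round-tripping with `gdal_to_affine(affine_to_gdal(t))` returns the original six coefficients, which is a quick sanity check.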

In [None]:
# slope = rd.TerrainAttribute(elev_rd, attrib='slope_riserun')
slope = rd.TerrainAttribute(elev_rd, attrib='slope_degrees')
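To see what `slope_degrees` measures, it helps to compute a slope by hand. Slope is the angle whose tangent is the rise over run (the `slope_riserun` attribute in the commented-out line), i.e. `degrees(arctan(sqrt((dz/dx)**2 + (dz/dy)**2)))`. The sketch below estimates the derivatives with NumPy's `np.gradient` (central differences). This is a simpler stand-in, not richdem's own algorithm, so expect small differences from `rd.TerrainAttribute`, especially at the edges and around NaN cells (NaNs spread to their neighbors). The function name is hypothetical.

```python
import numpy as np

def slope_degrees_np(elev, dx, dy):
    """Hypothetical helper: approximate slope (degrees) with central differences.

    elev:   2D elevation array (rows run north to south)
    dx, dy: pixel width and height, in the same units as elev; signs are ignored
    """
    # np.gradient returns the derivative along axis 0 (rows, y) first, then axis 1 (cols, x)
    dz_dy, dz_dx = np.gradient(elev, abs(dy), abs(dx))
    rise_run = np.hypot(dz_dx, dz_dy)       # steepest rise over run
    return np.degrees(np.arctan(rise_run))  # convert the angle to degrees

# A synthetic plane rising 0.1 m per 1 m eastward, sampled on 2 m pixels
x = np.arange(5) * 2.0
plane = np.tile(0.1 * x, (4, 1))
print(slope_degrees_np(plane, dx=2.0, dy=-2.0))  # every cell is about 5.71 degrees
```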

In [None]:
# Return information about the richdem geotransform, to ensure that it looks right
print(elev_rd.geotransform)

In [None]:
# When you just create `slope`, it is an rdarray, but without any transform information
rd.rdShow(slope)

<div class="alert alert-block alert-warning">

### View the "typical" slopes
Flatten the slopes array into a 1D array (google "flatten array numpy") and then plot the slopes as a histogram
(google "histogram matplotlib").  What is the median slope?  What are the 2nd and 98th percentile slopes
(see `np.percentile`, and think about how no-data cells should be handled)?
What is the maximum slope and where is it? (see the documentation for `np.argmax` and `np.unravel_index`)  Can you plot the location
of this maximum slope on the slope map itself?
    
</div>


In [None]:
# In addition to the raw data itself, these are the other two critical elements that allow any GIS
#     (including these python modules) to interpret the data geospatially
slope.projection = 'EPSG:6340'
slope.geotransform = elev_rd.geotransform # Here, use the geotransform of the original data to be the geotransform of the new raster


<div class="alert alert-block alert-warning">

### Save the slope map as a geotiff
Note that richdem has a function, `SaveGDAL`, that can save an rdarray as a GeoTIFF,
provided that the rdarray has appropriate projection and geotransform metadata.

Open the new GeoTIFF, along with the original elevation raster (e.g., `City_of_Moscow_LiDAR.vrt`), in ArcMap, ArcGIS Pro, or QGIS, to confirm that both rasters appear appropriately.
    </div>