<a href="https://colab.research.google.com/github/twaldburger/flood475/blob/master/04_Flood_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Flood prediction
In this last notebook, we apply our model trained in [03_Model_Training.ipynb](https://github.com/twaldburger/flood475/blob/master/03_Model_Training.ipynb) to new data it has not seen before.

Please go through the explanations and code step-by-step and run each code cell.
> **Task:** Tasks and questions are marked like this. Please try to answer them before proceeding with the next cell.

---
## Preparations
In this first section, we handle all imports and set some variables. We also initialize the connection to GEE.

### Import dependencies
All dependencies required for this notebook are pre-installed in Google Colab. We can therefore just import them.

In [None]:
import ee
import geemap
import geemap.colormaps as cm
import google
import joblib
import numpy as np
from osgeo import gdal
from pathlib import Path

### Define global variables
The cell below defines some global variables.
- `PROJECT_ID` This is your GEE project ID. If you do not remember it, you can go to the [GEE code editor](https://code.earthengine.google.com/) and list your projects by clicking on your user symbol in the top-right corner.
- `MODEL_NAME` The name of the model you want to import.

In [None]:
PROJECT_ID = ''    # @param {type: 'string'}
MODEL_NAME = 'flood_prediction_1' # @param {type: 'string'}

### Mount Google Drive
We will use Google Drive to store our preliminary results from GEE because we can mount it to Google Colab and therefore easily write and read data without manually downloading and uploading datasets.

**Important!** The cell below mounts your Google Drive to Google Colab and creates a new folder (named _geo475_ee_). This folder will be removed again at the end of the exercise (you can also keep it if you want, of course). **To make sure that we are not deleting any of your personal data, do not change the `data_dir` variable in the cell below unless you know what you are doing.**

In [None]:
data_dir = Path('/content/gdrive/MyDrive/geo475_ee')

## mount Google Drive to Colab
if not data_dir.parent.exists():
  google.colab.drive.mount('/content/gdrive')

## create output directory for the project
if not data_dir.exists():
  data_dir.mkdir()

### Initialize Google Earth Engine
In the cell below, we connect to GEE using the same approach shown in [01_Connecting_to_GEE.ipynb](https://github.com/twaldburger/flood475/blob/master/01_Connecting_to_GEE.ipynb).

In [None]:
google.colab.auth.authenticate_user()
credentials, project_id = google.auth.default()
ee.Initialize(credentials, project=PROJECT_ID)
print(ee.String('Nice! That worked! :-)').getInfo())

### Define GEE data sources
We define the same datasets we used to create our training dataset in [02_Creating_Training_Data.ipynb](https://github.com/twaldburger/flood475/blob/master/02_Creating_Training_Data.ipynb) since we want to extract the same information to feed the model. Note that we no longer require [Global Flood Database v1 (2000-2018)](https://developers.google.com/earth-engine/datasets/catalog/GLOBAL_FLOOD_DB_MODIS_EVENTS_V1#description) since this dataset was only used to define our target variable in model training.

In [None]:
## the Global Flood Database is not loaded here: it only defined the target variable during training
elevation_ds = ee.ImageCollection('COPERNICUS/DEM/GLO30')
landcover_ds = ee.ImageCollection("ESA/WorldCover/v200")
precipitation_ds = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY')
flowaccumulation = ee.Image("MERIT/Hydro/v1_0_1").select('upa')

### Define GEE functions
In contrast to our approach when creating the training data, we do not want to retrieve GEE data for individual locations; instead, we want to download data for an area. Since we will do this for multiple datasets, it makes sense to define a function. This function not only downloads the data but also clips and reprojects each dataset so that bounds and spatial resolution match across all datasets.
> **Task**: Take a look at the code and try to understand what is happening. Do you see room for improvement?

In [None]:
def extract_features_from_gee(name, img, aoi, scale, crs='EPSG:4326', clipped_dem=None):
    """
    Clip and rescale a GEE dataset.
    Then extract the result as numpy array along with some metadata.

    Parameters
    ----------
    name : str
        Name of the dataset. This is not the GEE dataset id but just a name for
        your reference.
    img : ee.image.Image or ee.imagecollection.ImageCollection
        Image or ImageCollection from which to clip.
    aoi : ee.geometry.Geometry
        Area of interest to clip ``img`` to.
    scale : float
        Pixel resolution of the output image.
    crs : str, optional
        Coordinate reference system of the output image.
    clipped_dem : ee.image.Image, optional
        Clipped and reprojected DEM. This is used to compute aspect and slope.

    Returns
    -------
    dict
        Dictionary containing the ee.Image, numpy array and metadata.
    """

    print(f"extracting {name}", end='... ')

    try:

        ## clip and rescale the image
        if name=='slope':
            image = ee.Terrain.slope(clipped_dem)
        elif name=='aspect':
            image = ee.Terrain.aspect(clipped_dem)
        else:
            if isinstance(img, ee.image.Image):
                image = img.clip(aoi).reproject(crs=crs, scale=scale)
            elif isinstance(img, ee.imagecollection.ImageCollection):
                image = img.filterBounds(aoi).mosaic().clip(aoi).reproject(crs=crs, scale=scale)
            else:
                raise TypeError(f"Unknown format ({type(img).__name__}).")

        ## create a metadata dictionary
        dct = {
            'name':  name,
            'image': image,
            'scale': image.projection().nominalScale().getInfo(),
        }
        minmax = image.reduceRegion(reducer=ee.Reducer.minMax(), geometry=aoi, scale=scale).getInfo()
        try:
            dct['min'] = min(minmax.values())
            dct['max'] = max(minmax.values())
            dct['vis'] = {'min': dct['min'], 'max': dct['max'], 'palette': cm.palettes.viridis}
        except TypeError: # is raised if min or max is None
            pass

        ## download image as numpy array
        arr = geemap.ee_to_numpy(image, default_value=80)

        ## normalize elevation
        if name=='elevation':
            arr = (arr-arr.min()) / (arr.max()-arr.min())

        dct['array'] = arr
        print('done!')
        return {name: dct}

    except Exception as e:

        raise RuntimeError(f"Caught exception in dataset {name}") from e
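
The function rescales the elevation array to the range 0 to 1 using min-max normalization. The sketch below reproduces that single step on a made-up array. The helper name `minmax_normalize` and the guard for perfectly flat terrain are assumptions for illustration only; they are not part of the function above.

```python
import numpy as np

def minmax_normalize(arr):
    """Rescale an array to [0, 1] (hypothetical helper, not part of the notebook)."""
    arr = np.asarray(arr, dtype=float)
    value_range = arr.max() - arr.min()
    if value_range == 0:
        # assumption: a perfectly flat area maps to all zeros instead of dividing by zero
        return np.zeros_like(arr)
    return (arr - arr.min()) / value_range

# made-up elevations in metres
toy_dem = np.array([[400.0, 450.0], [500.0, 600.0]])
normalized = minmax_normalize(toy_dem)
```

Because the minimum and maximum are taken from the ROI itself, the same absolute elevation can map to different normalized values in different ROIs. This may be worth keeping in mind when thinking about the task above.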

### Import the trained model
We now import our model trained in [03_Model_Training.ipynb](https://github.com/twaldburger/flood475/blob/master/03_Model_Training.ipynb) from Google Drive.

In [None]:
model = joblib.load(data_dir/f"{MODEL_NAME}.pkl")

---
## Create flood probability map

### Define the region of interest
We use an interactive map to define for which region we want to create a flood probability map.
> **Task:** Draw a region of interest (ROI) using the _Draw a rectangle_ tool from the toolbar on the left.


In [None]:
Map = geemap.Map()
Map

### Extract features from GEE
We now use our feature extraction function defined above to get all relevant data for the ROI.

In [None]:
## define spatial target resolution
scale = elevation_ds.select('DEM').first().projection().nominalScale().getInfo()

## make sure the roi is not too large
roi = ee.FeatureCollection(Map.draw_features)
n = elevation_ds.select('DEM').mosaic().reduceRegion(reducer=ee.Reducer.count(), geometry=roi.geometry(), scale=scale).getInfo()['DEM']
if n > 250000:
  Map.remove_drawn_features()
  raise ValueError(f"ROI is too large ({n} pixels). Maximum number of pixels allowed is 250000.")

## extract all relevant features
data = {}
data.update(extract_features_from_gee('elevation', elevation_ds.select('DEM'), roi.geometry(), scale))
## slope and aspect are derived from the clipped DEM, so no source image is passed (img=None)
data.update(extract_features_from_gee('slope', None, roi.geometry(), scale, clipped_dem=data['elevation']['image']))
data.update(extract_features_from_gee('aspect', None, roi.geometry(), scale, clipped_dem=data['elevation']['image']))
data.update(extract_features_from_gee('landcover', landcover_ds.first(), roi.geometry(), scale))
data.update(extract_features_from_gee('upstream_drainage_area', flowaccumulation, roi.geometry(), scale))

## precipitation is different: we take the maximum daily precipitation of each year and average these annual maxima
lst = []
for year in range(2004, 2023):
  daily_max = precipitation_ds.select('precipitation').filterBounds(roi.geometry()).filter(ee.Filter.date(f"{year}-01-01", f"{year+1}-01-01")).max()
  lst.append(daily_max)
daily_max_mean = ee.ImageCollection(lst).mean()
data.update(extract_features_from_gee('daily_max_precipitation', daily_max_mean, roi.geometry(), scale))
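
The precipitation feature is built in two steps: a per-pixel maximum over all days of each year, followed by a per-pixel mean over the yearly maxima. As a rough illustration, the same aggregation can be sketched with numpy on made-up numbers. The array layout (years, days, pixels) and all values are assumptions chosen for this example and do not come from CHIRPS.

```python
import numpy as np

# made-up daily precipitation in mm: 2 years x 3 days x 2 pixels
daily = np.array([
    [[0.0, 10.0], [20.0, 5.0], [5.0, 0.0]],   # year 1
    [[30.0, 2.0], [10.0, 4.0], [0.0, 6.0]],   # year 2
])

# step 1: per-pixel maximum over the days of each year -> shape (2 years, 2 pixels)
annual_max = daily.max(axis=1)

# step 2: per-pixel mean over the yearly maxima -> shape (2 pixels,)
mean_annual_max = annual_max.mean(axis=0)
```

In the GEE code, `.max()` on the filtered collection plays the role of step 1 and `.mean()` on the collection of yearly images plays the role of step 2.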

### Run the model
We have extracted multiple 2-dimensional datasets but the model expects a single matrix-like input. We therefore need to reshape our data before using it as model input. Since plain numpy arrays carry no column names, we must stack the features in the same order that was used to train the model.

We then run the model and reshape the results back to the original dimensions of our ROI.

In [None]:
## reshape 2d arrays to matrix
model_input = np.array([data[name]['array'].flatten() for name in model.feature_names_in_]).T

## run the model
probability = model.predict_proba(model_input)

## reshape model output to the roi dimensions
probability = probability[:, 1].reshape(data['elevation']['array'].shape)
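
The reshaping can be easier to follow on a tiny example. The sketch below uses two made-up 2 x 3 rasters and a stand-in for the model output; the names `features`, `feature_order` and `fake_probability` are hypothetical and only serve this illustration.

```python
import numpy as np

# two made-up 2 x 3 feature rasters
features = {
    'elevation': np.array([[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]]),
    'slope':     np.array([[10.0, 11.0, 12.0], [13.0, 14.0, 15.0]]),
}
feature_order = ['elevation', 'slope']  # stands in for model.feature_names_in_

# flatten each raster and stack them: one row per pixel, one column per feature
model_input = np.array([features[name].flatten() for name in feature_order]).T

# stand-in for predict_proba(...)[:, 1]: one value per pixel
fake_probability = model_input[:, 0]

# reshape the per-pixel values back to the raster layout
probability_map = fake_probability.reshape(features['elevation'].shape)
```

Here `model_input` has shape `(6, 2)`, and because flattening and reshaping use the same row-major order, every value lands on the pixel it came from.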

### Visualize the results
In order to visualize our flood probability map with the _geemap_ library, we need to create an _ee.Image_ object from our numpy array. This requires defining a geotransformation so the image can be displayed correctly on a map. The geotransformation of our probability layer is identical to the transformation of the input data. However, since I have not found an elegant way to extract the geotransformation from a GEE image, I am using a hack where I save the image as a GeoTIFF and then read the transformation using gdal.

In [None]:
## write the clipped DEM to a GeoTIFF file (it shares its geotransformation with the probability layer)
geemap.ee_export_image(data['elevation']['image'], str(data_dir/'dem.tif'))

## read the geotransformation using gdal
ds = gdal.Open(str(data_dir/'dem.tif'))
gtf = ds.GetGeoTransform()
del ds

## reorder elements to match GEE's order
affine = [gtf[1], gtf[2], gtf[0], gtf[4], gtf[5], gtf[3]]

## create ee.image and metadata for flood probability
image = geemap.numpy_to_ee(np.squeeze(probability).T, crs='EPSG:4326', transform=affine, band_names='proba')
dct = {
    'name': 'flood_probability',
    'image': image,
    'scale': scale,
    'min': probability.min(),
    'max': probability.max(),
    'vis': {'min': probability.min(), 'max': probability.max(), 'palette': cm.palettes.viridis}
}
data['flood_probability'] = dct
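
The reordering step exists because GDAL and GEE store the same six affine coefficients in a different order. GDAL returns them as (x origin, pixel width, row rotation, y origin, column rotation, pixel height), while GEE expects [x scale, x shearing, x translation, y shearing, y scale, y translation]. A minimal sketch with a made-up geotransform follows; the helper name `gdal_to_ee_transform` is hypothetical.

```python
def gdal_to_ee_transform(gtf):
    """Reorder a GDAL geotransform into GEE's order (hypothetical helper)."""
    x_origin, x_scale, x_shear, y_origin, y_shear, y_scale = gtf
    return [x_scale, x_shear, x_origin, y_shear, y_scale, y_origin]

# made-up north-up raster: upper-left corner at (8.0, 47.5), pixel size 0.00025 degrees
gtf = (8.0, 0.00025, 0.0, 47.5, 0.0, -0.00025)
affine = gdal_to_ee_transform(gtf)
```

For a north-up raster, both shearing terms are zero and the y scale is negative because row numbers increase southwards.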

We create our final map where we show all input layers as well as the flood probability layer.
> **Task:** Interpret your flood probability map:
1. Does the map make sense?
2. Can you see which features influence the map the most? Does your impression agree with the feature importance plot for your model?
3. Can you spot errors or artifacts? If yes, how can they be explained?
4. Rerun this notebook and look at a different ROI.

In [None]:
Map = geemap.Map()
for name in data.keys():
  ## show only the flood probability layer by default; layers without min/max get empty vis params
  Map.add_layer(data[name]['image'], data[name].get('vis', {}), name=name, shown=(name=='flood_probability'))
Map.addLayerControl()
Map.centerObject(roi)
Map

---
## Clean up Google Drive
This was the last notebook of this exercise. Running the cell below removes the project files we created on your Google Drive and unmounts Google Drive from Colab.

In [None]:
## remove temporary jupyter checkpoints from Google Drive
checkpoints = Path(data_dir/'.ipynb_checkpoints')
if checkpoints.exists():
  for f in checkpoints.glob('*'):
    f.unlink()
  checkpoints.rmdir()

## remove all project files from Google Drive
for f in data_dir.glob('*'):
  f.unlink()
data_dir.rmdir()

## unmount Google Drive
google.colab.drive.flush_and_unmount()
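
The cleanup above deletes files one by one, which assumes that the project folder contains no subfolders other than the checkpoints folder. If you created additional subfolders yourself, a recursive removal is one alternative. The sketch below demonstrates it on a throwaway temporary directory, so nothing on Google Drive is touched; the folder and file names are made up.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # throwaway folder standing in for data_dir
    demo_dir = Path(tmp) / 'geo475_ee_demo'
    (demo_dir / 'nested').mkdir(parents=True)
    (demo_dir / 'dem.tif').write_text('placeholder')
    (demo_dir / 'nested' / 'extra.txt').write_text('placeholder')

    # remove the folder and everything below it in one call
    shutil.rmtree(demo_dir)
    removed = not demo_dir.exists()
```

Since `shutil.rmtree` deletes everything below the given path without asking, the file-by-file approach used in the cell above is the more cautious choice for a folder on your personal Drive.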

---
## Feedback
This is the last part of the exercise and I hope that it was helpful and fun. Since I have done this for the first time, I would much appreciate it if you could take 2 minutes to give me some short feedback.

Please run the cell below to display a Google Form where you can provide your feedback. I will not collect your email address, so the feedback is anonymous.

In [None]:
%%html
<iframe src="https://docs.google.com/forms/d/e/1FAIpQLScg8j6ORkqgWw4QEHpkeOy2PxYKSdgop3PPvaA1_WT54igFIA/viewform?embedded=true" width="640" height="1376" frameborder="0" marginheight="0" marginwidth="0">Loading…</iframe>