<img align = 'center' src="./Images/ML_SWE.jpg" alt = 'image' width = '1000'/>


# Model Training
**Akila Sampath, University of Maryland Baltimore County. I am a GeoSMART participant. My goal is to compare the performance of different ML model algorithms in operational forecast applications.**

As part of the **GeoSMART** Hackweek team project contribution, the **AutoEncoder** algorithm is tested and implemented to predict **SWE**. The following workflow walks through the steps and Python files used to process the training data, train a model, produce predictions, and perform a preliminary evaluation.

In [None]:
import os
import DataProcess
import MLP_Model
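#Set working directories
cwd = os.getcwd()  #directory the notebook is run from
#Move up two levels; datapath is two directories above the notebook directory
os.chdir("..")
os.chdir("..")
datapath = os.getcwd()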

## Model Training and Testing Schema

The motivation of the project is to advance the SSM skill in extrapolating regional SWE dynamics from in-situ observations.
To develop and test the SSM, we will train the model on NASA Airborne Snow Observatory (ASO) and snow course observations spanning 2013-2018 and part of 2019.
Within this training dataset, model training will use a random 75-25\% train-test data split.
The random seed for the sampling function will be 1234 so that all participants' models use the same training and testing data for this phase of model development. Note that this supports an intermodel comparison.
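
To illustrate why a fixed seed gives every participant the same partition, here is a minimal sketch of a seeded 75-25 split. It uses only the Python standard library; the function name `train_test_split_indices` is hypothetical, and the project's `DataProcess` module may well perform the split differently (for example with a library routine).

```python
import random

def train_test_split_indices(n_rows, test_fraction=0.25, seed=1234):
    """Return (train_idx, test_idx) for a reproducible random split.

    Hypothetical helper for illustration; not part of DataProcess.
    """
    indices = list(range(n_rows))
    # A dedicated generator seeded with the same value produces the
    # same shuffle on every run
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx

train_idx, test_idx = train_test_split_indices(100)
print(len(train_idx), len(test_idx))  # 75 25
```

Because the seed is fixed, calling the function again with the same arguments returns identical index lists, which is what makes results comparable across models.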

Model validation will be on water year 2019 and use the [NWM_MLP_2019_Simulation](./NWM_MLP_2019_Simulation.ipynb).
This historical simulation will function as a hindcast, and use the 2019 water year NASA ASO and snow course observations to determine model performance. 
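
As a rough sketch of how observations could be selected for the hindcast, the snippet below labels each date with its water year (1 October through 30 September, named for the calendar year in which it ends) and picks out the records that fall in the hold-out year. The helper names and the sample records are invented for illustration; the actual partitioning is handled by `DataProcess` and may differ, in particular because some 2019 observations are also part of the training data.

```python
from datetime import date

def water_year(d):
    """Water year containing date d (Oct 1 to Sep 30, labeled by its ending year)."""
    return d.year + 1 if d.month >= 10 else d.year

def select_water_year(records, year):
    """Return the (date, swe) records that fall within the given water year."""
    return [r for r in records if water_year(r[0]) == year]

# Hypothetical (date, SWE) observations, invented for illustration only
records = [
    (date(2018, 4, 1), 30.0),
    (date(2018, 12, 15), 12.5),
    (date(2019, 3, 20), 41.0),
    (date(2019, 11, 2), 3.0),
]

hold_out = select_water_year(records, 2019)
for obs_date, swe in hold_out:
    print(obs_date.isoformat(), swe)
```

Note that the December 2018 record belongs to water year 2019, while the November 2019 record belongs to water year 2020.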


Once model training is complete, model execution predicts 1-km resolution SWE from data up to the current date of observation, given latitude, longitude, the corresponding topographic data, and neighboring observation input features. Using the sampled test features, the [Evaluation](./evaluation.ipynb) chapter compares the modeled 1-km grid SWE values to the observed values.

In [None]:
#Define hold out year
HOY = 2019
#Run data processing script to partition key regional dataframes
#Note: RegionTrain_SCA.h5 must be loaded
RegionTrain, RegionTest, RegionObs_Train, RegionObs_Test, RegionTest_notScaled = DataProcess.DataProcess(HOY, datapath, cwd)

## AutoEncoder (AE)


In [None]:
#Model training; each participant's model will be different but should follow the prescribed input feature template
epochs = 30
MLP_Model.Model_train(cwd, epochs, RegionTrain, RegionTest, RegionObs_Train, RegionObs_Test)

## Make predictions on the random sample of testing data
<img align = 'center' src="./Images/predictivemodeling.jpg" alt = 'image' width = '600'/>

The next phase of model development is to examine model performance on the random sample of testing data.
Refining model predictions at this phase will ensure the best model performance for the Hold-Out-Year validation set.
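
The prediction cell notes that a Predictions folder must exist on the first run. A minimal sketch of creating it safely is shown here; it assumes the folder is named `Predictions` and lives directly under the notebook directory (`cwd`), which the source does not confirm. A temporary directory stands in for `cwd` so the example is self-contained.

```python
import os
import tempfile

def ensure_predictions_dir(base_dir, name="Predictions"):
    """Create base_dir/name if it is missing and return its path.

    Hypothetical helper; the folder name and location are assumptions.
    """
    path = os.path.join(base_dir, name)
    # exist_ok=True makes the call safe to repeat on later runs
    os.makedirs(path, exist_ok=True)
    return path

# Stand-in for the notebook's cwd variable
base_dir = tempfile.mkdtemp()
predictions_dir = ensure_predictions_dir(base_dir)
print(os.path.isdir(predictions_dir))  # True
```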

In [None]:
#Create a Predictions folder before running this cell for the first time
Predictions = MLP_Model.Model_predict(cwd,  RegionTest, RegionObs_Test, RegionTest_notScaled)

## Perform Preliminary Model Evaluation

How does your model perform?
We use two simple evaluation metrics, R² and RMSE, to gauge model performance.
You will perform a more exhaustive model evaluation in the [Evaluation](./evaluation.ipynb) chapter.
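
For reference, the two metrics can be sketched in a few lines of plain Python. This assumes R² is the coefficient of determination (one minus the ratio of residual to total sum of squares); `MLP_Model.Prelim_Eval` may compute it differently, and the observed and predicted values below are invented for illustration.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between paired observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical SWE values, invented for illustration only
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(observed, predicted), 3))
print(round(r2(observed, predicted), 3))
```

RMSE is expressed in the same units as SWE, so lower is better, while R² approaches 1 as the predictions explain more of the variance in the observations.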

In [None]:
Performance = MLP_Model.Prelim_Eval(cwd, Predictions)
Performance

### Model Evaluation

Now that we have a trained model producing acceptable performance, it is time to more rigorously evaluate its performance using the [Standardized Snow Water Equivalent Tool](./SSWEET.py) within an interactive [evaluation notebook](./evaluation.ipynb).