# Predicting St. Vrain Hydrology using KNN Model

This notebook lets users apply a pre-trained K-Nearest Neighbors (KNN) model to predict St. Vrain hydrology from other Natural Flow nodes.

Note: this notebook does not train a new model on each run. Instead, it uses a pre-trained Reclamation model distributed with the notebook.

Press Shift+Enter (both keys at the same time) to run each code cell in this notebook.

#### The cell below loads the Python libraries required to run this notebook. If you get an import error, please contact Zack for help (zleady@usbr.gov).

In [1]:
# standard libraries
import os
import pickle
# data read-in / manipulation libraries
import pandas as pd
import numpy as np
# model building libraries
import sklearn
print(f"Check sklearn version: {sklearn.__version__}")
from sklearn.metrics import r2_score
#from sklearn.preprocessing import StandardScaler
#from sklearn.neighbors import KNeighborsRegressor
# graphing libraries
import plotly.graph_objects as go

Check sklearn version: 1.1.2


#### The cells below load the pre-trained model from the KNNmodel_v1.pkl file, which by default sits in the same folder as this notebook.

In [2]:
# modify this cell only if you have moved the K-NN model file from the default directory
model_filepath = r"" # <-- user input or default to "./KNNmodel_v1.pkl"
if not model_filepath:
    model_filepath = "./KNNmodel_v1.pkl"

In [3]:
# load saved scikit-learn K-NN Model 
with open(model_filepath, 'rb') as f:
    KNN_UC_model = pickle.load(f)

In [4]:
display(KNN_UC_model)
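The same open-and-load pattern is used for every `.pkl` file in this notebook. As a minimal sketch, a small helper could wrap it so that a missing file produces a clearer message. `load_pickle` is a hypothetical name and is not part of the notebook, and the demo object is a stand-in rather than the real K-NN model. Only unpickle files from a trusted source, because `pickle.load` can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickled object, failing early with a readable message.

    Hypothetical helper for illustration; the notebook itself calls
    pickle.load directly.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Could not find '{path}'. Check that the file sits next to the notebook."
        )
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip demo with a stand-in object (not the real K-NN model)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump({"n_neighbors": 5}, f)
    demo_obj = load_pickle(demo_path)

print(demo_obj)
```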

**The cells below load the pre-configured scaling objects.**

SS_predictor is a StandardScaler() fitted on Historical Natural Flow from UpperColoradoRiver.Inflow or GlenWood Natural Flow. It centers the data at mean 0 and, with scikit-learn's default settings, also scales it to unit variance.

SS_vrain is a StandardScaler() fitted on "Historically" filled Natural Flow from St. Vrain. It centers and scales the St. Vrain data in the same way.
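To make the scaling concrete, here is a minimal NumPy sketch of the arithmetic a StandardScaler applies with its default settings: subtract the training mean and divide by the population standard deviation. The flow values are made up for illustration and are not the data the shipped scalers were fitted on.

```python
import numpy as np

# Made-up annual flows (acre-ft), for illustration only
flows = np.array([2_000_000.0, 2_500_000.0, 3_000_000.0, 1_500_000.0])

# With default settings, StandardScaler stores the training mean and the
# population standard deviation (ddof=0) and applies z = (x - mean) / std
mean = flows.mean()
std = flows.std(ddof=0)

scaled = (flows - mean) / std    # what transform() computes
restored = scaled * std + mean   # what inverse_transform() computes

print(scaled)
print(restored)
```

After scaling, the values have mean 0 and standard deviation 1, and the inverse step recovers the original flows.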

In [5]:
with open("./KNN_Predictor_scaler_v1.pkl", 'rb') as f:
    SS_predictor = pickle.load(f)

In [6]:
with open("./KNN_Stvrain_scaler_v1.pkl", 'rb') as f:
    SS_vrain = pickle.load(f)

In [7]:
display(SS_predictor)
display(SS_vrain)

*The input data is first scaled with SS_predictor before it is fed to the KNN model. The model takes the scaled predictor value and predicts into the scaled St. Vrain data space defined by SS_vrain. The prediction must therefore be inverse-scaled with SS_vrain to return it to the historical St. Vrain data space, which is the form CRSS can actually use.*
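The scale, predict, inverse-scale sequence can be sketched end to end on synthetic data. This is an illustration only: the training data is randomly generated, `n_neighbors=5` is an assumption, and nothing here reproduces how the shipped Reclamation model was actually trained.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the historical training data (NOT real flows)
predictor_hist = rng.uniform(1.0e6, 3.5e6, size=(100, 1))
vrain_hist = 0.05 * predictor_hist + rng.normal(0.0, 5.0e3, size=(100, 1))

# One scaler per data space, mirroring SS_predictor and SS_vrain
ss_predictor = StandardScaler().fit(predictor_hist)
ss_vrain = StandardScaler().fit(vrain_hist)

# The model is fitted entirely in scaled space (n_neighbors=5 is an assumption)
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(ss_predictor.transform(predictor_hist), ss_vrain.transform(vrain_hist))

# Prediction: scale the input, predict, then inverse-scale the output
new_input = np.array([[2.0e6], [3.0e6]])
scaled_pred = knn.predict(ss_predictor.transform(new_input))
vrain_pred = ss_vrain.inverse_transform(scaled_pred.reshape(-1, 1))

print(vrain_pred.ravel())
```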

**Warning: if your input data falls far outside the range of the historical or CMIP5 data, the model may perform very poorly.**
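A nearest-neighbors regressor predicts from the training points closest to each input, so it cannot extrapolate beyond what it saw in training. One rough way to spot risky inputs is to look for scaled values with a large absolute z-score. The sketch below assumes a DataFrame of already-scaled values; `rows_outside_typical_range` is a hypothetical helper, and the threshold of 3 is an arbitrary assumption rather than a documented limit of the model.

```python
import pandas as pd


def rows_outside_typical_range(scaled_df, z_limit=3.0):
    """Return rows where any trace exceeds z_limit in absolute value.

    Hypothetical helper; z_limit=3.0 is an assumed rule of thumb.
    """
    exceeds = scaled_df.abs() > z_limit
    return scaled_df[exceeds.any(axis=1)]


# Made-up scaled values, for illustration only
demo_scaled = pd.DataFrame(
    {"NF_1": [1.14, -0.83, 4.2], "NF_2": [0.5, -3.6, 0.1]},
    index=pd.Index([1906, 1907, 1908], name="Year"),
)

flagged = rows_outside_typical_range(demo_scaled)
print(flagged)
```

Flagged years are worth reviewing by hand before the predictions are passed on.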

#### The cells below read in the user hydrology input. In this case it should be UpperColoradoRiver.Inflow annual (CY) sums of Intervening Natural Flow values

In [8]:
input_hydrology_filepath = r"./test_nf_input.xlsx" # <---user must fill-in
# if you use an excel file please provide the sheetname
sheetname = "Sheet1" # <---user must fill-in if they use an Excel sheet file with more than 1 sheet

In [9]:
# read-in user data
# user data should be formatted as a *.csv or *.xlsx file - *.csv is preferred
if input_hydrology_filepath:
    # splitext handles file names that contain more than one "."
    ext = os.path.splitext(input_hydrology_filepath)[1].lower().lstrip(".")
    if ext == "csv":
        input_df = pd.read_csv(input_hydrology_filepath, header=0, index_col=0)
    elif ext == "xlsx":
        input_df = pd.read_excel(input_hydrology_filepath, sheet_name=sheetname, header=0, index_col=0)
    else:
        # stop here so later cells do not fail on an undefined input_df
        raise ValueError("You must use a csv or xlsx file - macros are not supported")

In [10]:
display(input_df)

Year,NF_1,NF_2,NF_3
1906,2763072,2763072,2763072
1907,3081437,3081437,3081437
1908,1649464,1649464,1649464
1909,3215703,3215703,3215703
1910,1847184,1847184,1847184
...,...,...,...
2016,2106977,2106977,2106977
2017,2181450,2181450,2181450
2018,1474316,1474316,1474316
2019,2550862,2550862,2550862
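The layout the read-in cell expects can be sketched with a small in-memory CSV: the first column holds the year and becomes the index, and each remaining column is one trace. The rows reuse the first three years displayed above. The two sanity checks at the end are suggestions and are not part of the notebook.

```python
import io

import pandas as pd

# First column = Year (becomes the index), one column per trace
csv_text = """Year,NF_1,NF_2
1906,2763072,2763072
1907,3081437,3081437
1908,1649464,1649464
"""

demo_df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0)

# Optional sanity checks before scaling: numeric columns and no missing values
all_numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in demo_df.dtypes)
has_gaps = bool(demo_df.isna().any().any())

print(demo_df.index.name, list(demo_df.columns), all_numeric, has_gaps)
```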


#### To use the K-NN model correctly, the input data must first be standardized with the scikit-learn StandardScaler (mean = 0)

In [11]:
# values must be scaled using StandardScaler of the Predictor
SS_df = pd.DataFrame(columns=input_df.columns, index=input_df.index)
for col in input_df.columns:
    select_col = input_df.loc[:, col].values.reshape(-1,1)
    scale_col = SS_predictor.transform(select_col)
    new_col = scale_col.reshape(scale_col.shape[0])
    SS_df[col] = new_col

display(SS_df)

Year,NF_1,NF_2,NF_3
1906,1.143895,1.143895,1.143895
1907,1.708980,1.708980,1.708980
1908,-0.832714,-0.832714,-0.832714
1909,1.947297,1.947297,1.947297
1910,-0.481769,-0.481769,-0.481769
...,...,...,...
2016,-0.020647,-0.020647,-0.020647
2017,0.111540,0.111540,0.111540
2018,-1.143594,-1.143594,-1.143594
2019,0.767231,0.767231,0.767231
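Because the predictor scaler was fitted on a single feature, each trace column has to be transformed on its own. As a sketch of an equivalent shortcut, the fitted `mean_` and `scale_` attributes can be applied to the whole DataFrame at once. The scaler here is a stand-in fitted on five values taken from the input table, not the shipped SS_predictor, so the resulting numbers differ from those shown above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in for SS_predictor, fitted on a handful of values (NOT the real training set)
hist = np.array([[2763072.0], [3081437.0], [1649464.0], [3215703.0], [1847184.0]])
scaler = StandardScaler().fit(hist)

traces = pd.DataFrame(
    {"NF_1": [2106977.0, 2181450.0], "NF_2": [1474316.0, 2550862.0]},
    index=pd.Index([2016, 2017], name="Year"),
)

# Column-by-column transform, as in the cell above
looped = pd.DataFrame(
    {col: scaler.transform(traces[[col]].to_numpy()).ravel() for col in traces.columns},
    index=traces.index,
)

# Same arithmetic applied to every column at once
vectorized = (traces - scaler.mean_[0]) / scaler.scale_[0]

print(np.allclose(looped.to_numpy(), vectorized.to_numpy()))
```

Both forms keep the Year index and the trace column names.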


#### The cells below run KNN model predictions from UpperColoradoRiver.Inflow Annual Natural Flow to St. Vrain Annual Natural Flow

In [12]:
# predict values
trace_dict = {}
for trace_col in SS_df.columns.tolist():
    trace_vals = SS_df[trace_col].values
    trace_valsr = trace_vals.reshape(-1, 1)
    print(f"Input original shape: {trace_vals.shape}, Input reshaped for model: {trace_valsr.shape}")
    trace_predict = KNN_UC_model.predict(trace_valsr)
    trace_predictr = trace_predict.flatten()
    print(f"Output from model shape: {trace_predict.shape}, Output dataframe shape: {trace_predictr.shape}")
    trace_dict[trace_col] = trace_predictr
    print("\n")

# create output dataframe of predicted St.Vrain values
output_stvrain_df = pd.DataFrame.from_dict(trace_dict)

Input original shape: (115,), Input reshaped for model: (115, 1)
Output from model shape: (115, 1), Output dataframe shape: (115,)


Input original shape: (115,), Input reshaped for model: (115, 1)
Output from model shape: (115, 1), Output dataframe shape: (115,)


Input original shape: (115,), Input reshaped for model: (115, 1)
Output from model shape: (115, 1), Output dataframe shape: (115,)




In [13]:
display(output_stvrain_df)

,NF_1,NF_2,NF_3
0,0.873931,0.873931,0.873931
1,2.054987,2.054987,2.054987
2,-0.824119,-0.824119,-0.824119
3,1.361289,1.361289,1.361289
4,-1.267867,-1.267867,-1.267867
...,...,...,...
110,-0.190598,-0.190598,-0.190598
111,0.459410,0.459410,0.459410
112,-1.148935,-1.148935,-1.148935
113,0.445116,0.445116,0.445116


#### Before writing out the predicted St. Vrain Annual Natural Flow, the data must be un-scaled (inverse-transformed)

In [14]:
SS_inverse_vrain = pd.DataFrame(SS_vrain.inverse_transform(output_stvrain_df),
                                columns=input_df.columns,
                                index=input_df.index)
SS_inverse_vrain = SS_inverse_vrain.round(0)
display(SS_inverse_vrain)
#display(input_df)

Year,NF_1,NF_2,NF_3
1906,149314.0,149314.0,149314.0
1907,189603.0,189603.0,189603.0
1908,91389.0,91389.0,91389.0
1909,165939.0,165939.0,165939.0
1910,76252.0,76252.0,76252.0
...,...,...,...
2016,113000.0,113000.0,113000.0
2017,135174.0,135174.0,135174.0
2018,80309.0,80309.0,80309.0
2019,134686.0,134686.0,134686.0


#### Write out St. Vrain Annual Natural Flow Predictions to CSV or Excel (you can skip)

In [15]:
# write out un-scaled output dataframe of predicted St. Vrain values
user_output_filepath = "./stvrain_predicted_nf.csv" # <-- user input
# splitext handles file names that contain more than one "."
ext = os.path.splitext(user_output_filepath)[1].lower().lstrip(".")
if ext == "csv":
    SS_inverse_vrain.to_csv(user_output_filepath)
elif ext == "xlsx":
    SS_inverse_vrain.to_excel(user_output_filepath)
else:
    # stop here so the success message is not shown for a file that was never written
    raise ValueError("Output filepath must be a *.csv or *.xlsx path")

display(f"Wrote output dataframe to {user_output_filepath}")

'Wrote output dataframe to ./stvrain_predicted_nf.csv'

#### Write out St. Vrain Annual Natural Flow Predictions in RiverWare DMI format (you can skip)

In [16]:
# write out un-scaled output dataframe of predicted St. Vrain values using RW format
scenario_output_dirname = r".\StressTest" # <-- user input
stvrain_filename = "TMD_East_Slope_Supply.St_Vrain_Annual_Flow" # <-- user input
start_date_str = "start_date: 2024-12-31 24:00" # <-- user input
units_str = "units: acre-ft" # <-- user input

if not os.path.isdir(scenario_output_dirname):
    os.mkdir(scenario_output_dirname)
for indx, col in enumerate(SS_inverse_vrain.columns.tolist()):
    col_to_folder_rename = f"trace{indx+1}"
    print(f"Column {col} in SS_inverse_vrain is mapped to the folder {col_to_folder_rename}")
    write_data = SS_inverse_vrain[col].values
    temp_dirpath = os.path.join(scenario_output_dirname, col_to_folder_rename)
    if not os.path.isdir(temp_dirpath):
        os.mkdir(temp_dirpath)
    df_filepath = os.path.join(temp_dirpath, stvrain_filename)
    with open(df_filepath, "w+") as f:
        f.write(f"{start_date_str}\n")
        f.write(f"{units_str}\n")
        for v in write_data:
            f.write(f"{round(v,0)}\n")
    print(f"Column {col} written to {df_filepath}")
    print("\n")

Column NF_1 in SS_inverse_vrain is mapped to the folder trace1
Column NF_1 written to .\StressTest\trace1\TMD_East_Slope_Supply.St_Vrain_Annual_Flow


Column NF_2 in SS_inverse_vrain is mapped to the folder trace2
Column NF_2 written to .\StressTest\trace2\TMD_East_Slope_Supply.St_Vrain_Annual_Flow


Column NF_3 in SS_inverse_vrain is mapped to the folder trace3
Column NF_3 written to .\StressTest\trace3\TMD_East_Slope_Supply.St_Vrain_Annual_Flow
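To spot-check a written trace file, it can be read back and split into its two header lines and its values. The sketch below assumes only the layout produced by the cell above (a `start_date:` line, a `units:` line, then one value per line). It is not a general RiverWare DMI parser, and `read_trace_file` is a hypothetical helper name.

```python
import os
import tempfile


def read_trace_file(path):
    """Parse two 'key: value' header lines followed by one value per line.

    Hypothetical helper that mirrors the layout written by this notebook.
    """
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    header = dict(line.split(": ", 1) for line in lines[:2])
    values = [float(v) for v in lines[2:]]
    return header, values


# Write a small demo file in the same layout, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "TMD_East_Slope_Supply.St_Vrain_Annual_Flow")
    with open(demo_path, "w") as f:
        f.write("start_date: 2024-12-31 24:00\n")
        f.write("units: acre-ft\n")
        for v in (149314.0, 189603.0, 91389.0):
            f.write(f"{v}\n")
    header, values = read_trace_file(demo_path)

print(header)
print(values)
```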


