# Data Preparation

This section describes the data (source, size, type, attributes, and modality), how the data are retrieved from community data centers, personal cloud storage, or published datasets, and the feature extraction and engineering steps.

The model training data come from the following sources: Natural Resources Conservation Service (NRCS) Snow Telemetry (SNOTEL), California Data Exchange Center (CDEC), and the Copernicus 90-m DEM.
In addition to these data sources, we create features based on the water year week, latitude, longitude, and elevation.
From these data sources we created the following features for input into the machine learning models:

|Feature id| Description|
 |:-----------: | :--------: |
 |WY Week | Numerical ID of the week of the water year|
 |Latitude | Center latitude of the training grid cell |
 |Longitude | Center longitude of the training grid cell|
 |Elevation | DEM elevation of the training grid cell |
 |Northness | Calculated northness of the training cell|
 |Previous SWE | Model predicted SWE from previous week|
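
As a rough illustration of how two of the derived features in the table might be computed, the sketch below assumes the water year begins on 1 October and that weeks are numbered from 1, and it uses the cosine of terrain aspect as one common definition of northness. Neither convention is stated in this section, and the function names `water_year_week` and `northness` are hypothetical.

```python
import math
from datetime import date

def water_year_week(d):
    """Week of the water year (1-based), ASSUMING the water year starts on 1 October."""
    # Dates from October onward belong to the water year that starts that same calendar year
    start_year = d.year if d.month >= 10 else d.year - 1
    wy_start = date(start_year, 10, 1)
    return (d - wy_start).days // 7 + 1

def northness(aspect_deg):
    """Northness as cos(aspect): 1 for north-facing, -1 for south-facing (ASSUMED definition)."""
    return math.cos(math.radians(aspect_deg))

print(water_year_week(date(2023, 1, 20)))
print(northness(0.0), northness(180.0))
```

Aspect itself would have to be derived from the DEM first; that step is not shown here.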
 
 ## SNOTEL and CDEC Snow Monitoring Data
 
 <img align = 'right' src="Images/SNOTEL.jpg" alt = 'drawing' width = '400'/>
SNOTEL is an automated system of snowpack and related climate sensors operated by the Natural Resources Conservation Service (NRCS) that records key snow and hydrometeorological components and transmits observations via a telemetry network.
There are over 800 SNOTEL sites distributed across the western US and Alaska, and their observations have become standard climate information for understanding snowpack dynamics and estimating water supply.
Standard SNOTEL sites include a pressure-sensing snow pillow that measures snow water equivalent (SWE) from the hydrostatic pressure created by the overlying snow, a snow depth sensor, a storage precipitation gauge, and an air temperature sensor.
System data loggers compute daily maximum, minimum, and average temperature from data recorded every 15 minutes.
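
The daily aggregation described above can be mimicked in pandas. The following is only a sketch on synthetic data: the temperature values are made up, the column name `air_temp_c` is hypothetical, and it does not represent the data loggers' actual firmware logic.

```python
import numpy as np
import pandas as pd

# Synthetic 15-minute air temperature record covering two full days (96 samples per day)
times = pd.date_range('2023-01-19', periods=2 * 96, freq='15min')
values = 5 * np.sin(np.linspace(0, 4 * np.pi, len(times))) - 2
temps = pd.Series(values, index=times, name='air_temp_c')

# Collapse the 15-minute samples to one row per day
daily = temps.resample('D').agg(['max', 'min', 'mean'])
print(daily)
```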

The SNOTEL central computer receives all monitoring station information and transmits it to the Centralized Forecasting System (CFS).
The CFS maintains the data in a relational database, supporting analysis and graphics programs for data analytics.
The system supports retrieval of current and historical data for analyses.




In [1]:
#required modules

import numpy as np
import pandas as pd
import ulmo
from datetime import timedelta

def get_SNOTEL(sitecode, start_date, end_date):

    #This is the latest CUAHSI API endpoint
    wsdlurl = 'https://hydroportal.cuahsi.org/Snotel/cuahsi_1_1.asmx?WSDL'

    #Daily SWE
    variablecode = 'SNOTEL:WTEQ_D'

    #Column label for the returned value; formatted before the try block so
    #the fallback below works no matter where a failure occurs
    date_label = end_date.strftime('%Y-%m-%d')

    try:
        #Request data from the server
        site_values = ulmo.cuahsi.wof.get_values(wsdlurl, sitecode, variablecode, start=start_date, end=end_date)
        #Convert to a Pandas DataFrame
        SNOTEL_SWE = pd.DataFrame.from_dict(site_values['values'])
        #Parse the datetime values to Pandas Timestamp objects
        SNOTEL_SWE['datetime'] = pd.to_datetime(SNOTEL_SWE['datetime'], utc=True)
        #Set the DataFrame index to the Timestamps
        SNOTEL_SWE = SNOTEL_SWE.set_index('datetime')
        #Convert values to float and replace -9999 nodata values with NaN
        SNOTEL_SWE['value'] = pd.to_numeric(SNOTEL_SWE['value']).replace(-9999, np.nan)
        #Remove any records flagged with lower quality
        SNOTEL_SWE = SNOTEL_SWE[SNOTEL_SWE['quality_control_level_code'] == '1'].copy()

        #Index by station and keep only the most recent observation
        SNOTEL_SWE['station_id'] = sitecode
        SNOTEL_SWE.index = SNOTEL_SWE.station_id
        SNOTEL_SWE = SNOTEL_SWE.rename(columns = {'value':date_label})
        col = [date_label]
        SNOTEL_SWE = SNOTEL_SWE[col].iloc[-1:]

    except Exception:
        #Fall back to a -9999 placeholder row if the request or parsing fails
        print('Unable to fetch SWE data for site ', sitecode, 'SWE value: -9999')
        SNOTEL_SWE = pd.DataFrame(-9999, columns = ['station_id', date_label], index =[1])
        SNOTEL_SWE['station_id'] = sitecode
        SNOTEL_SWE = SNOTEL_SWE.set_index('station_id')


    return SNOTEL_SWE

Code for retrieving SNOTEL data.
Data retrieval takes a site code, start date, and end date; the CUAHSI endpoint URL is set inside the function.
For the machine learning model, SNOTEL data retrieval is set to grab the most recent daily observation.
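
Because each call returns a single-row table indexed by station, results for several stations can be stacked into one table. The sketch below illustrates this without touching the network: `fake_station_row` is a hypothetical stand-in that only imitates the shape of the `get_SNOTEL` output, and the second site code is a made-up placeholder representing a failed request.

```python
import numpy as np
import pandas as pd

def fake_station_row(station_id, date_label, swe):
    """Hypothetical stand-in producing the same shape as a get_SNOTEL result."""
    return pd.DataFrame({date_label: [swe]},
                        index=pd.Index([station_id], name='station_id'))

rows = [
    fake_station_row('SNOTEL:766_UT_SNTL', '2023-01-20', 33.7),
    # Placeholder site code; -9999 mimics the fallback value returned on failure
    fake_station_row('SNOTEL:000_XX_SNTL', '2023-01-20', -9999),
]

# Stack the per-station rows and turn the -9999 placeholder into NaN
obs = pd.concat(rows).replace(-9999, np.nan)
print(obs)
```

Converting the placeholder to NaN is one possible design choice; whether the downstream model expects -9999 or NaN is not stated here.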


In [None]:
#This is the latest CUAHSI API endpoint
wsdlurl = 'https://hydroportal.cuahsi.org/Snotel/cuahsi_1_1.asmx?WSDL'

#Daily SWE
variablecode = 'SNOTEL:WTEQ_D'

#sitecode, start_date, and end_date are defined in the next cell

#Request data from the server
site_values = ulmo.cuahsi.wof.get_values(wsdlurl, sitecode, variablecode, start=start_date, end=end_date)


Find a SNOTEL site of interest

In [2]:
#Snowbird SNOTEL site
sitecode= 'SNOTEL:766_UT_SNTL'
#Request the past 7 days of data; get_SNOTEL returns only the most recent observation
end_date= pd.to_datetime('today')
start_date= end_date - timedelta(days=7)

#set time to year -  month - day
#end_date = end_date.strftime('%Y-%m-%d')
#start_date = start_date.strftime('%Y-%m-%d')


get_SNOTEL(sitecode, start_date, end_date)

| station_id | 2023-01-20 |
| :--------: | :--------: |
| SNOTEL:766_UT_SNTL | 33.7 |


In [None]:
def get_CDEC(station_id, sensor_id, resolution, start_date, end_date):

    try:
        #Build the CDEC query URL from the request parameters
        url = ('https://cdec.water.ca.gov/dynamicapp/selectQuery'
               f'?Stations={station_id}&SensorNums={sensor_id}&dur_code={resolution}'
               f'&Start={start_date}&End={end_date}')
        #Read the first HTML table on the returned page
        CDEC_SWE = pd.read_html(url)[0]
        CDEC_SWE['station_id'] = 'CDEC:'+station_id
        CDEC_SWE = CDEC_SWE.set_index('station_id')
        #Keep only the most recent record as a one-row DataFrame
        CDEC_SWE = pd.DataFrame(CDEC_SWE.iloc[-1]).T
        col = ['SNOW WC INCHES']
        CDEC_SWE = CDEC_SWE[col]
        #Label the SWE column with the requested end date
        CDEC_SWE = CDEC_SWE.rename(columns = {'SNOW WC INCHES':end_date})

    except Exception:
        #Fall back to a -9999 placeholder row if the request or parsing fails
        print('Unable to fetch SWE data for site ', station_id, 'SWE value: -9999')
        CDEC_SWE = pd.DataFrame(-9999, columns = ['station_id', end_date], index =[1])
        CDEC_SWE['station_id'] = 'CDEC:'+station_id
        CDEC_SWE = CDEC_SWE.set_index('station_id')

    return CDEC_SWE

<b> To the authors' knowledge, there is currently no Python package that supports CDEC SWE data retrieval. This code takes a station id, sensor id, temporal resolution, start date, and end date, builds the corresponding query URL, and retrieves the data of interest. </b>
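
To make the request format concrete, the sketch below builds the same query string with `urllib.parse.urlencode` and makes no network call. The helper name `build_cdec_url` is hypothetical, and the station id `TUM`, sensor number `3`, and duration code `D` are illustrative values assumed here, not taken from this section; check them against the CDEC station metadata before use.

```python
from urllib.parse import urlencode

def build_cdec_url(station_id, sensor_id, resolution, start_date, end_date):
    """Assemble a CDEC selectQuery URL from the same parameters get_CDEC takes."""
    params = {
        'Stations': station_id,
        'SensorNums': sensor_id,
        'dur_code': resolution,
        'Start': start_date,
        'End': end_date,
    }
    return 'https://cdec.water.ca.gov/dynamicapp/selectQuery?' + urlencode(params)

# Illustrative (assumed) parameter values
url = build_cdec_url('TUM', 3, 'D', '2023-01-13', '2023-01-20')
print(url)
```

Using `urlencode` also escapes any characters that are not URL-safe, which plain string formatting would pass through unchanged.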