# Highland Lakes Storm Inflow Predictive Model (Preliminary)

_Tyler Carstensen_

##### Diagrams Courtesy   
_Lower Colorado River Authority_  
_United States Geological Survey_  
_Modified Scraping script from Nathan Hilbert (Oak Ridge National Laboratory)_

##### Technologies Used

 * psycopg2 (PostgreSQL)
 * BeautifulSoup
 * Selenium (ChromeDriver)
 * SciPy Stack (Python Scientific Libraries)
 * Scikit-learn
 * Statsmodels


---

# Overview of the Highland Lakes system

PICTURE OF LAKE TRAVIS   

PICTURES OF DAMS  

![highlandlakes](images/highlandlakes2008.jpg)
![lakeprofile](images/lake_profile_no_data.png)


---

# May 2015 floods end the drought

PICTURES OF FLOODED AUSTIN, WATER OVER LAMAR BRIDGE  (Interesting: explain that this is JUST Lady Bird Lake (or Town Lake for you Austinites) flooding, which is a small lake. Upstream is Lake Austin, which has significantly more volume at a slightly higher elevation. Then comes Lake Travis, which has a far larger volume at a much higher elevation, then Lake Buchanan, which has a massive volume at a much higher elevation.)

Drought before/after pictures as seen on the [Austin American-Statesman's Before and After Project](http://projects.statesman.com/news/lake-travis-levels/)

## Lake Travis During Drought (2012)
![RM620before](images/RM620before.png)

## Lake Travis After Drought (2016)
![RM620after](images/RM620after.png)




Many people (including our politicians and leaders) have forgotten the original mission of the dams: **flood control**  
The drought is over, and flooding is a renewed threat. Lake capacity acts as a 'buffer' against massive storm inflows from the surrounding basin that would otherwise flood Austin and the surrounding area.  

Prior to engineered flood control, Travis County and the surrounding area flooded frequently. Initial attempts to dam the Colorado River in the early 1900s failed: the City of Austin constructed two dams, but both were destroyed by floods.

Massive national civil engineering efforts in the middle of the 20th century were responsible for the commissioning of the existing series of dams. They were constructed during a decades-long effort starting in 1935 and ending in 1951. The construction involved redirecting rivers and reshaping the terrain to influence the natural watersheds.

**It is critical to know how much the lakes are going to fill after a storm**

---

How does weather affect the lakes?  
PICTURE OF WEATHER STATIONS  
![stormrainflow](images/stormrainflowusgs.gif)


USGS streamflow post-storm image (shows the time delay between the storm and the water flow, but also how the two can overlap)  
(The area under the curve after rainfall ends is a good indicator of what's going to happen. It may not even be necessary to catch the exact end of the rainfall.)  
PICTURE OF LAKE INFLOWS POST WEATHER EVENT  
(Weather itself is almost impossible to predict. It's difficult to say which watersheds received the rainfall. Streamflow gauges can give us additional data on which watersheds the storm affected.)  
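
The area-under-the-curve idea can be sketched numerically. This is only a minimal sketch, assuming streamflow is reported in cubic feet per second at known times; the function name and the sample readings are made up for illustration and are not LCRA or USGS data.

```python
import numpy as np

def inflow_volume_cubic_feet(hours, flow_cfs):
    """Integrate a streamflow curve (cubic feet per second) over time with the trapezoidal rule."""
    seconds = np.asarray(hours, dtype=float) * 3600.0
    flow = np.asarray(flow_cfs, dtype=float)
    # Average flow of each adjacent pair of readings times the seconds elapsed between them
    return float(np.sum((flow[1:] + flow[:-1]) / 2.0 * np.diff(seconds)))

# Hypothetical hourly readings after rainfall ends (illustrative values only)
hours = [0, 1, 2, 3, 4]
flow_cfs = [100.0, 400.0, 300.0, 200.0, 100.0]
volume = inflow_volume_cubic_feet(hours, flow_cfs)
print(volume)
```

Integrating flow over time yields a volume in cubic feet, which matches the unit used for the lake inflow response variable.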

---
Watersheds
PICTURE OF WATERSHEDS
Explain watershed (if you dump your drink anywhere in watershed x, it will always flow to a single point. While the watershed is draining, water always moves toward the edge of the watershed. Of course there are 'local minima', or small watersheds within these large watersheds, but the overall basin should typically drain most water in the same direction.)

Water that falls north of the Highland Lakes watersheds goes to the Brazos River. The Lower Brazos is not controlled.

The watersheds vary greatly in size - Lake Buchanan takes the majority of the rainfall.

---

Historical lake levels  
CHART OF LAKE LEVEL HISTORY  
PICTURES OF LAKE TRAVIS LOW AND HIGH  
 


---

Civil Engineering Considerations on Dams  
DIAGRAM OF DAM  
Spillway to prevent structural damage  
DAM FAILURE PICTURES  
Austin flooding pictures

---
TYPICAL CIVIL ENGINEERING HYDROLOGY METHODS  

## Model Structure

#### _Model 1_  

**Parameters**  
* $h_{prior}$: Time window of precipitation prior to storm event (Hours)  
* $h_{post}$: Time window after storm event (Hours)  
* time_interval: Interval of time to slide the window, affects size of dataset (Hours)  
* event_list: Specific chosen times to place the event window, if time_interval is not used (list of timestamps)  

_Each observation (row) will represent data for a specific time event. The windows will control the aggregated precipitation prior to the event and the aggregated lake inflows after the event. The events can be manually chosen (requires tediously reading precipitation logs) or chosen as arbitrary time intervals._    
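
The windowing described above might look roughly like the following. It is a sketch under assumptions: the data is hourly, `rain` holds one column per weather station, `inflow` holds one lake's inflow, and the function name `build_event_table` and the sample values are hypothetical.

```python
import pandas as pd

def build_event_table(rain, inflow, h_prior, h_post, time_interval):
    """Build one row per event time.

    rain:   DataFrame of hourly rainfall, one column per weather station
    inflow: Series of hourly lake inflow for one lake
    """
    one_hour = pd.Timedelta(hours=1)
    start = rain.index[0] + pd.Timedelta(hours=h_prior)
    end = rain.index[-1] - pd.Timedelta(hours=h_post)
    # Slide the event time forward by time_interval hours
    events = pd.date_range(start, end, freq=pd.Timedelta(hours=time_interval))

    rows = []
    for t in events:
        # Features: rainfall summed over the h_prior hours before the event
        prior = rain.loc[t - pd.Timedelta(hours=h_prior): t - one_hour].sum()
        # Response: inflow summed over the h_post hours starting at the event
        post = inflow.loc[t: t + pd.Timedelta(hours=h_post) - one_hour].sum()
        row = prior.to_dict()
        row["inflow_post"] = post
        rows.append(row)
    return pd.DataFrame(rows, index=events)

# Synthetic hourly data (illustrative values only)
index = pd.date_range("2015-05-01", periods=12, freq=pd.Timedelta(hours=1))
rain = pd.DataFrame(
    {"station_a": [1.0] * 12, "station_b": [0.0] * 6 + [2.0] * 6}, index=index
)
inflow = pd.Series(range(12), index=index, dtype=float)
table = build_event_table(rain, inflow, h_prior=3, h_post=2, time_interval=3)
print(table)
```

A smaller `time_interval` produces more rows, at the cost of heavier overlap between neighboring windows.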

A smarter data analysis methodology would parse the features/responses, find the storm events, and build the list of storm events automatically. This would require additional work, but may be necessary if the model isn't working well.

[NOAA's NWS storm event lists](http://www.spc.noaa.gov/climo/online/)    
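
One simple way to find storm events automatically is to threshold the rainfall series and group nearby wet hours. The sketch below assumes an hourly series of rainfall totaled across stations; `find_storm_events`, `threshold` and `max_gap_hours` are hypothetical names and untuned values, not something the project has settled on.

```python
import pandas as pd

def find_storm_events(total_rain, threshold=0.1, max_gap_hours=6):
    """Return (start, end) timestamps of storms found in an hourly rainfall series.

    threshold and max_gap_hours are illustrative assumptions, not tuned values.
    """
    wet = total_rain[total_rain > threshold]
    if wet.empty:
        return []
    # A new storm starts when the gap since the previous wet hour exceeds max_gap_hours
    gaps = wet.index.to_series().diff() > pd.Timedelta(hours=max_gap_hours)
    storm_id = gaps.cumsum()
    events = []
    for _, hours in wet.groupby(storm_id):
        events.append((hours.index[0], hours.index[-1]))
    return events

# Synthetic day with two separate bursts of rain (illustrative values only)
index = pd.date_range("2015-05-23", periods=24, freq=pd.Timedelta(hours=1))
total_rain = pd.Series(0.0, index=index)
total_rain.iloc[[2, 3, 4]] = [0.5, 1.2, 0.3]
total_rain.iloc[[15, 16]] = [0.8, 0.4]
events = find_storm_events(total_rain)
print(events)
```

The resulting list could be passed as `event_list` in place of a fixed `time_interval`.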


**Features**  
 * Rainfall in $h_{prior}$ at each chosen weather station  
 
 PICTURE OF AGGREGATING READINGS IN WEATHER STATION
 
 PICTURE OF WEATHER STATIONS (interesting: we don't care which watershed or lat/long of the stations. The machine learning algorithm should automatically fit coefficients to the features given the lake inflow response variable. It should group the weather stations by the affected watershed. Remember that rainfall on a single weather station may affect more than one watershed.)
 
**Response Variable**  
 * Lake inflows at chosen lake in $h_{post}$ (cubic feet)  
 * BONUS Streamflow prediction (gonna be very hard...)  

## Machine Learning Algorithms  

**One model for each lake in the highland lakes chain**   
 * Linear Regression (multiple)  
     * Regularization
     * Forward-selection for collinear features
 * Random forest (multiple)  
 
 * BONUS: Neural net. This would permit multiple response variables (all lakes) in one model. 
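
A rough sketch of the one-model-per-lake setup with scikit-learn follows. The data is synthetic, the lake names are placeholders, and Lasso (L1) is just one possible choice of regularization; nothing here reflects results on the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the event table: 200 events x 6 weather stations
X = rng.uniform(0.0, 2.0, size=(200, 6))
# Hypothetical lakes, each driven by a different pair of stations
responses = {
    "lake_a": 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, 200),
    "lake_b": 4.0 * X[:, 4] + 1.0 * X[:, 5] + rng.normal(0.0, 0.1, 200),
}

scores = {}
for lake, y in responses.items():
    # Every lake gets its own models, but all of them share the same features X
    candidates = {
        "lasso": Lasso(alpha=0.1),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    for name, model in candidates.items():
        # Mean R^2 over 5 cross-validation folds
        scores[(lake, name)] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

for key, score in sorted(scores.items()):
    print(key, round(score, 3))
```

Cross-validated scores give a like-for-like way to compare the linear and tree-based models for each lake before deciding whether a neural net is worth the effort.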


PAPER WILL BE AN IPYNB HTML FILE HOSTED ON GITHUB.IO

# Data Conditioning and Pipeline Steps



### 1. Determine which sensors to use by hydromet map.
![watersheds_precipitation_stations](images/watersheds_precipitation_sidebar.png)

 * Check availability of that sensor on [LCRA's Chronhist database](http://hydromet.lcra.org/chronhist.aspx)
     * This is a *manual labor* step that takes time due to the nature of the site
     * The history site only allows you to pull 180 days of data at a time
     * Used an old scraper from Oak Ridge National Laboratory as a framework to build my scraper
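
Because of the 180-day limit, a long history has to be requested in pieces. A minimal sketch of splitting a date range into chunks is shown below; `split_date_range` is a hypothetical helper, and treating the limit as 180 days inclusive of both endpoints is an assumption about how the site counts.

```python
from datetime import date, timedelta

def split_date_range(start, end, max_days=180):
    """Split [start, end] into consecutive chunks of at most max_days days each."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Each chunk covers max_days calendar days, counting both endpoints
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks

chunks = split_date_range(date(2015, 1, 1), date(2015, 12, 31))
for chunk_start, chunk_end in chunks:
    print(chunk_start, chunk_end)
```

Each `(start, end)` pair can then drive one request to the history page.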

In [None]:
# List of Sensors (Site Name, Site Number)

# Inks, LBJ, Marble Falls and Austin are static pass-through lakes


# lakelevel_sites = [
#                    ("Buchanan Dam", 1995), #Lake Buchanan
#                    ("Inks Dam", 1999),
#                    ("Lake LBJ at 2900 Bridge", 2699),
#                    ("Wirtz Dam", 2958), #Lake LBJ
#                    ("Starcke Dam", 2999),
#                    ("Mansfield Dam", 3963), #Lake Travis
#                    ("Tom Miller Dam", 3999) #Lake Austin
#                   ]

Each lake level site will get an independent model for lake inflows. The models will all share the same features, however.

# Web Scraping Code Stubs

In [71]:
import requests
import numpy as np
from bs4 import BeautifulSoup

In [1]:
def get_gauge_list(url='http://hydromet.lcra.org/chronhist.aspx'):
    """Returns a list of gauge values/names.
    
    INPUT:
        string  (url)  | URL of the site
        
    OUTPUT:
        list           | list of tuples containing strings in format (gauge number, gauge name)
    """
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')

    # The gauge drop-down is the <select> element with id "DropDownList1"
    dl1 = soup.find(id='DropDownList1')
    allgauges = dl1.find_all("option")

    # Pair each option's value (gauge number) with its visible text (gauge name)
    return [(gauge.get('value'), gauge.contents[0]) for gauge in allgauges]

In [5]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep
from selenium.webdriver.support.ui import Select
import pandas as pd

In [143]:
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
sensormap = []  # (sensor value, sensor name) pairs collected across all gauges

for gaugevalue, gaugename in get_gauge_list():
    driver.get("http://hydromet.lcra.org/chronhist.aspx")
    select = Select(driver.find_element(By.NAME, 'DropDownList1'))
    select.select_by_value(gaugevalue)
    print("Gauge Value: {}".format(gaugevalue))

    # Get the new options for sensors
    select_sensor = Select(driver.find_element(By.NAME, 'DropDownList2'))
    sensor_options = select_sensor.options

    for option in sensor_options:
        option.click()
        sensor_value = option.get_attribute("value")
        sensor_name = option.text
        print("Sensor Value: {}\t\t Sensor Name: {}".format(sensor_value, sensor_name))
        # Will store data in dataframes based on sensor type, by gauge
        sensormap.append((sensor_value, sensor_name))
        sleep(3)

# WHEN FINISHED, TRY BREAKING INTO SUBFUNCTIONS

Gauge Value: 2992


KeyboardInterrupt: 

In [None]:
sensormap = set(sensormap)
sensormap = list(sensormap)

In [93]:
gauge_drop_down.find_elements_by_tag_name("DropDownList1")

[]

In [None]:
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://hydromet.lcra.org/chronhist.aspx")
select = Select(driver.find_element(By.NAME, 'DropDownList1'))
options = select.options
firstoption = options[0]

In [151]:
firstoption.text # name of element

u'Backbone Creek at Marble Falls'

NEXT STEP: verify all stations on chronhist



EDA: Draw storm curves (rain vs lake levels or streamflow) after scraping data

Models: Try a model that discards non-precipitation events. Also try a model that keeps them.

Try a model for lake level delta, and also for raw lake level
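
The dataset variants mentioned above could be prepared along these lines. This is a sketch only: the column names and values are hypothetical, and "non-precipitation event" is taken to mean a row with zero rainfall at every station.

```python
import pandas as pd

# Hypothetical event table: rainfall per station plus the raw lake level (feet)
events = pd.DataFrame({
    "station_a": [0.0, 1.5, 0.0, 0.2],
    "station_b": [0.0, 0.7, 0.0, 0.0],
    "lake_level": [650.0, 650.5, 652.0, 652.1],
})
station_cols = ["station_a", "station_b"]

# Alternative response: change in lake level since the previous event
events["lake_level_delta"] = events["lake_level"].diff()

# Variant dataset that drops events with no rainfall at any station
wet_events = events[events[station_cols].sum(axis=1) > 0]
print(wet_events)
```

The first row has no previous event, so its delta is missing and would need to be dropped before fitting.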

# HOST DATASET ON KAGGLE TO OPTIMIZE ML?