# Fitting a Stage-Discharge Rating
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/thodson-usgs/ratingcurve/blob/master/notebooks/segmented-power-law-demo.ipynb)  

## Segmented power-law rating
There are several approaches to fitting a stage-discharge rating curve. 
The first section of this notebook demonstrates the classic approach, 
which is to use a segmented power law.

The segmented power law is defined as:

\begin{align}
    \log(Q) = a + \sum_{i=1}^{n} b_i \log(x - x_{o,i}) H_i(x - x_{o,i})
\end{align}
where
$Q$ is a vector of discharge observations, \
$n$ is the number of breakpoints in the rating, \
$a$ and $b$ are model parameters, \
$x$ is a vector of stage observations, \
$x_o$ is a vector of breakpoints, and \
$H$ is the Heaviside function. 

In a standard linear model, $b$ represents the slope of the function with respect to the input.
In the segmented power law, $b_1$ is the base slope and each subsequent $b_i$ is an adjustment to the base slope for the corresponding segment.
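
To make the definition concrete, the following minimal sketch evaluates the equation above for a single stage value in plain Python. The function name `segmented_power_law` and all parameter values are hypothetical, chosen only for illustration; they are not part of `ratingcurve`. The Heaviside term is implemented here as a strict `stage > x_o` test so that the logarithm is never evaluated at or below a breakpoint, which is an assumption about the convention rather than something the definition specifies.

```python
import math


def segmented_power_law(stage, a, b, breakpoints):
    """Return log(Q) for one stage value under the segmented power law.

    a, b, and breakpoints are illustrative inputs, not ratingcurve API.
    """
    log_q = a
    for b_i, x_o in zip(b, breakpoints):
        # Heaviside term: a segment contributes only above its breakpoint.
        if stage > x_o:
            log_q += b_i * math.log(stage - x_o)
    return log_q


# Hypothetical parameters for a two-segment rating.
a = 1.0
b = [1.5, 0.5]            # base slope, then an adjustment for the second segment
breakpoints = [0.5, 3.0]  # stage at which each segment switches on

for stage in (1.5, 2.5, 4.0):
    discharge = math.exp(segmented_power_law(stage, a, b, breakpoints))
    print(f"stage={stage:.1f}  Q={discharge:.2f}")
```

Below the second breakpoint only the first term is active; above it, the second term adds its own contribution to the log-discharge.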

The latter part of the notebook demonstrates fitting a rating using a cubic spline.

In [None]:
# # Only run this cell to set up Google Colab. It will take a minute.
# %%capture
# # Specific PyMC version used in this notebook
# !pip install pymc==4.1.1
#
# # Colab needs this
# %env MKL_THREADING_LAYER=GNU
#
# # install ratingcurve library
# !pip install git+https://github.com/thodson-usgs/ratingcurve.git

In [None]:
%load_ext autoreload
%autoreload 2

import pymc as pm
import arviz as az
from ratingcurve.ratingmodel import SegmentedRatingModel, SplineRatingModel

## Load Data

In [None]:
# list the available tutorial datasets
from ratingcurve import tutorial
tutorial.list_datasets()

Then load a specific dataset.

In [None]:
df = tutorial.open_dataset('green_channel')
df.head()

In [None]:
# plot the data
ax = df.plot.scatter(x='q', y='stage', marker='o')
ax.set_xlabel("Discharge (cfs)")
ax.set_ylabel("Stage (ft)")

## Set up the model
Set up a rating model. This may take a minute while the model compiles.

In [None]:
segments = 2
powerrating = SegmentedRatingModel(q=df['q'],
                                   h=df['stage'], 
                                   q_sigma=df['q_sigma'],
                                   segments=segments,
                                   prior={'distribution':'uniform'})

Then fit the model using variational inference (the first run will be slower). Set the number of iterations `n` high enough that the loss has stopped decreasing before fitting ends.

In [None]:
with powerrating:
    method = 'advi'
    mean_field = pm.fit(method=method, n=150_000)
    trace = mean_field.sample(5000)
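
One way to judge whether `n` was large enough is to compare the average loss over the last block of iterations with the block before it. The sketch below is a minimal illustration on a synthetic loss history. The helper `loss_has_plateaued`, its `window` and `rel_tol` defaults, and the synthetic data are all hypothetical. In PyMC the object returned by `pm.fit` is expected to expose its loss history as `mean_field.hist`; treat that as an assumption to verify against your installed version.

```python
def loss_has_plateaued(losses, window=1000, rel_tol=0.01):
    """Return True if the mean loss over the last `window` iterations is within
    `rel_tol` (relative) of the mean over the preceding `window` iterations."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) <= rel_tol * abs(previous)


# Synthetic loss history: decays quickly, then flattens out.
losses = [1000.0 / (1 + i) + 5.0 for i in range(20_000)]

print(loss_has_plateaued(losses[:1500], window=500))  # early in the run
print(loss_has_plateaued(losses, window=1000))        # after the full run
```

If the check fails, increasing `n` and refitting is the simplest remedy; the thresholds here are arbitrary and should be tuned to the noise in the loss.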

Once fit, we can evaluate the model by plotting the rating curve.

In [None]:
powerrating.plot(trace)

or by tabulating the parameters of the power-law model.

In [None]:
# ignore every column after hdi_97%
az.summary(trace, var_names=["w", "a", "sigma", "hs"])

## Exercise
What happens if we choose the wrong number of segments? 
Increase the number of segments by one and rerun the model.

## Simulated Example
This example uses a simulated rating curve, which allows you to test how different sampling schemes affect the rating curve fit.

First, open the `simulated_rating` tutorial dataset.

In [None]:
sim_df = tutorial.open_dataset('simulated_rating')
print('The simulated rating contains {} observations'.format(len(sim_df)))

This rating contains an observation at every 0.01 inch increment in stage, far more than we would have for a natural rating.
Try subsampling to `n=15` or `n=30` and see how that affects the model fit.

In [None]:
# subsample the simulated rating curve
n = 30
df = sim_df.sample(n, random_state=12345)

ax = sim_df.plot(x='q', y='stage', color='grey', ls='-', legend=False)
df.plot.scatter(x='q', y='stage', marker='o', color='blue', ax=ax)
ax.set_xlabel("Discharge (cfs)")
ax.set_ylabel("Stage (ft)")

Set up a rating model with 3 segments

In [None]:
segments = 3
powerrating = SegmentedRatingModel(q=df['q'],
                                   h=df['stage'],
                                   q_sigma=None,
                                   segments=segments,
                                   prior={'distribution':'uniform'})

now fit the model using ADVI

In [None]:
with powerrating:
    method = 'advi'
    mean_field = pm.fit(method=method, n=150_000)  # increase n as necessary
    trace = mean_field.sample(5000)

and visualize the results.

In [None]:
powerrating.plot(trace, None)

In [None]:
az.summary(trace, var_names=["w", "a", "sigma", "hs"])

ADVI typically underestimates uncertainty; NUTS may give better results but will be substantially slower to fit curves with several segments.

In [None]:
# NUTS example. This may take several minutes to run.
# n = 4
# with powerrating:
#     trace = pm.sample(tune=1500, chains=n, cores=n, target_accept=0.99)
#
# powerrating.plot(trace)
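
The tendency noted above can be illustrated with a textbook case. For a standard bivariate Gaussian with correlation `rho`, the fully factorized Gaussian that minimizes KL(q || p) has per-variable variance `1 - rho**2`, which is never larger than the true marginal variance of 1. The sketch below is a stylized illustration under that assumption, not a description of how PyMC's ADVI behaves on the rating model; the function name `mean_field_variance` is hypothetical.

```python
def mean_field_variance(rho):
    """Variance of each factor in the best fully factorized Gaussian
    approximation (minimizing KL(q || p)) to a standard bivariate Gaussian
    with correlation rho. It equals 1 / precision_ii = 1 - rho**2."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 - rho ** 2


true_marginal_variance = 1.0
for rho in (0.0, 0.5, 0.9, 0.99):
    ratio = mean_field_variance(rho) / true_marginal_variance
    print(f"rho={rho:.2f}  mean-field variance / true variance = {ratio:.4f}")
```

The stronger the correlation between parameters, the more the factorized approximation shrinks the reported spread, which is one reason to cross-check important fits with NUTS.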

## Spline demo
An alternative to the segmented power law is the natural spline.

In [None]:
import numpy as np

df = tutorial.open_dataset('green_channel')

knots = np.linspace(2, 13, 2)
q = df['q']
h = df['stage']

spline_rating = SplineRatingModel(q=q,
                                  h=h,
                                  q_sigma=df['q_sigma'],
                                  knots=knots)

In [None]:
# requires fewer iterations than power law
with spline_rating:
    method = 'advi'
    mean_field = pm.fit(method=method, n=50_000)
    trace = mean_field.sample(5000)

spline_rating.plot(trace)

### Spline with simulated data

In [None]:
sim_df = tutorial.open_dataset('simulated_rating')

In [None]:
# subsample the simulated rating curve
n = 30
df = sim_df.sample(n)

ax = sim_df.plot(x='q', y='stage', color='gray', ls='-', legend=False)
df.plot.scatter(x='q', y='stage', marker='o', color='blue', ax=ax)
ax.set_xlabel("Discharge (cfs)")
ax.set_ylabel("Stage (ft)")

In [None]:
n_knots = 6
knots = np.linspace(5, 13, n_knots)

spline_rating = SplineRatingModel(q=df['q'],
                                  h=df['stage'],
                                  knots=knots)

In [None]:
with spline_rating:
    method = 'advi'
    mean_field = pm.fit(method=method, n=80_000)
    trace = mean_field.sample(5000)

spline_rating.plot(trace)

### Exercise
Splines can give unexpectedly poor results.
For example, try 
`sim_df.sample(n=30, random_state=771)`  
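
When investigating a poor spline fit, one thing worth checking is how many observations fall between each pair of knots. The sketch below counts observations per knot interval using only the standard library. The helper `count_per_knot_interval` and the stage values are hypothetical, and the idea that sparsely populated intervals explain the poor fit is an assumption to test, not a conclusion drawn from this notebook.

```python
from bisect import bisect_right


def count_per_knot_interval(stage, knots):
    """Count observations in each bin defined by sorted knots.

    Returns len(knots) + 1 counts: below the first knot, between each pair
    of adjacent knots, and at or above the last knot.
    """
    counts = [0] * (len(knots) + 1)
    for value in stage:
        counts[bisect_right(knots, value)] += 1
    return counts


# Hypothetical stage observations and six evenly spaced knots from 5 to 13.
stage = [5.2, 5.4, 6.1, 9.8, 12.5, 12.9]
knots = [5.0, 6.6, 8.2, 9.8, 11.4, 13.0]

print(count_per_knot_interval(stage, knots))
```

With real data, `stage` could be `df['stage']` and `knots` the array passed to `SplineRatingModel`; intervals with a count of zero are the ones to inspect first.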