# Natural Splines

Splines are flexible functions that can be used to fit rating curves.
In fact, the multi-segment power law is a form of linear spline,
but other types of spline can be used as well.
One alternative is the natural spline.
Natural splines have the advantage of being very fast to fit,
but their form is less constrained than that of the segmented power law.
As a result, they may produce strange results, particularly with small datasets.
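
To make the idea concrete, here is a minimal sketch of a natural cubic spline basis, written with NumPy only and fit by ordinary least squares rather than with `SplineRating`.
The helper `natural_cubic_basis`, the knot placement, and the synthetic data are all assumptions made for illustration; the package may construct its basis differently.
It shows the defining constraint of a natural spline: beyond the outermost knots the curve is linear, while between the knots it is free to bend.

```python
import numpy as np


def natural_cubic_basis(x, knots):
    """Natural cubic spline basis (truncated-power form) evaluated at x.

    Hypothetical helper for illustration; returns shape (len(x), len(knots)).
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    last = knots[-1]

    def d(k):
        # Scaled difference of two truncated cubics anchored at knot k and the last knot.
        num = np.maximum(x - knots[k], 0.0) ** 3 - np.maximum(x - last, 0.0) ** 3
        return num / (last - knots[k])

    # Constant and linear terms, then one column per interior knot.
    columns = [np.ones_like(x), x]
    d_ref = d(len(knots) - 2)
    for k in range(len(knots) - 2):
        # Subtracting d_ref cancels the quadratic term beyond the last knot.
        columns.append(d(k) - d_ref)
    return np.column_stack(columns)


# Synthetic data (an assumption, not a packaged dataset): one power law with noise.
rng = np.random.default_rng(0)
stage = np.sort(rng.uniform(1.0, 10.0, size=30))
log_q = 1.0 + 1.5 * np.log(stage) + rng.normal(0.0, 0.05, size=stage.size)

# Place 6 knots at quantiles of log-stage and fit by ordinary least squares.
log_h = np.log(stage)
knots = np.quantile(log_h, np.linspace(0.0, 1.0, 6))
X = natural_cubic_basis(log_h, knots)
coef, *_ = np.linalg.lstsq(X, log_q, rcond=None)

# Evaluate on evenly spaced points beyond the last knot, where the fit must be linear.
h_new = np.linspace(knots[-1] + 0.5, knots[-1] + 1.5, 5)
pred = natural_cubic_basis(h_new, knots) @ coef
```

With six knots the basis has six columns, so the number of knots plays the role of the degrees of freedom in this sketch.
Raising it relative to the number of observations gives the curve more room to wiggle between knots, which is one way small datasets can lead to odd fits.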

In [None]:
%load_ext autoreload
%autoreload 2

import pymc as pm
import arviz as az
from ratingcurve.ratingmodel import PowerLawRating, SplineRating

import numpy as np

from ratingcurve import data
data.list()

## Green River
In practice, splines can work quite well, particularly for simpler ratings.
Here is an example, showing a natural spline fit to the Green River dataset.

In [None]:
df = data.load('green channel')

spline_rating = SplineRating(q=df['q'],
                             h=df['stage'],
                             q_sigma=df['q_sigma'],
                             df=8)

In [None]:
# converges much faster than the power law
trace = spline_rating.fit(n=70_000)
spline_rating.plot(trace)

## Simulated Rating

In [None]:
sim_df = data.load('3-segment simulated')

# subsample the simulated rating curve
n = 30
df = sim_df.sample(n)

ax = sim_df.plot(x='q', y='stage', color='gray', ls='-', legend=False)
df.plot.scatter(x='q', y='stage', marker='o', color='blue', ax=ax)
ax.set_xlabel("Discharge (cfs)")
ax.set_ylabel("Stage (ft)")

In [None]:
spline_rating = SplineRating(q=df['q'],
                             h=df['stage'],
                             df=10)

In [None]:
with spline_rating:
    mean_field = pm.fit(method='advi', n=100_000)
    trace = mean_field.sample(5000)

spline_rating.plot(trace)


### Exercise
Splines can give unexpectedly poor results.
For example, try subsampling with
`sim_df.sample(n=30, random_state=771)`
and refitting the spline.
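
The `random_state` argument is what makes this exercise repeatable: it pins which rows pandas draws.
The sketch below illustrates that behavior on a hypothetical stand-in for the simulated rating (a single made-up power law, not the packaged dataset), so it runs without the `ratingcurve` package.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the simulated rating (not the packaged dataset).
stage = np.linspace(1.0, 10.0, 200)
sim_df = pd.DataFrame({"stage": stage, "q": 5.0 * stage**1.6})

# The same random_state always selects the same rows, in the same order.
first = sim_df.sample(n=30, random_state=771)
second = sim_df.sample(n=30, random_state=771)
same_rows = first.equals(second)

# A different seed draws a different subsample of the same size.
other = sim_df.sample(n=30, random_state=772)
```

Refitting with several different seeds is a quick way to see how sensitive the spline is to which 30 observations it happens to receive.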