# rating-gp (prototype)
`rating-gp` is a prototype model that fits rating curves (stage-discharge relationships) using a Gaussian process.
It extends the typical rating curve fitting process so that shifts in the rating curve over time are part of the model.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/thodson-usgs/discontinuum/blob/main/notebooks/rating-gp-demo.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/thodson-usgs/discontinuum/main?labpath=notebooks%2Frating-gp-demo.ipynb)
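
For context, the "typical" fitting process mentioned above is often a power law, $Q = C (h - h_0)^b$, fit as a straight line in log space. The sketch below illustrates that conventional baseline on synthetic data. It is not how `rating-gp` works internally, and the values of `C`, `b`, and `h0` are made up for illustration.

```python
import numpy as np

# Synthetic stage (h) and discharge (Q) following Q = C * (h - h0)**b.
# C, b, and h0 are hypothetical illustration values, not from a real site.
C, b, h0 = 5.0, 1.8, 0.3
stage = np.linspace(0.5, 3.0, 25)
discharge = C * (stage - h0) ** b

# Assuming h0 is known, the power law is linear in log space:
# log(Q) = log(C) + b * log(h - h0)
slope, intercept = np.polyfit(np.log(stage - h0), np.log(discharge), deg=1)
b_hat, C_hat = slope, np.exp(intercept)
print(f"b = {b_hat:.2f}, C = {C_hat:.2f}")
```

A single static curve like this cannot represent a rating that shifts with time, which is the gap `rating-gp` aims to fill.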

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

%matplotlib inline

Let's select some sites with a good variety of rating curves: some with clear breaks and shifts, others with no breaks and minimal shifts. Site 12413470 is a good example of a rating with a clear break around a stage of 2.7 m and a very recent, drastic shift. Site 10131000 also has a break but no real shifts in time. Site 09261000 has no breaks and minimal shifts, making it an ideal basic example. Finally, site 10154200 has no prominent breaks but shifts very clearly with time. Together, these four sites make a nice standard test set for `rating-gp`.

In [None]:
sites = {"12413470": 'SF Coeur D Alene River nr Pinehurst, ID',
         '10131000': 'CHALK CREEK AT COALVILLE, UT',
         '09261000': 'GREEN RIVER NEAR JENSEN, UT',
         '10154200': 'PROVO RIVER NEAR WOODLAND, UT'}

# Select a date range
start_date = "1988-10-01" 
end_date = "2021-09-30"

Now that we have selected our sites, we need to download the data. In `discontinuum`, the convention is to download directly using `providers`, which wrap a data provider's web service, perform some initial formatting and metadata construction, and return the result as an `xarray.Dataset`. Here, we'll use the `usgs` provider. If you need data from another source, create a `provider` and ensure its output matches that of the `usgs` provider. We'll download some instantaneous stage data to use as our model's input and some discharge data as our target.
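
If you do write your own `provider`, its output might look roughly like the sketch below. The helper `make_measurements_dataset` is hypothetical and is not part of `discontinuum`. The variable names `stage`, `discharge`, and `discharge_unc` are the ones this notebook passes to `fit`, but any attributes or metadata the `usgs` provider attaches are not reproduced here, so compare against a real `usgs` result before relying on it.

```python
import numpy as np
import xarray as xr


def make_measurements_dataset(time, stage, discharge, discharge_unc):
    """Hypothetical helper: pack measurements into a time-indexed Dataset.

    Assumes one stage, discharge, and discharge-uncertainty value per
    measurement time. The real provider may add metadata not shown here.
    """
    return xr.Dataset(
        data_vars=dict(
            stage=(["time"], np.asarray(stage, dtype=float)),
            discharge=(["time"], np.asarray(discharge, dtype=float)),
            discharge_unc=(["time"], np.asarray(discharge_unc, dtype=float)),
        ),
        coords=dict(time=np.asarray(time, dtype="datetime64[ns]")),
    )


# Made-up values, for illustration only
ds = make_measurements_dataset(
    time=["1990-10-01", "1991-04-15", "1992-07-30"],
    stage=[1.2, 2.5, 0.9],
    discharge=[10.0, 85.0, 4.5],
    discharge_unc=[1.05, 1.10, 1.05],
)
print(ds)
```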

In [None]:
import sys
sys.path.insert(0, '../src/')
from rating_gp.providers import usgs

# download instantaneous stage and discharge measurements
training_data_dict = {}
for site in sites:
    training_data_dict[site] = usgs.get_measurements(site=site, start_date=start_date, end_date=end_date)

With the training data in hand, we're ready to fit a model to each site. Depending on your hardware, fitting should take only about 20-30 s per site.

In [None]:
# select an engine
from rating_gp.models.gpytorch import RatingGPMarginalGPyTorch as RatingGP

model = {}
for site in sites:
    training_data = training_data_dict[site]
    model[site] = RatingGP()
    model[site].fit(
        target=training_data['discharge'],
        covariates=training_data[['stage']],
        target_unc=training_data['discharge_unc'],
        iterations=400,
    )

With the models fit, we can plot the observed rating curve along with time series of both stage and discharge. To save space, we will do this only for site 12413470.

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(10, 7), sharex='col', sharey='row')
first_site = list(sites)[0]  # 12413470
ax[0, 0] = model[first_site].plot_stage(ax=ax[0, 0])
ax[1, 0] = model[first_site].plot_discharge(ax=ax[1, 0])
ax[1, 1] = model[first_site].plot_observed_rating(ax=ax[1, 1], zorder=3)
_ = ax[0, 1].axis('off')

One major advantage of `rating-gp` is that the fitted model can predict the rating curve at any moment in time. To see how well it does, let's plot the rating curve for each site every 5 years at the start of a water year (i.e., October 1st). This shows how well the model accounts for shifts in the rating with time. Because this produces several plots, we'll save the set of plots for each date as one page of a PDF for easy checking.
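
As a quick aside on the date convention, the sketch below assumes the US water year, which runs from October 1 through September 30 and is named for the calendar year in which it ends. The helper names `water_year` and `water_year_starts` are hypothetical and are not part of `rating-gp`.

```python
from datetime import date


def water_year(d: date) -> int:
    """Return the US water year containing `d` (Oct 1 through Sep 30)."""
    return d.year + 1 if d.month >= 10 else d.year


def water_year_starts(first_year: int, last_year: int, step: int) -> list[date]:
    """October 1 of every `step`-th calendar year, first_year..last_year inclusive."""
    return [date(year, 10, 1) for year in range(first_year, last_year + 1, step)]


# The seven prediction dates used in this notebook: 1990-10-01 ... 2020-10-01
starts = water_year_starts(1990, 2020, 5)
for start in starts:
    print(start.isoformat(), "begins water year", water_year(start))
```

Under this convention, the training period selected earlier (1988-10-01 through 2021-09-30) covers water years 1989 through 2021.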

In [None]:
from matplotlib.backends.backend_pdf import PdfPages
import warnings
warnings.filterwarnings('ignore')

n = 250

with PdfPages('time_variable_ratings.pdf') as pdf:
    for j in np.linspace(1990, 2020, 7).astype(int):
        fig, axes = plt.subplots(2, 2, figsize=(10, 10))
        for site, ax in zip(sites, axes.flatten()):
            stage = np.linspace(model[site].dm.data.covariates['stage'].min(),
                                model[site].dm.data.covariates['stage'].max()*1.5,
                                n)
            time = np.repeat(np.datetime64(f"{j}-10-01 00:00:00", 'ns'), n)
            
            ds = xr.Dataset(
                data_vars=dict(
                    stage=(["time"], stage),
                ),
                coords=dict(
                    time=time,
                ),
            )
            
            ax = model[site].plot_rating(ds, ax=ax)
            ax.set_xscale('log')
            ax.set_yscale('log')
            ax.set_title(f'{site}: {sites[site]}')
            
        fig.suptitle(f"{j}-10-01")
        plt.tight_layout()
        pdf.savefig(fig)

Nice! The model clearly predicts the shifts with time. These results are promising: they suggest that `rating-gp` can model shifts in rating curves effectively.