# Parameter Estimation for Ion Exchange and GAC Breakthrough Data

This demonstration walks through setting up and running a parameter estimation for the WaterTAP ion exchange (IX) and granular activated carbon (GAC) models.

Both technologies are candidates for treating water impacted by per- and polyfluoroalkyl substances (PFAS). Critical to evaluating sorptive technologies for this purpose is accurately modeling when the effluent concentration will reach a user- or regulation-defined limit, the "breakthrough concentration". The breakthrough time dictates when the media is spent and must be either regenerated or, as is the case with PFAS, disposed of, and it has significant implications for the cost of each process. In this demonstration, we use publicly available data from an Orange County Water District (OCWD) report to obtain calibrated kinetic, mass transfer, and isotherm parameters. The data include the time of breakthrough in bed volumes (BV) and the effluent concentration. The goal is for the model to accurately predict the breakthrough time for a specific species and breakthrough concentration.

For this demonstration, we will use the Clark model for IX and the Constant Pattern Homogeneous Surface Diffusion Model (CPHSDM) for GAC. Each model contains different parameters relevant to predicting breakthrough time, and both are fully documented in the WaterTAP documentation.
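To build intuition for how the IX parameters shape a breakthrough curve, the sketch below evaluates one common dimensionless form of the Clark equation. It is an illustration only and is not the WaterTAP implementation: the function name `clark_c_norm` is hypothetical, and `rate` is an assumed lumped term (mass transfer coefficient times bed depth divided by bed velocity). Consult the WaterTAP documentation for the exact formulation.

```python
import math


def clark_c_norm(bv, bv_50, rate, freundlich_n):
    """Relative effluent concentration C/C0 at a given number of bed volumes.

    bv_50        : bed volumes at which C/C0 = 0.5
    rate         : assumed lumped kinetic term (hypothetical, per bed volume)
    freundlich_n : Freundlich isotherm exponent (must be > 1 here)
    """
    exponent = rate * (freundlich_n - 1) * (bv_50 - bv)
    base = 1 + (2 ** (freundlich_n - 1) - 1) * math.exp(exponent)
    return base ** (-1 / (freundlich_n - 1))


# By construction, the curve passes through C/C0 = 0.5 at bv_50
print(clark_c_norm(bv=120_000, bv_50=120_000, rate=5e-5, freundlich_n=2.0))
```

In this form, `bv_50` shifts the curve left or right, while `rate` and `freundlich_n` control how steeply it rises.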

<p align="center">
    <img src="ix_gac_parmest/images/report-cover.png" alt="Breakthrough Curve" width="400"/>
</p>

Figure 11 of the OCWD report presents the breakthrough data used in the pilot studies. In total, they evaluated eight different GAC media and four different IX resins.

Due to the lack of breakthrough for PFOS and PFHxS, we only consider the PFOA and PFBS data for the purposes of this demonstration. Of these four species, PFOA is the only one with a limit (10 ng/L).

<p align="center">
    <img src="ix_gac_parmest/images/breakthrough-curves.png" alt="Breakthrough Curve" width="800"/>
</p>

### Step 0: Imports

In [None]:
import yaml
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from IPython.display import clear_output

# from ix_gac_parmest.ix_gac_parmest import plot_curve, filter_data
from ix_gac_parmest.ix_gac_parmest import *

# Preparation

Using parmest requires creating an `Estimator` object that accepts a user-created list of `Experiment` objects. Before creating either, we need to prepare the following:

* Cleaned data
* Objective function
* Initial guess
* Build function


# Data

In [None]:
ix_data = pd.read_csv("ix_gac_parmest/data/ix_ocwd_data.csv")
gac_data = pd.read_csv("ix_gac_parmest/data/gac_ocwd_data.csv")

<p align="center">
    <img src="ix_gac_parmest/images/ix_data_raw.png" alt="Breakthrough Curve" width="600"/>
</p>

<p align="center">
    <img src="ix_gac_parmest/images/gac_data_raw.png" alt="Breakthrough Curve" width="600"/>
</p>

# Filtering Data

Data can be filtered in whatever way suits the dataset. Filtering and smoothing the data can help parmest converge on parameter values.

In this case, we filter the data so that the values stay within set lower and upper thresholds and the breakthrough concentration is generally increasing. Additionally, we ensure that no BV value is negative.

In [None]:
curve = 1
curve_data = ix_data[ix_data["curve_id"] == curve].copy()

# Results of filtering IX data
ix_data_filtered = filter_data(curve_data)
fig, ax = plot_curve(ix_data_filtered, x="all_bvs", y="all_cnorms", yleg="All data")
ax.set_title("Filtered IX Data Example")

# Results of filtering GAC data
curve_data = gac_data[gac_data["curve_id"] == curve].copy()
gac_data_filtered = filter_data(curve_data)
fig, ax = plot_curve(gac_data_filtered, x="all_bvs", y="all_cnorms", yleg="All data")
ax.set_title("Filtered GAC Data Example")
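The filtering rules described above could be sketched as follows. This is a minimal, dependency-free illustration and not the `filter_data` function used in this notebook: the name `filter_breakthrough`, its arguments, and the strict "drop any dip" rule are all assumptions made for the example.

```python
def filter_breakthrough(bvs, c_norms, c_min=0.0, c_max=1.0):
    """Return (bvs, c_norms) with invalid points removed.

    Drops points with negative BV, points with C/C0 outside [c_min, c_max],
    and points that fall below the last kept C/C0 value, so the remaining
    curve is non-decreasing.
    """
    kept_bv, kept_c = [], []
    for bv, c in sorted(zip(bvs, c_norms)):
        if bv < 0 or not (c_min <= c <= c_max):
            continue
        if kept_c and c < kept_c[-1]:
            continue  # skip dips below the previously kept value
        kept_bv.append(bv)
        kept_c.append(c)
    return kept_bv, kept_c


bvs = [-10, 0, 100, 200, 300, 400]
c_norms = [0.0, 0.01, 0.2, 0.15, 0.6, 1.3]
print(filter_breakthrough(bvs, c_norms))
```

A real filter may prefer smoothing over dropping points, since discarding every dip can remove a large share of noisy data.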

# Model Build

A model build function is needed to build the list of `Experiment` objects that is passed to the `Estimator` object. At a minimum, the build function must return the model. In addition, the build function *should* also:

1. Scale the model
2. Initialize the model
3. Solve the model

It is important that the returned model is stable because this is the model that parmest will use in the parameter estimation routine. We recommend testing the stability of the model over a range of potential input conditions.

In [None]:
curve = 1
curve_data = ix_data[ix_data["curve_id"] == curve].copy()
species, resin = curve_data.species.iloc[0], curve_data.resin.iloc[0]

initial_guess = {
    "fs.unit.bv_50": 120000,
    "fs.unit.mass_transfer_coeff": 0.2,
    "fs.unit.freundlich_n": 2.0,
}

m = build_ix_ocwd_pilot(species=species, resin=resin, theta_dict=initial_guess)
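One way to test stability over a range of inputs is to call the build function for every combination of candidate parameter values and record which ones fail. The sketch below is a hedged illustration: `check_build_stability` and `toy_build` are hypothetical names, and `toy_build` merely stands in for a real build function that would raise an error when initialization or the solve fails.

```python
from itertools import product


def check_build_stability(build_function, grid):
    """Call build_function for each combination in grid and collect failures."""
    names = list(grid)
    failures = []
    for values in product(*(grid[name] for name in names)):
        theta = dict(zip(names, values))
        try:
            build_function(theta_dict=theta)
        except Exception as err:  # record the failing inputs and the error
            failures.append((theta, repr(err)))
    return failures


def toy_build(theta_dict):
    """Hypothetical stand-in for a real model build function."""
    if theta_dict["freundlich_n"] <= 1:
        raise ValueError("freundlich_n must exceed 1")
    return theta_dict


grid = {
    "bv_50": [50_000, 120_000, 200_000],
    "freundlich_n": [1.0, 1.5, 2.0],
}
failures = check_build_stability(toy_build, grid)
print(f"{len(failures)} of 9 combinations failed")
```

For a build function with additional required arguments, such as `species` and `resin`, `functools.partial` can bind them before passing the function to the checker.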

# Creating a list of Experiments

Pyomo provides a template class, `Experiment`, for creating the input list of experiments. For this application, we use a subclass called `BreakthroughExperiment`. The class must include a `get_labeled_model` method that returns a model that:

* contains an `experiment_outputs` `Suffix` that contains the experimental data
* contains an `unknown_parameters` `Suffix` that contains the parameters to be estimated
* has the variables (or, parameters) that are being estimated *unfixed*

Inputs for the `BreakthroughExperiment` class below are:

* `data`: the filtered experimental dataset
* `experiment_number`: a unique identifier for the experiment
* `initial_guess`: an initial guess for the thetas
* `build_function`: the model build function 
* `x_label`: name of the input variable. For these two models, this is the relative effluent concentration (C/C0)
* `y_label`: name of the variable to be predicted. For these two models, this is the breakthrough time in BVs
* `build_kwargs`: dict of any keyword arguments required for the build function
* `thetas`: list of strings of the parameters to be estimated

The `get_labeled_model` method proceeds as such:

1. build, scale, and initialize the model via the `build_function`
2. set the effluent concentration to be the experimental value
3. re-initialize and solve the model at the initial guess
4. unfix all the `theta` variables

# Putting it all together

The `AdsorptionParamEst` class combines all of these steps in a single class.

Inputs for the `AdsorptionParamEst` class are:

* `data`: unfiltered DataFrame
* `build_function`: function that will return the model initialized and solved
* `filter_data_function`: function that will return filtered data
* `obj_function`: objective function to pass to parmest
* `initial_guess`: dict of initial guess for thetas
* `xlabel`: name of the input variable. For these two models, this is the relative effluent concentration (C/C0)
* `ylabel`: name of the variable to be predicted. For these two models, this is the breakthrough time in BVs
* `thetas`: list of strings of the parameters to be estimated
* `build_kwargs`: dict of any keyword arguments required for the build function
* `filter_data_kwargs`: dict of any keyword arguments required for the data filter function
* `parmest_kwargs`: dict of any keyword arguments to pass to parmest
 
With those inputs, the class can run a parameter estimation by calling the following methods in this order:

1. `filter_data`: filters the input breakthrough curve
2. `create_experiment_list`: creates a list of `BreakthroughExperiments` to pass to the `Estimator`
3. `run_parmest`: runs the parameter estimation

In [None]:
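def SSE(model):
    """
    Scaled sum of squared errors between the model and data values
    stored in the `experiment_outputs` Suffix
    """
    SSE_sf = 1e-4  # scaling factor applied to the objective
    # Each Suffix item pairs a model variable (y) with its experimental value (y_hat)
    expr = sum((y - y_hat) ** 2 for y, y_hat in model.experiment_outputs.items())
    return expr * SSE_sf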


curve = 1

df = ix_data[ix_data["curve_id"] == curve].copy()

initial_guess = {
    "fs.unit.bv_50": 120000,
    "fs.unit.mass_transfer_coeff": 0.2,
    "fs.unit.freundlich_n": 2.0,
}

build_kwargs = dict(
    species=species,
    resin=resin,
    theta_dict=initial_guess,
)

ix = AdsorptionParamEst(
    data=df,
    build_function=build_ix_ocwd_pilot,
    # obj_function="SSE",
    obj_function=SSE,
    initial_guess=initial_guess,
    filter_data_function=filter_data,
    xlabel="fs.unit.c_norm[PFOA]",
    ylabel="fs.unit.bv",
    thetas=["bv_50", "mass_transfer_coeff", "freundlich_n"],
    build_kwargs=build_kwargs,
)

ix.filter_data()
ix.test_theta(theta_dict=initial_guess)
clear_output(wait=False)
df_ig = ix.test_theta_results.copy()
ix.create_experiment_list()
ix.run_parmest()
clear_output(wait=False)
ix.test_theta()
clear_output(wait=False)
df_theta = ix.test_theta_results.copy()
ix.compute_fit_statistics()
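To make the scaled objective concrete, the sketch below computes the same kind of scaled sum of squared errors on plain numbers, along with a root-mean-square error as one common fit statistic. The function names and data values are hypothetical, and this is not necessarily what `compute_fit_statistics` reports.

```python
def scaled_sse(predicted, observed, scale=1e-4):
    """Scaled sum of squared errors between predictions and observations."""
    return scale * sum((p - o) ** 2 for p, o in zip(predicted, observed))


def rmse(predicted, observed):
    """Root-mean-square error in the units of the observations."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return (squared / len(observed)) ** 0.5


# Hypothetical breakthrough times in bed volumes
observed = [100_000.0, 120_000.0, 140_000.0]
predicted = [101_000.0, 119_000.0, 142_000.0]

print(scaled_sse(predicted, observed))
print(rmse(predicted, observed))
```

Because breakthrough times are on the order of 1e5 BV, squared residuals become very large, which is presumably why the objective includes a scaling factor.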

In [None]:
fig, ax = plt.subplots()

ax.plot(
    ix.filtered_data["filtered_y"],
    ix.filtered_data["filtered_x"],
    color="grey",
    marker=None,
    label="Experimental Data",
    alpha=0.45,
)
ax.plot(df_ig[ix.ylabel], df_ig[ix.xlabel], marker=".", label="Initial Guess")
ax.plot(df_theta[ix.ylabel], df_theta[ix.xlabel], marker=".", label="Theta Estimates")

ax.set_ylim([-0.05, 1.05])
ax.set(title="IX Curve 1\nTesting Initial Guess", xlabel="BV", ylabel="C/C0")
ax.grid(visible=True, zorder=0)
ax.legend()

In [None]:
ix_parmest_results = pd.read_csv("results/ix_parmest_results.csv")

<p align="center">
    <img src="ix_gac_parmest/images/ix_parmest_results.png" alt="Breakthrough Curve" width="800"/>
</p>

# Repeat for GAC