# Interpolating and Extrapolating

So - we can now generate models and solve equations, but what if we are measuring data?

First - let's talk about _interpolation_ and _extrapolation_.  

**_Interpolation_** is estimating values between recorded data points.   
**_Extrapolation_** is estimating values outside the bounds of the recorded data.  

We can do both of these with Python!

First let's generate some example "data" to play with. We are going to make this up, but you could use real data just as easily after loading it with - say - pandas, etc.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
data_x=np.linspace(0, 10, 11)
data_y=np.cos(-data_x**2/9.0)

In [None]:
plt.plot(data_x, data_y, 'ro')

Now let's try to interpolate between the data points. From `scipy` yet again, we can use the `interp1d` function, which generates an "interpolation function object" - basically a function you can pass new x-values to in order to get back the interpolated y-values.

In [None]:
from scipy.interpolate import interp1d

In [None]:
f_linear=interp1d(data_x, data_y)

In [None]:
f_linear
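As a quick check of what that function object does, here is a small self-contained sketch. It rebuilds the same made-up data, and `f_check` is just an illustrative name, not something used elsewhere in this notebook. It calls the object at one point halfway between two data points and at one original data point.

```python
import numpy as np
from scipy.interpolate import interp1d

# Same made-up data as above
data_x = np.linspace(0, 10, 11)
data_y = np.cos(-data_x**2 / 9.0)

f_check = interp1d(data_x, data_y)  # the default kind is "linear"

# Calling the object with an x-value returns the interpolated y-value.
# Halfway between x=2 and x=3, linear interpolation gives the average
# of the two neighboring data values.
y_mid = f_check(2.5)
y_avg = (data_y[2] + data_y[3]) / 2

# At an original data point we get the original value back.
y_at_point = f_check(4.0)

print(y_mid, y_avg)
print(y_at_point, data_y[4])
```

The object accepts a single number or a whole array of x-values, which is what we will use next.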

Let's make a denser set of x points so we can see how the interpolation is working.

In [None]:
dense_x=np.linspace(0, 10, 51)

Then let's apply the interpolation function

In [None]:
y_linear=f_linear(dense_x)
y_linear

And plot!

In [None]:
plt.plot(data_x, data_y, 'ro')
plt.plot(dense_x, y_linear, 'b-')

Ok - so the basic `interp1d` just draws a straight line between each pair of neighboring data points, and uses that line to figure out what a new y-value might be.  This is . . . useful, but one could clearly suspect that it's not accurate.  Let's look at what the actual data formula would have generated:

In [None]:
dense_y=np.cos(-dense_x**2/9.0)

In [None]:
plt.plot(data_x, data_y, 'ro')
plt.plot(dense_x, y_linear, 'b-')
plt.plot(dense_x, dense_y, 'r--')

It's ok in certain regions, but in others it is way off.  What happens if we try to _extrapolate_, or plot values outside the range of the initial data?

In [None]:
extra_x=np.linspace(-2, 12, 71)
y_linear_extra=f_linear(extra_x) #this will throw an error on purpose

Hmm - threw an error!  Reading the error - it says `A value in x_new is below the interpolation range.` - meaning that we tried to extrapolate.  However, there are _options_ you can set for `interp1d` that will allow the function to extrapolate.  These options are `bounds_error` and `fill_value`.

`bounds_error` controls whether this error is thrown. If you set it to `False`, points outside the data range are filled in using `fill_value` instead, which defaults to NaN if you don't specify it.  `fill_value` can either be the values you want to "fill" in when extrapolating, given as a (below, above) pair as in (.5, -.5):


In [None]:
f_linear_fixed=interp1d(data_x, data_y, bounds_error=False, fill_value=(.5, -.5))
y_linear_fixed=f_linear_fixed(extra_x)

In [None]:
plt.plot(data_x, data_y, 'ro')
plt.plot(extra_x, y_linear_fixed, 'b-')

So it just uses the constant values we specified outside the bounds of the data (.5 below the range and -.5 above it). The other option for `fill_value` is to say "extrapolate":

In [None]:
f_linear_extra=interp1d(data_x, data_y, bounds_error=False, fill_value="extrapolate")
y_linear_extra=f_linear_extra(extra_x)

In [None]:
plt.plot(data_x, data_y, 'ro')
plt.plot(extra_x, y_linear_extra, 'b-')

So with "extrapolate" it just continues, at each end, the line defined by the two nearest data points.
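To make the out-of-range behavior concrete, here is a small sketch that checks two things on the same made-up data. As far as I know, in current SciPy versions leaving `fill_value` unset with `bounds_error=False` gives NaN outside the data, and linear "extrapolate" matches a straight line drawn by hand through the end points. The names `f_nan`, `f_ext`, and the `by_hand_*` variables are illustrative only.

```python
import numpy as np
from scipy.interpolate import interp1d

data_x = np.linspace(0, 10, 11)
data_y = np.cos(-data_x**2 / 9.0)

# bounds_error=False with no fill_value: out-of-range points become NaN
f_nan = interp1d(data_x, data_y, bounds_error=False)
outside_default = f_nan(12.0)

# fill_value="extrapolate": the end segments are extended as straight lines
f_ext = interp1d(data_x, data_y, bounds_error=False, fill_value="extrapolate")

# The same thing by hand, using the last two and the first two data points
slope_right = (data_y[-1] - data_y[-2]) / (data_x[-1] - data_x[-2])
by_hand_right = data_y[-1] + slope_right * (12.0 - data_x[-1])
slope_left = (data_y[1] - data_y[0]) / (data_x[1] - data_x[0])
by_hand_left = data_y[0] + slope_left * (-2.0 - data_x[0])

print(outside_default)
print(f_ext(12.0), by_hand_right)
print(f_ext(-2.0), by_hand_left)
```

Keep in mind that an extrapolated line knows nothing about what the data does beyond the last point, so the further out you go, the less you should trust it.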

We can also set the `kind` of interpolation we are doing - the default interpolation is "linear", but we can also do one of a couple different spline fits:

In [None]:
f_zero=interp1d(data_x, data_y, 'zero')
f_slinear=interp1d(data_x, data_y, kind='slinear')
f_cubic=interp1d(data_x, data_y, kind='cubic')

In [None]:
y_zero=f_zero(dense_x)
y_slinear=f_slinear(dense_x)
y_cubic=f_cubic(dense_x)

In [None]:
plt.plot(data_x, data_y, 'ro')
plt.plot(dense_x, y_zero, 'b-')
plt.plot(dense_x, y_slinear, 'r--')
plt.plot(dense_x, y_cubic, 'k-+')
plt.legend(['data', 'zeroth order', 'first order', 'third order (cubic)'])
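Because we made the data up, we know the "true" curve, so we can also compare the different kinds with a number instead of by eye. The sketch below is one possible way to do that. It uses the largest absolute difference from the true curve on the dense grid as the error measure, which is my own choice here, and it adds `quadratic` (the second-order spline) to the list. With real measurements there is no true curve to compare against.

```python
import numpy as np
from scipy.interpolate import interp1d

data_x = np.linspace(0, 10, 11)
data_y = np.cos(-data_x**2 / 9.0)
dense_x = np.linspace(0, 10, 51)
# The "true" curve, known only because we generated the data ourselves
dense_y = np.cos(-dense_x**2 / 9.0)

max_error = {}
for kind in ["zero", "slinear", "quadratic", "cubic"]:
    f = interp1d(data_x, data_y, kind=kind)
    # Largest absolute difference between interpolated and true values
    max_error[kind] = np.max(np.abs(f(dense_x) - dense_y))

for kind, err in max_error.items():
    print(f"{kind:>9}: max abs error = {err:.3f}")
```

A higher-order spline is not automatically better. Where the data points are too sparse to follow the oscillation, every kind of interpolation will struggle.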

Or we can try `nearest`, `previous` or `next`:

In [None]:
f_nearest=interp1d(data_x, data_y, 'nearest')
f_previous=interp1d(data_x, data_y, kind='previous')
f_next=interp1d(data_x, data_y, kind='next')

In [None]:
y_nearest=f_nearest(dense_x)
y_previous=f_previous(dense_x)
y_next=f_next(dense_x)

In [None]:
plt.plot(data_x, data_y, 'ro')
plt.plot(dense_x, y_nearest, 'b-')
plt.plot(dense_x, y_previous, 'r--')
plt.plot(dense_x, y_next, 'k-+')
plt.legend(['data', 'nearest', 'previous', 'next'])
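The difference between these three is easiest to see on a tiny example. The sketch below uses three made-up points (not the data above) and two query points that sit between x=0 and x=1, one closer to each end.

```python
import numpy as np
from scipy.interpolate import interp1d

# Tiny made-up data set, for illustration only
x = np.array([0.0, 1.0, 2.0])
y = np.array([10.0, 20.0, 30.0])
query = np.array([0.25, 0.75])

results = {}
for kind in ["nearest", "previous", "next"]:
    f = interp1d(x, y, kind=kind)
    results[kind] = f(query)
    print(kind, results[kind])

# nearest:  takes the y of the closest data point -> 10 for 0.25, 20 for 0.75
# previous: takes the y of the data point before  -> 10 for both
# next:     takes the y of the data point after   -> 20 for both
```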

## Curve fitting

So - we can interpolate between points - but what if we want to fit a curve to the whole data set?

We can use `curve_fit` from `scipy.optimize`

Let's make some noisy "data" to test our fitting:

In [None]:
x=np.linspace(0.0,20.0,51)
A=4.0
B=.5
y=A*np.sin(B*x)+np.random.normal(size=51)

In [None]:
plt.plot(x,y, 'ro')

So with this data, let's try to fit a sine curve.

In [None]:
from scipy import optimize

Now we have to set up the function we want to fit, and what parameters it has:

In [None]:
def myequation(x, a, b):
    y=a*np.sin(x*b)
    return y

In [None]:
param, param_covar = optimize.curve_fit(myequation, x, y, p0=[3, .65])
param

In [None]:
param_covar
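The second return value is the estimated covariance matrix of the fitted parameters. A common way to read it, following the SciPy documentation for `curve_fit`, is to take the square root of its diagonal as a one-standard-deviation error on each parameter. The sketch below repeats the fit on freshly generated noisy data. The fixed random seed and the names `sine_model`, `popt`, `pcov`, and `perr` are my own choices for illustration, so the numbers will not match the cells above exactly.

```python
import numpy as np
from scipy import optimize


def sine_model(x, a, b):
    """Same form as the equation above: a * sin(b * x)."""
    return a * np.sin(b * x)


# Noisy made-up data, with a fixed seed so the run is repeatable
rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0, 51)
y = 4.0 * np.sin(0.5 * x) + rng.normal(size=x.size)

popt, pcov = optimize.curve_fit(sine_model, x, y, p0=[3, 0.65])

# One-standard-deviation errors from the diagonal of the covariance matrix
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip(["a", "b"], popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

The off-diagonal entries of the matrix describe how the errors in the two parameters are correlated with each other.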

In [None]:
y_fit=myequation(x, param[0], param[1])

In [None]:
plt.plot(x,y, 'ro')
plt.plot(x,y_fit, 'b--')

Ok - let's pull in some real data!  From here <https://pdixon.stat.iastate.edu/stat511/datasets.html> we've pulled a tab separated file for you looking at some enzyme kinetics!  

You are probably all familiar with the justifiably famous Michaelis-Menten equation:

$V = \frac{V_{max} [S]}{K_M + [S]}$

that enzyme kinetics follow.  Let's fit our data to this equation so we can extract the parameters.

In [None]:
import pandas as pd

In [None]:
enzyme=pd.read_table("../data/enzyme.txt")
enzyme

In [None]:
plt.plot(enzyme.concentration, enzyme.velocity, 'ro')

Now let's set up the equation:

In [None]:
def michaelis_menten(s, v_max, k_m):
    v=v_max*s/(k_m+s)
    return v

In [None]:
enz_param, enz_param_covar = optimize.curve_fit(michaelis_menten, enzyme.concentration, enzyme.velocity)
enz_param

In [None]:
fit_velocity=michaelis_menten(enzyme.concentration, enz_param[0], enz_param[1])

In [None]:
plt.plot(enzyme.concentration, enzyme.velocity, 'ro')
plt.plot(enzyme.concentration, fit_velocity, 'b--')

Pretty nice fit!  So we have used this to figure out what the parameters are for the enzyme.  Of course you could have also generated a Lineweaver-Burk plot - but I think the point is that _if_ you can set up an equation with the parameters, and you have sufficient data, you can try to fit it.
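For comparison, here is a sketch of the Lineweaver-Burk route next to `curve_fit`. It does not use the enzyme file. Instead it uses hypothetical, noise-free values (`true_v_max`, `true_k_m`, and the concentrations in `s` are all made up for illustration) so that both methods should recover the same parameters. Taking reciprocals of the Michaelis-Menten equation gives $\frac{1}{V} = \frac{K_M}{V_{max}} \frac{1}{[S]} + \frac{1}{V_{max}}$, which is a straight line in $1/[S]$.

```python
import numpy as np
from scipy import optimize


def michaelis_menten(s, v_max, k_m):
    return v_max * s / (k_m + s)


# Hypothetical, noise-free values chosen only for illustration
true_v_max, true_k_m = 200.0, 0.1
s = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
v = michaelis_menten(s, true_v_max, true_k_m)

# Lineweaver-Burk: fit a straight line to 1/v versus 1/s
slope, intercept = np.polyfit(1.0 / s, 1.0 / v, 1)
lb_v_max = 1.0 / intercept   # intercept = 1 / V_max
lb_k_m = slope * lb_v_max    # slope = K_M / V_max

# Direct nonlinear fit of the original equation
(cf_v_max, cf_k_m), _ = optimize.curve_fit(
    michaelis_menten, s, v, p0=[150.0, 0.2]
)

print("Lineweaver-Burk:", lb_v_max, lb_k_m)
print("curve_fit:      ", cf_v_max, cf_k_m)
```

With noisy measurements the two approaches generally give different answers, because taking reciprocals magnifies the error in the smallest velocities. That is one reason to fit the original equation directly.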

## Exercise:

Now let's look at the pharmacokinetics of cefamandole. Cefamandole is a broad-spectrum antibiotic (no longer used in the US).  The data come from 6 healthy volunteers who were injected with 15 mg/kg body mass of cefamandole; the plasma concentrations of the drug were then measured at 14 time points.  

Your friendly neighborhood biochemist told you that there are _two_ different rates involved in the degradation of this molecule, renal and non-renal clearance (<https://www.ncbi.nlm.nih.gov/pmc/articles/PMC155817/>):

$\frac{d[C]}{dt} = -{renal}[C] - {nonrenal}[C]$

$[C] = C_0 e^{-(renal+nonrenal)t}$

