# What is approximating a function?

The goal of this notebook is twofold: giving a simple example of what approximating a function means, and playing with Jupyter, NumPy and Bokeh.  
Let's start by importing the libraries needed for Jupyter widgets such as `interact`:

In [1]:
from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

## Linear regression: a simple example of function approximation

In this example, we will generate our own dataset. Because we would like the data to be as "realistic" as possible, we will add some noise.  
Creating your own dataset is very common in data science. It allows you to build proofs of concept or to pre-train models. It can also increase the amount of data you have.
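The dataset generation used in the cell below can be made reproducible by seeding the random generator. The following is a minimal sketch, assuming NumPy's `np.random.default_rng` (NumPy 1.17 or later); the helper name `make_linear_dataset` and its `noise_std` and `seed` parameters are hypothetical, introduced only for illustration.

```python
import numpy as np

def make_linear_dataset(a, b, n_points, noise_std=1.0, seed=0):
    """Hypothetical helper: sample t on [-5, 5], return clean and noisy a*t + b."""
    rng = np.random.default_rng(seed)  # seeded generator -> reproducible noise
    t = np.linspace(-5, 5, n_points)
    x = a * t + b                      # noiseless linear function
    xn = x + rng.normal(0.0, noise_std, size=n_points)  # add Gaussian noise
    return t, x, xn

t, x, xn = make_linear_dataset(a=0.8, b=2.0, n_points=50)
t2, x2, xn2 = make_linear_dataset(a=0.8, b=2.0, n_points=50)
print(np.array_equal(xn, xn2))  # True: same seed gives the same noisy data
```

Fixing the seed makes a "proof of concept" easy to rerun and compare; changing it gives a fresh noisy sample of the same underlying line.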

Then we will apply some very basic NumPy functions to a standard function approximation problem. In this case it's linear regression: **we will approximate a linear function**.

To measure how good the approximation is, we need an error measure. Here we use the **M**ean **S**quared **E**rror (MSE). Let's see below:

In [2]:
# We are importing everything we need for bokeh
from bokeh.io import push_notebook, show, output_notebook
from bokeh.layouts import row
from bokeh.plotting import figure
import numpy as np
output_notebook() # this is for showing your plot inside your notebook

In [3]:
@interact(n_points=widgets.IntSlider(value=50, min=2, max=200),
          a=0.8, b=(-10, 10, 1), ar=0.8, br=(-10, 10, 1))
def linear_regression(n_points, a, b, ar, br):
    # Generate a fake dataset of n_points from the linear function x = a*t + b,
    # add some noise to make it "realistic",
    # approximate it with the line x = ar*t + br,
    # then plot the graphs with bokeh

    t = np.linspace(-5, 5, n_points)

    # Noiseless data: np.polyval([a, b], t) evaluates a*t + b
    x = np.polyval([a, b], t)

    # Add some noise (one standard normal value per point)
    xn = x + np.random.randn(n_points)

    # Linear regression with polyfit (polyfit can also fit higher-degree polynomials).
    # Uncomment the next line to compute ar and br automatically
    # instead of tuning them with the sliders:
    # (ar, br) = np.polyfit(t, xn, 1)

    xr = np.polyval([ar, br], t)

    # Root mean squared error (square root of the MSE), computed for information only;
    # polyfit minimizes the squared error on its own
    err = np.sqrt(np.mean((xr - xn) ** 2))

    print('Linear regression using polyfit')
    print('parameters: a=%.2f b=%.2f \nregression: a=%.2f b=%.2f, ms error= %.3f' % (a, b, ar, br, err))

    # Bokeh plotting
    p1 = figure(title='Data without noise', width=400, height=400)
    p1.circle(t, x, color='green')

    p2 = figure(title='With noise', width=400, height=400)
    p2.circle(t, xn, color='black')
    p2.line(t, xr, color='blue', legend_label='Linear regression')

    show(row(p1, p2), notebook_handle=True)

Linear regression using polyfit
parameters: a=1.00 b=1.00 
regression: a=1.00 b=1.00, ms error= 1.104
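The commented-out `np.polyfit` line in the cell above computes `ar` and `br` automatically. A minimal non-interactive sketch of that path, assuming an arbitrary true line `a=0.8, b=2.0` and a seeded generator so the run is repeatable (no widgets or plots):

```python
import numpy as np

rng = np.random.default_rng(42)
a, b, n_points = 0.8, 2.0, 50

t = np.linspace(-5, 5, n_points)
xn = np.polyval([a, b], t) + rng.standard_normal(n_points)  # noisy line

# Degree-1 least-squares fit; coefficients come highest power first: [slope, intercept]
ar, br = np.polyfit(t, xn, 1)
xr = np.polyval([ar, br], t)

rmse = np.sqrt(np.mean((xr - xn) ** 2))
print('true: a=%.2f b=%.2f | fitted: a=%.2f b=%.2f | rms error=%.3f' % (a, b, ar, br, rmse))
```

With unit-variance noise, the fitted parameters land close to the true ones and the RMS error stays close to the noise level, since the noise is what the line cannot explain.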


Here I chose to wrap the code in a function because I wanted to use `interact`, a very useful tool for adding interactivity to your notebooks.  
More information here: http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html

The MSE represents the average squared distance between your data and the line. The goal is to **minimize this distance** so that our model is as **accurate** as possible. A quantity that we minimize like this is called a **loss function**.  

$$\mathscr{L} = \frac{1}{n} \sum_{i=1}^{n}(Y_i-\widehat {Y_i})^2$$  
$Y_i$ is the data (it is known) and $\widehat{Y_i}$ is what we predict (what we try to make accurate). So in the ideal case, $Y_i-\widehat {Y_i}$ is equal to $0$! Or at least, the closer it is to $0$, the more accurate our model is.  

![MSE concept](images/MSE_concept.png "MSE concept")
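To make the formula concrete, here is a small sketch on four made-up values (the arrays `Y` and `Y_hat` are illustrative, not data from this notebook). It transcribes the sum term by term, checks it against the vectorized `np.mean` form, and shows the root mean squared error, which is what the regression cell above actually prints.

```python
import numpy as np

Y = np.array([1.0, 2.0, 3.0, 4.0])      # known data Y_i
Y_hat = np.array([1.5, 2.0, 2.0, 4.0])  # predictions \hat{Y}_i

# Direct transcription of L = (1/n) * sum_i (Y_i - \hat{Y}_i)^2
n = len(Y)
loss = sum((Y[i] - Y_hat[i]) ** 2 for i in range(n)) / n

# Vectorized NumPy equivalent
loss_np = np.mean((Y - Y_hat) ** 2)

print(loss, loss_np)       # 0.3125 0.3125
print(np.sqrt(loss_np))    # RMSE: square root of the MSE
```

The squared differences are 0.25, 0, 1 and 0, so the loss is 1.25 / 4 = 0.3125. Taking the square root brings the error back to the units of the data, which makes it easier to compare with the noise level.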

Minimizing a loss function (or, equivalently, maximizing an objective) is called optimization, a mathematical field in its own right. So we say "we want to optimize our loss function in order to improve our model's accuracy". And "improving the accuracy of our model" means "approximating the function as closely as we can".
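For a straight line, this optimization problem has a closed-form solution (ordinary least squares): the slope is $\sum (t_i-\bar t)(x_i-\bar x) / \sum (t_i-\bar t)^2$ and the intercept is $\bar x - \text{slope}\cdot\bar t$. The sketch below, on a seeded synthetic dataset (an assumption for repeatability), checks that `np.polyfit` with degree 1 returns the same parameters and that nudging them away from the optimum increases the MSE.

```python
import numpy as np

def mse(params, t, x):
    """Mean squared error of the line params = [slope, intercept] on data (t, x)."""
    return np.mean((np.polyval(params, t) - x) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(-5, 5, 50)
x = 0.8 * t + 2.0 + rng.standard_normal(t.size)

# Closed-form minimizer of the MSE for a line (ordinary least squares)
t_mean, x_mean = t.mean(), x.mean()
slope = np.sum((t - t_mean) * (x - x_mean)) / np.sum((t - t_mean) ** 2)
intercept = x_mean - slope * t_mean

# np.polyfit(deg=1) solves the same least-squares problem
print(np.allclose([slope, intercept], np.polyfit(t, x, 1)))  # True

# Moving away from the optimum can only increase the loss
best = mse([slope, intercept], t, x)
for d_slope, d_intercept in [(0.1, 0.0), (0.0, 0.1), (-0.05, 0.2)]:
    assert mse([slope + d_slope, intercept + d_intercept], t, x) > best
```

The MSE of a line is a convex quadratic in (slope, intercept), which is why any small step away from the least-squares solution raises the loss.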

Let's see another example of function approximation.

## Interpolation: another example of function approximation

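As before, we first generate our dataset, this time based on the sine function. As this function is not linear, a straight-line fit with polyfit, as before, would not work. Instead, we will use a technique called "interpolation": here, linear interpolation with `np.interp`, which joins consecutive samples with straight segments.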
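Linear interpolation uses, for a query point $x$ between two samples $x_k \le x \le x_{k+1}$, the value $y_k + (y_{k+1}-y_k)\frac{x-x_k}{x_{k+1}-x_k}$. A minimal sketch of that formula, with a hypothetical helper `linear_interp` that assumes increasing sample points and queries inside the sampled range, compared with `np.interp`:

```python
import numpy as np

def linear_interp(xq, xs, ys):
    """Hypothetical helper: piecewise-linear interpolation of (xs, ys) at xq.
    Assumes xs is increasing and xq lies within [xs[0], xs[-1]]."""
    xq = np.asarray(xq, dtype=float)
    # Index k of the segment [xs[k], xs[k+1]] containing each query point
    k = np.clip(np.searchsorted(xs, xq, side='right') - 1, 0, len(xs) - 2)
    w = (xq - xs[k]) / (xs[k + 1] - xs[k])   # relative position inside the segment
    return ys[k] + w * (ys[k + 1] - ys[k])

xs = np.linspace(0, 10, 6)       # 6 samples of sin(x) on [0, 10]
ys = np.sin(xs)
xq = np.linspace(0, 10, 101)     # dense evaluation grid

print(np.allclose(linear_interp(xq, xs, ys), np.interp(xq, xs, ys)))  # True
```

Unlike `np.interp`, this sketch does not clamp queries outside `[xs[0], xs[-1]]`; it extrapolates the first or last segment instead.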

In [4]:
def interpolation(n_samples):
    # Generate data from the sine function,
    # sample it at n_samples points, interpolate between the samples,
    # and plot the result with bokeh

    # Dense grid: 40 points of sin(x) in [0, 10], used as the "true" function
    xx = np.linspace(0, 10, 40)
    yy = np.sin(xx)

    # n_samples samples of sin(x) in [0, 10]
    x = np.linspace(0, 10, n_samples)
    y = np.sin(x)

    # Linear interpolation of the samples, evaluated on the same [0, 10] grid
    yinterp = np.interp(xx, x, y)

    # Bokeh plotting
    p1 = figure(title='The function itself', width=400, height=400)
    p1.line(xx, yy, color='green')

    p2 = figure(title='The approximated function (interpolation)', width=400, height=400)
    p2.circle(x, y, color='black')
    p2.line(xx, yinterp, color='blue', legend_label='Linear interpolation')

    show(row(p1, p2), notebook_handle=True)

In [5]:
interact(interpolation, n_samples = (2, 15, 1))


You can notice that the more samples we take (`n_samples`), the more accurate our approximated function is.
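This observation can be checked numerically. The sketch below measures the maximum gap between the linear interpolant and $\sin$ on a dense grid (1000 points, an arbitrary choice standing in for the exact function) for a few values of `n_samples`. For linear interpolation with sample spacing $h$, the error is bounded by $\frac{h^2}{8}\max|f''|$, and $|\sin''| \le 1$.

```python
import numpy as np

xx = np.linspace(0, 10, 1000)   # dense grid standing in for the "true" function
errors = {}
for n_samples in (2, 5, 10, 15):
    x = np.linspace(0, 10, n_samples)
    # Linear interpolation of the samples, evaluated on the dense grid
    yinterp = np.interp(xx, x, np.sin(x))
    errors[n_samples] = np.max(np.abs(yinterp - np.sin(xx)))
    print('n_samples=%2d  max error=%.3f' % (n_samples, errors[n_samples]))
```

Because the bound shrinks like $h^2$, doubling the number of samples roughly divides the worst-case error by four once the samples are dense enough to follow each oscillation.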