# Locally weighted regression
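[Locally weighted](https://en.wikipedia.org/wiki/Local_regression) regression is a powerful non-parametric method in statistical learning. Instead of fitting one global model, it fits a separate weighted linear model around every point where a prediction is needed.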

![day97-locally_weighted_regression_1](resource/day97-locally_weighted_regression_1.png)

To explain how it works, we can begin with a linear regression model and [ordinary least squares](https://en.wikipedia.org/wiki/Ordinary_least_squares).

$h(x) = x^T \beta$

$\beta = \arg\min_\beta \sum_{x,y}(y-x^T\beta)^2$

$\beta = (X^TX)^{-1}X^Ty$

Given a dataset **X, y**, we look for a linear model **h(x)** that minimizes the residual sum of squared errors. The solution is given by the normal equations.
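
A minimal NumPy sketch of the normal equations, using hypothetical toy data $y = 2 + 3x$ plus small noise (not the notebook's dataset). It solves the linear system $X^TX\beta = X^Ty$ with `np.linalg.solve` rather than forming the inverse explicitly, and cross-checks the result against `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical toy data: y = 2 + 3x plus small Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 2 + 3 * x + rng.normal(scale=0.1, size=200)

# Design matrix with a bias column, so that h(x) = x^T beta includes an intercept
X = np.c_[np.ones_like(x), x]

# Normal equations X^T X beta = X^T y, solved without computing the inverse
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)        # close to [2, 3]
print(beta_lstsq)  # same values up to floating-point error
```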

A linear model can only fit a straight line. It can be made more expressive with polynomial features, but then we still have to decide on the number and type of features in advance.
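
To illustrate the point about fixing features ahead of time, here is a small sketch (the helper `poly_features` is hypothetical) that expands the notebook's target function into polynomial features of a chosen degree with `np.vander` and fits each expansion by ordinary least squares. Each degree is a separate, fixed model; the residual sum of squares can only decrease as the degree grows, because the smaller feature sets are nested inside the larger ones.

```python
import numpy as np

def poly_features(x, degree):
    # Columns 1, x, x^2, ..., x^degree: the degree is chosen before fitting
    return np.vander(x, N=degree + 1, increasing=True)

# Same target function as the notebook's dataset, without the jitter
x = np.linspace(-3, 3, 100)
y = np.log(np.abs(x ** 2 - 1) + .5)

rss = {}
for degree in (1, 4, 8):
    X = poly_features(x, degree)
    # Ordinary least squares on the expanded features
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss[degree] = np.sum((y - X @ beta) ** 2)
    print(degree, rss[degree])
```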

An alternative approach is locally weighted regression.

$h(x_0) = x_0^T \beta(x_0)$

$\beta(x_0) = \arg\min_\beta \sum_{x,y} w(x,x_0)(y-x^T\beta)^2$

$w(x,x_0) = e^{-\frac{(x-x_0)^2}{2\tau^2}}$

$\beta(x_0) = (X^TWX)^{-1}X^TWy, \quad W = \mathrm{diag}\big(w(x_1,x_0), \dots, w(x_n,x_0)\big)$

Given a dataset **X, y**, we now look for a model **h(x)** that minimizes the residual sum of **weighted** squared errors, where the weights depend on the query point $x_0$. The weights are given by a kernel function, which can be chosen freely; here I chose a Gaussian kernel. The solution is very similar to the normal equations: we only need to insert the diagonal weight matrix **W**.
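
The weighted normal equations can be written out literally with an explicit $n \times n$ diagonal matrix $W$. The sketch below (the toy sine data and the helper `gaussian_weights` are assumptions for illustration) does exactly that and checks it against the cheaper form `X.T * w`, which scales each column of $X^T$ by its weight and is the same trick `local_regression` uses in the algorithm section:

```python
import numpy as np

def gaussian_weights(x, x0, tau):
    # w(x, x0) = exp(-(x - x0)^2 / (2 tau^2)) for 1-D inputs
    return np.exp(-(x - x0) ** 2 / (2 * tau ** 2))

# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + rng.normal(scale=0.1, size=50)
X = np.c_[np.ones_like(x), x]  # design matrix with bias column

x0, tau = 0.5, 0.3
w = gaussian_weights(x, x0, tau)
W = np.diag(w)  # explicit n x n diagonal weight matrix

# beta(x0) = (X^T W X)^{-1} X^T W y, solved as a linear system
beta_explicit = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Same result without building W: X.T * w scales column i of X^T by w[i]
XtW = X.T * w
beta_broadcast = np.linalg.solve(XtW @ X, XtW @ y)

# Local prediction h(x0) = x0^T beta(x0), with the bias entry prepended
prediction = np.array([1.0, x0]) @ beta_explicit
print(beta_explicit, beta_broadcast, prediction)
```

Building `W` explicitly needs $O(n^2)$ memory, which is why the broadcast form is preferable for larger datasets.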

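What is interesting about this particular setup? The meta-parameter $\tau$ (the kernel bandwidth) controls how local each fit is. A large $\tau$ gives almost equal weights to all points, so the model approaches ordinary linear regression; a small $\tau$ lets each local fit follow the nearby data closely. By adjusting $\tau$ alone, you get a non-linear model that can be as flexible as polynomial regression of high degree, without choosing features in advance.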
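
A small sketch of the two extremes of $\tau$, assuming a noise-free copy of the notebook's target function and a hypothetical helper `lwr_predict` that mirrors `local_regression` for 1-D inputs. With a very large $\tau$ all weights are close to 1, so the local fit at $x_0 = 0$ coincides with the global OLS line; with a small $\tau$ the prediction tracks the curve itself, whose value at 0 is $\log(1.5)$:

```python
import numpy as np

def lwr_predict(x0, x, y, tau):
    # Locally weighted linear regression at one query point x0 (1-D inputs)
    X = np.c_[np.ones_like(x), x]
    w = np.exp(-(x - x0) ** 2 / (2 * tau ** 2))  # Gaussian kernel weights
    XtW = X.T * w                                # X^T W without building W
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return np.array([1.0, x0]) @ beta

# Noise-free version of the notebook's target function
x = np.linspace(-3, 3, 200)
y = np.log(np.abs(x ** 2 - 1) + .5)

# Global OLS line for comparison
X = np.c_[np.ones_like(x), x]
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

x0 = 0.0
ols_at_x0 = np.array([1.0, x0]) @ beta_ols
wide = lwr_predict(x0, x, y, tau=1e3)     # nearly equal weights everywhere
narrow = lwr_predict(x0, x, y, tau=0.05)  # only points close to x0 matter

print(ols_at_x0, wide, narrow, np.log(1.5))
```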

If you are interested, you can find an excellent explanation of what the kernel really does in [Andrew Ng’s Machine learning course](https://www.youtube.com/watch?v=HZ4cvaztQEs&t=720s).

This time the notebook contains an interactive plot. You can adjust the meta-parameter $\tau$ and watch its influence on the model in real time. Have fun!

```python
import numpy as np
from ipywidgets import interact
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

output_notebook()
```

## algorithm

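```python
def local_regression(x0, X, Y, tau):
    # add bias term to the query point and to every training point
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]

    # fit model: normal equations with kernel weights, (X^T W X) beta = X^T W y;
    # multiplying X.T by the weight vector scales its columns, i.e. computes X^T W
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y

    # predict value at the query point
    return x0 @ beta
```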

```python
def radial_kernel(x0, X, tau):
    # Gaussian weights exp(-||x - x0||^2 / (2 tau^2)), one per row of X
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))
```
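
To get a feel for $\tau$ as a bandwidth, the sketch below evaluates a copy of `radial_kernel` at points 0, 1, 2 and 3 bandwidths away from the query point; the value of `tau` and the test points are arbitrary choices for illustration. The weight drops to $e^{-1/2} \approx 0.61$ at distance $\tau$ and to about 0.01 at $3\tau$, so points farther than a few $\tau$ barely influence the local fit. Inside `local_regression`, both `x0` and the rows of `X` carry the same bias entry 1, which cancels in `X - x0`, so the bias column does not change the weights.

```python
import numpy as np

def radial_kernel(x0, X, tau):
    # Same formula as the notebook: squared Euclidean distance summed along axis 1
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

tau = 0.5
x0 = np.array([0.0])
# Four 1-D points (one per row) at 0, 1, 2 and 3 bandwidths from x0
X = np.array([[0.0], [tau], [2 * tau], [3 * tau]])

w = radial_kernel(x0, X, tau)
print(w)  # roughly [1.0, 0.607, 0.135, 0.011]
```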

## data

```python
n = 1000

# generate dataset
X = np.linspace(-3, 3, num=n)
Y = np.log(np.abs(X ** 2 - 1) + .5)

# jitter X
X += np.random.normal(scale=.1, size=n)
```

## fit & plot models

```python
def plot_lwr(tau):
    # prediction on an evenly spaced grid
    domain = np.linspace(-3, 3, num=300)
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]

    # width/height instead of plot_width/plot_height, which Bokeh 3.0 removed
    plot = figure(width=400, height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')

    return plot
```
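
`plot_lwr` solves one weighted least-squares problem per grid point in a Python loop. A possible vectorized alternative is sketched below as a hypothetical helper `lwr_predict_batch` for 1-D inputs: it builds all weights as an $m \times n$ matrix and solves the $m$ small $2 \times 2$ systems in one batched `np.linalg.solve` call. It is compared against a loop that follows `local_regression`, on noise-free data so the comparison is deterministic.

```python
import numpy as np

def lwr_predict_batch(queries, x, y, tau):
    # Vectorized LWR for 1-D inputs: one weighted least-squares fit per query point
    X = np.c_[np.ones_like(x), x]              # (n, 2) design matrix with bias
    Q = np.c_[np.ones_like(queries), queries]  # (m, 2) query points with bias
    W = np.exp(-(queries[:, None] - x[None, :]) ** 2 / (2 * tau ** 2))  # (m, n)
    A = np.einsum('mn,ni,nj->mij', W, X, X)    # (m, 2, 2): X^T W X per query
    b = np.einsum('mn,ni,n->mi', W, X, y)      # (m, 2): X^T W y per query
    beta = np.linalg.solve(A, b[..., None])[..., 0]  # (m, 2) local coefficients
    return np.sum(Q * beta, axis=1)            # x0^T beta(x0) for each query

def lwr_predict_loop(x0, x, y, tau):
    # Reference: one pinv-based fit per query point, as in local_regression
    X = np.c_[np.ones_like(x), x]
    xw = X.T * np.exp(-(x - x0) ** 2 / (2 * tau ** 2))
    return np.array([1.0, x0]) @ (np.linalg.pinv(xw @ X) @ xw @ y)

x = np.linspace(-3, 3, 1000)
y = np.log(np.abs(x ** 2 - 1) + .5)
domain = np.linspace(-3, 3, 300)

fast = lwr_predict_batch(domain, x, y, tau=0.1)
slow = np.array([lwr_predict_loop(x0, x, y, 0.1) for x0 in domain])
print(np.max(np.abs(fast - slow)))  # floating-point level difference
```

The trade-off is memory, since the weight matrix has one entry per (query, training point) pair. Unlike the `pinv` in `local_regression`, `np.linalg.solve` raises `LinAlgError` on a singular system, for example when $\tau$ is so small that all weights for some query underflow to zero.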

```python
show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]
]))
```

## interactive model

```python
def interactive_update(tau):
    # recompute the curve for the new tau and push it to the live plot
    model.data_source.data['y'] = [local_regression(x0, X, Y, tau) for x0 in domain]
    push_notebook()

domain = np.linspace(-3, 3, num=100)
prediction = [local_regression(x0, X, Y, 1.) for x0 in domain]

plot = figure()
plot.scatter(X, Y, alpha=.3)
model = plot.line(domain, prediction, line_width=2, color='red')
show(plot, notebook_handle=True)
```

```python
interact(interactive_update, tau=(0.01, 3., 0.01))
```
