# Polynomial Regression

## Sources

- [towardsdatascience.com/polynomial-regression](https://towardsdatascience.com/polynomial-regression-bbe8b9d97491)

## Dataset Creation

To understand the need for polynomial regression, let’s first generate a random dataset.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Image

Seed the random number generator so that the data is reproducible:

In [None]:
np.random.seed(0)

Generate a dataset from a cubic function of random inputs plus Gaussian noise:

In [None]:
x = 2 - 3 * np.random.normal(0, 1, 20)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 20)

Plot the data to get an idea of its structure:

In [None]:
plt.scatter(x, y, s=10)
plt.show()

## Linear Regression

Fitting a linear regression to this dataset first shows how well a straight line can describe it.

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

In [None]:
np.random.seed(0)

In [None]:
x = 2 - 3 * np.random.normal(0, 1, 20)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 20)

In [None]:
# reshape to column vectors; scikit-learn expects X with shape (n_samples, n_features)
x = x[:, np.newaxis]
y = y[:, np.newaxis]

In [None]:
model = LinearRegression()
model.fit(x, y)
y_pred = model.predict(x)

### Linear Regression - $R^2$

In [None]:
r2_score(y, y_pred)

### Linear Regression - RMSE

In [None]:
np.sqrt(mean_squared_error(y, y_pred))

In [None]:
plt.scatter(x, y, s=10)
plt.plot(x, y_pred, color='r')
plt.show()

## Underfitting

We can see that a straight line is not able to capture the pattern of the data points in this example. This is an example of ___underfitting___.

To overcome ___underfitting___, the model's complexity must be increased. By adding powers of the existing features as new features, we generate a higher-order equation based on the original dataset.

<center>
$
Y = \theta_0 + \theta_1x
$
</center>

which is equivalent to

<center>
$
y = f(x) = mx + b
$
</center>

and can be extended to

<center>
$
Y = \theta_0 + \theta_1x + \theta_2x^2
$
</center>

This is still considered a linear model, because it is linear in the coefficients (weights) $\theta_0, \theta_1, \theta_2$.

$x^2$ is simply a new feature. The curve that we are fitting, however, is quadratic in $x$.
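The idea that $x^2$ is just another feature column can be sketched with plain NumPy. This is a minimal illustration, not the notebook's method: the coefficient values in `theta_true` and the noise-free data are assumptions chosen so that the result is easy to check.

```python
import numpy as np

# Noise-free quadratic data with assumed coefficients theta_0, theta_1, theta_2.
theta_true = np.array([1.0, -2.0, 0.5])
x = np.linspace(-3, 3, 20)
y = theta_true[0] + theta_true[1] * x + theta_true[2] * x ** 2

# Design matrix with columns 1, x, x^2: the squared term is one more feature column.
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares on X; the problem is linear in theta even though
# the fitted curve is quadratic in x.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_hat)
```

Because the data contains no noise, the estimated coefficients should match `theta_true` up to floating-point error.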

## scikit-learn - Polynomial Features

To convert the original features into their higher-order terms, we use the `PolynomialFeatures` class provided by `scikit-learn`.
We then train a `LinearRegression` model on the transformed features.
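To see what the transformation produces, it may help to apply it to a tiny input first. The array `x_demo` below is a hypothetical toy example and is not part of the dataset used in this notebook.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy input: two samples with one feature each.
x_demo = np.array([[2.0], [3.0]])

# With degree=2 and the default include_bias=True, each row becomes [1, x, x^2].
x_demo_poly = PolynomialFeatures(degree=2).fit_transform(x_demo)
print(x_demo_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

The leading column of ones is the bias term. `LinearRegression` fits its own intercept by default, so this column is redundant there but harmless.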

In [None]:
import operator

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

In [None]:
np.random.seed(0)

In [None]:
x = 2 - 3 * np.random.normal(0, 1, 20)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 20)

In [None]:
x = x[:, np.newaxis]
y = y[:, np.newaxis]

In [None]:
polynomial_features = PolynomialFeatures(degree=2)

In [None]:
x_poly = polynomial_features.fit_transform(x)

In [None]:
model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)

### Polynomial Regression - $R^2$

In [None]:
r2_score(y, y_poly_pred)

### Polynomial Regression - RMSE

In [None]:
np.sqrt(mean_squared_error(y, y_poly_pred))

In [None]:
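# sort the points by x so the fitted curve is drawn from left to right;
# keep x itself unchanged so the scatter plot still pairs each x with its y
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x, y_poly_pred), key=sort_axis)
x_sorted, y_poly_pred_sorted = zip(*sorted_zip)

In [None]:
plt.scatter(x, y, s=10)
plt.plot(x_sorted, y_poly_pred_sorted, color='m')
plt.show()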

It is quite clear from the plot that the quadratic curve fits the data better than the straight line.

### Cubic Curve

In [None]:
import operator

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

In [None]:
np.random.seed(0)

In [None]:
x = 2 - 3 * np.random.normal(0, 1, 20)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 20)

In [None]:
x = x[:, np.newaxis]
y = y[:, np.newaxis]

In [None]:
polynomial_features = PolynomialFeatures(degree=3)

In [None]:
x_poly = polynomial_features.fit_transform(x)

In [None]:
model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)

### Polynomial Regression - $R^2$

In [None]:
r2_score(y, y_poly_pred)

### Polynomial Regression - RMSE

In [None]:
np.sqrt(mean_squared_error(y, y_poly_pred))

In [None]:
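# sort the points by x so the fitted curve is drawn from left to right;
# keep x itself unchanged so the scatter plot still pairs each x with its y
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x, y_poly_pred), key=sort_axis)
x_sorted, y_poly_pred_sorted = zip(*sorted_zip)

In [None]:
plt.scatter(x, y, s=10)
plt.plot(x_sorted, y_poly_pred_sorted, color='m')
plt.show()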
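The linear, quadratic, and cubic fits can also be compared side by side. The sketch below is an assumption-laden shortcut rather than the approach used in the cells above: it regenerates the same seeded data and uses `np.polyfit` in place of `PolynomialFeatures` plus `LinearRegression`. Both solve a least-squares problem, so the errors should agree up to numerical precision.

```python
import numpy as np

# Regenerate the same seeded dataset as in the cells above.
np.random.seed(0)
x = 2 - 3 * np.random.normal(0, 1, 20)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 20)

# Fit polynomials of degree 1, 2 and 3 and record the training RMSE of each.
rmse_by_degree = {}
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    y_fit = np.polyval(coeffs, x)       # predictions on the training inputs
    rmse_by_degree[degree] = np.sqrt(np.mean((y - y_fit) ** 2))

for degree, rmse in rmse_by_degree.items():
    print(f"degree {degree}: training RMSE = {rmse:.2f}")
```

Each higher degree contains the lower-degree model as a special case, so the training RMSE cannot increase with the degree. This measures fit on the training data only and says nothing about how well the model generalizes.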

## API Links

- [operator.itemgetter](https://docs.python.org/3/library/operator.html#operator.itemgetter)
- [sklearn.preprocessing.PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)