# Polynomial Regression - quadratic, cubic and 4th-order polynomial

We have a data set $(x,y)$ generated by a cubic polynomial model with normally distributed noise:

$y(x) = C_0 + C_1 x + C_2 x^2 + C_3 x^3 + \epsilon$

with
- the independent variable $x$,
- the dependent variable $y$,
- the model parameters $C_0$, $C_1$, $C_2$ and $C_3$, and
- normally distributed noise $\epsilon \sim \mathcal{N}(\mu,\,\sigma^{2})$ with $\mu = 0$ and $\sigma = 1$.
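Each candidate model is linear in its parameters, so fitting a polynomial of degree $d$ is an ordinary least-squares problem. A sketch of the standard matrix formulation follows; the design matrix $X$, the parameter vector $\boldsymbol{\beta}$ and the degree $d$ are notation introduced here and do not appear elsewhere in this notebook:

$$
\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon},\qquad
X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^d \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_N & x_N^2 & \cdots & x_N^d
\end{pmatrix},\qquad
\boldsymbol{\beta} = \begin{pmatrix} C_0 \\ C_1 \\ \vdots \\ C_d \end{pmatrix}
$$

$$
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert^2 = (X^\top X)^{-1} X^\top \mathbf{y}
$$

For the data-generating model above, $d = 3$. The closed form assumes $X^\top X$ is invertible, which holds when there are at least $d + 1$ distinct $x$ values.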

Problem:
- $x$ and $y$ are given,
- but the model structure is not known.

We fit several polynomial models of increasing degree to the data:
- linear regression (degree 1)
- quadratic polynomial (degree 2)
- cubic polynomial (degree 3)
- 4th-order polynomial (degree 4)
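Because all of these models are linear in their parameters, polynomial regression is ordinary linear least squares on the powers of $x$. A minimal sketch with plain NumPy, assuming noise-free samples of the same cubic (so the fit should return the true coefficients up to rounding): `np.vander` builds the matrix of powers and `np.linalg.lstsq` solves the least-squares problem. The notebook itself uses scikit-learn rather than this direct approach.

```python
import numpy as np

# Noise-free samples of the cubic used in this notebook (C0=2, C1=3, C2=2, C3=0.5)
x = np.linspace(-4, 4, num=20)
y = 2.0 + 3.0 * x + 2.0 * x**2 + 0.5 * x**3

# Design matrix with columns 1, x, x^2, x^3 (increasing=True puts the constant column first)
X = np.vander(x, N=4, increasing=True)

# Ordinary least squares: minimise ||y - X @ beta||^2
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [2.0, 3.0, 2.0, 0.5]
```

With noisy data the same call returns estimates that scatter around the true values instead of matching them exactly.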


Notation for the Gaussian distribution: https://tex.stackexchange.com/questions/6969/symbol-for-gaussian-distribution/347558

<hr style="border:2px solid gray"> </hr>

2020-06-16 ug

2020-12-04 ug change dataset

<hr style="border:2px solid gray"> </hr>
todo:
- compare the estimated parameters from the different polynomial approaches together with RMSE and R² score


Reference:

Polynomial Regression
https://towardsdatascience.com/polynomial-regression-bbe8b9d97491

In [None]:
%matplotlib inline

imports

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures

### generate random data-set

`numpy.random.rand` - random samples from a uniform distribution over the half-open interval `[0, 1)`
(https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html)

`numpy.random.randn` - return a sample (or samples) from the “standard normal” distribution (https://numpy.org/doc/stable/reference/random/generated/numpy.random.randn.html)
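A quick check of the two sampling functions described above, using the legacy `np.random` interface that the notebook itself uses. The seed 0 and the sample size of 10 000 are arbitrary choices for this sketch.

```python
import numpy as np

np.random.seed(0)                   # fixed seed so the check is repeatable
u = np.random.rand(10_000)          # uniform samples on [0, 1)
z = np.random.randn(10_000)         # standard normal samples: mean 0, std 1

print(u.min() >= 0.0, u.max() < 1.0)              # all uniform samples lie in [0, 1)
print(round(z.mean(), 2), round(z.std(), 2))      # should be close to 0 and 1
```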

Fixing the random state for reproducibility: `np.random.seed(19680801)`
(from: https://matplotlib.org/3.2.1/gallery/lines_bars_and_markers/psd_demo.html#sphx-glr-gallery-lines-bars-and-markers-psd-demo-py)
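A small sketch of what seeding buys: re-seeding with the same value restarts the same pseudo-random sequence, so the noise (and therefore every fit in this notebook) is identical on each run. The second half shows NumPy's newer `Generator` API (`np.random.default_rng`) as an alternative without global state; the notebook itself does not use it.

```python
import numpy as np

np.random.seed(19680801)
a = np.random.randn(5)

np.random.seed(19680801)   # re-seeding restarts the same pseudo-random sequence
b = np.random.randn(5)

print(np.array_equal(a, b))  # True

# The Generator API gives the same reproducibility without a global state
rng1 = np.random.default_rng(19680801)
rng2 = np.random.default_rng(19680801)
print(np.array_equal(rng1.standard_normal(5), rng2.standard_normal(5)))  # True
```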

`numpy.reshape` - gives a new shape to an array without changing its data.
https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
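scikit-learn estimators expect a 2-D feature array of shape `(n_samples, n_features)`. A minimal sketch showing that the commented-out `x.reshape(N,1)` in the next cell and the `x[:, np.newaxis]` used later in the linear regression cell produce the same column vector; `reshape(-1, 1)` is a common variant that lets NumPy infer the length.

```python
import numpy as np

N = 20
x = np.linspace(-4, 4, num=N)       # 1-D array, shape (20,)

x_col_a = x.reshape(N, 1)           # explicit reshape to a column vector
x_col_b = x[:, np.newaxis]          # same result via indexing with np.newaxis
x_col_c = x.reshape(-1, 1)          # -1 lets NumPy infer the length

print(x.shape, x_col_a.shape)       # (20,) (20, 1)
print(np.array_equal(x_col_a, x_col_b), np.array_equal(x_col_a, x_col_c))  # True True
```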

In [None]:
N = 20                              # number of samples

# Fixing random state for reproducibility
np.random.seed(19680801)

x = np.linspace(-4,4,num=N)         # independent variable
#x = x.reshape(N,1)

noise = np.random.randn(*x.shape)   # noise with gaussian distribution (standard normal distribution)

# coefficients  y= C0 + C1 *x  + C2 *x^2 + C3 * x^3
C0 = 2.0
C1 = 3.0
C2 = 2.0
C3 = 0.5

y_true_value = C0 + C1*x + C2*x**2 + C3*x**3         

y = y_true_value + noise            # dependent variable 

In [None]:
x.shape, y.shape

### plot dataset

In [None]:
plt.figure(figsize=(15,10))  # width, height
plt.scatter(x,y,s=10)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)

## linear regression model 

As the plot below shows, a straight line cannot capture the curvature in the data. This is an example of **under-fitting**, and it is also reflected in the RMSE and R² score of the linear fit.
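For reference, both metrics can be computed by hand: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_i (y_i - \hat y_i)^2}$ and $R^2 = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}$. Note that scikit-learn's `mean_squared_error` returns the *mean squared* error, so the square root has to be taken to obtain the RMSE. A small sketch comparing the manual formulas with scikit-learn on made-up numbers (not the notebook's data):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small made-up example
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 3.5])

residuals = y_obs - y_hat
mse = np.mean(residuals**2)                     # mean squared error
rmse = np.sqrt(mse)                             # root mean squared error
r2 = 1 - np.sum(residuals**2) / np.sum((y_obs - y_obs.mean())**2)

print(mse, rmse, r2)
print(np.isclose(mse, mean_squared_error(y_obs, y_hat)))  # True
print(np.isclose(r2, r2_score(y_obs, y_hat)))             # True
```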

In [None]:
# Model initialization
regression_model = LinearRegression()

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features);
# here x and y are 1-D (x.ndim == y.ndim == 1, x.shape == y.shape == (N,))

# add a second axis to obtain column vectors of shape (N, 1)
if x.ndim == 1:
    x = x[:, np.newaxis]
    y = y[:, np.newaxis]

# Fit the data(train the model)
regression_model.fit(x, y)
# Predict
y_predicted = regression_model.predict(x)

# model evaluation
rmse = np.sqrt(mean_squared_error(y, y_predicted))   # RMSE is the square root of the MSE
r2 = r2_score(y, y_predicted)

# printing value
print(f'Slope: {regression_model.coef_}')
print(f'Intercept: {regression_model.intercept_}')
print(f'Root mean squared error RMSE: {rmse}')
print(f'R2 score: {r2}')
print(f'true values: C0:{C0}, C1:{C1}, C2:{C2}, C3:{C3}')


### Plot the values

matplotlib.pyplot.scatter - A scatter plot of y vs. x with varying marker size and/or color.
https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.scatter.html

parameter `s`: float or array-like, shape (n, ), optional
The marker size in points**2.
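Since `s` is an *area* in points², doubling a marker's diameter needs four times the `s` value. A short sketch passing one size per point (the array-like form of `s`); the data and sizes are made up for illustration, and `get_sizes()` reads back the sizes stored on the returned `PathCollection`.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-4, 4, num=5)
y = x**2

# marker areas in points**2; diameters grow as 1, 2, 3, 4, 5 (times sqrt(10))
sizes = np.array([10, 40, 90, 160, 250])

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes)  # one size per point (array-like of shape (n,))
ax.grid(True)
print(points.get_sizes())           # the per-point sizes stored on the PathCollection
```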


In [None]:
# plotting values
plt.figure(figsize=(15,10))  # width, height

# data points
plt.subplot(211)
plt.scatter(x, y, s=10, label='data point')
plt.xlabel('x')
plt.ylabel('y')

# true model curve and linear fit
plt.plot(x, y_true_value, color='b',label='y_true_value')
plt.plot(x, y_predicted, color='r',label='y_predicted')

plt.grid(True)
plt.legend()


# residuals
plt.subplot(212)

plt.plot(x, noise, marker='.', linestyle='-',color='b',label='original noise')
plt.plot(x, y-y_predicted, marker='.', linestyle='-',color='r',label='y residuals')


plt.grid(True)
plt.legend()

## quadratic polynomial

In [None]:
polynomial_features = PolynomialFeatures(degree=2)   # columns: 1, x, x^2
x_poly = polynomial_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)

rmse = np.sqrt(mean_squared_error(y,y_poly_pred))
r2 = r2_score(y,y_poly_pred)

# printing value
print(f'coefficient: {model.coef_}')
print(f'Intercept: {model.intercept_}')
print(f'Root mean squared error RMSE: {rmse}')
print(f'R2 score: {r2}')
print(f'true values: C0:{C0}, C1:{C1}, C2:{C2}, C3:{C3}')


# plot
plt.scatter(x, y, s=10)

plt.plot(x, y_poly_pred, color='m')
plt.grid(True)
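`PolynomialFeatures` expands each sample into its powers. With the default `include_bias=True`, the first column is a constant 1, which duplicates the intercept that `LinearRegression` already fits; the coefficient for that column therefore comes out (numerically) zero, and the constant term appears in `intercept_` instead. A small sketch of the transformation on two made-up samples:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_demo = np.array([[2.0], [3.0]])              # two samples, one feature

with_bias = PolynomialFeatures(degree=2).fit_transform(x_demo)
print(with_bias)      # [[1. 2. 4.]
                      #  [1. 3. 9.]]

without_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x_demo)
print(without_bias)   # [[2. 4.]
                      #  [3. 9.]]
```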

## cubic polynomial

In [None]:
polynomial_features = PolynomialFeatures(degree=3)   # columns: 1, x, x^2, x^3
x_poly = polynomial_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)

rmse = np.sqrt(mean_squared_error(y,y_poly_pred))
r2 = r2_score(y,y_poly_pred)

# printing value
print(f'coefficient: {model.coef_}')
print(f'Intercept: {model.intercept_}')
print(f'Root mean squared error RMSE: {rmse}')
print(f'R2 score: {r2}')
print(f'true values: C0:{C0}, C1:{C1}, C2:{C2}, C3:{C3}')


plt.scatter(x, y, s=10)
# x comes from np.linspace and is already sorted, so the line plot needs no sorting
plt.plot(x, y_poly_pred, color='m')
plt.grid(True)
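Because the data were generated by a cubic, the degree-3 fit should recover coefficients close to $C_0, \dots, C_3$, with deviations caused by the noise. A noise-free check (an assumption made only for this sketch) shows how `intercept_` and `coef_` map to the true parameters when `include_bias=False` is used, so that no all-ones column competes with the intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

C0, C1, C2, C3 = 2.0, 3.0, 2.0, 0.5
x = np.linspace(-4, 4, num=20)[:, np.newaxis]      # column vector, shape (20, 1)
y_clean = C0 + C1 * x + C2 * x**2 + C3 * x**3      # no noise added here

# include_bias=False: the constant term is left to LinearRegression's intercept
x_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
model = LinearRegression().fit(x_poly, y_clean)

print(model.intercept_)   # approximately [2.]           -> C0
print(model.coef_)        # approximately [[3. 2. 0.5]]  -> C1, C2, C3
```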

## fourth-order polynomial

In [None]:
polynomial_features = PolynomialFeatures(degree=4)   # columns: 1, x, x^2, x^3, x^4
x_poly = polynomial_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)

rmse = np.sqrt(mean_squared_error(y,y_poly_pred))
r2 = r2_score(y,y_poly_pred)

# printing value
print(f'coefficient: {model.coef_}')
print(f'Intercept: {model.intercept_}')
print(f'Root mean squared error RMSE: {rmse}')
print(f'R2 score: {r2}')
print(f'true values: C0:{C0}, C1:{C1}, C2:{C2}, C3:{C3}')

plt.scatter(x, y, s=10)
plt.plot(x, y_poly_pred, color='m')
plt.show()
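The todo in the introduction asks for a side-by-side comparison of the fits. A compact sketch that regenerates the notebook's data (same seed, coefficients and noise model) and loops over degrees 1 to 4; it uses scikit-learn's `make_pipeline`, a convenience not used elsewhere in this notebook. These are training-set scores: for nested least-squares models the training RMSE can only decrease (and R² only increase) as the degree grows, so a lower RMSE for the 4th-order fit is not by itself evidence of a better model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Regenerate the notebook's data set (same seed, coefficients and noise model)
np.random.seed(19680801)
x = np.linspace(-4, 4, num=20)
noise = np.random.randn(*x.shape)
y = 2.0 + 3.0 * x + 2.0 * x**2 + 0.5 * x**3 + noise
X = x[:, np.newaxis]                         # shape (20, 1) for scikit-learn

results = {}
for degree in range(1, 5):
    # include_bias=False so that intercept_ holds the constant term C0
    model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                          LinearRegression())
    model.fit(X, y)
    y_pred = model.predict(X)
    reg = model.named_steps["linearregression"]
    rmse = np.sqrt(mean_squared_error(y, y_pred))
    r2 = r2_score(y, y_pred)
    results[degree] = (rmse, r2)
    print(f"degree {degree}: intercept={reg.intercept_:.3f}, "
          f"coef={np.round(reg.coef_, 3)}, RMSE={rmse:.3f}, R2={r2:.4f}")
```

To judge which degree generalises best, the same loop could be run on a held-out test set or with cross-validation instead of the training data.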