
# [ Machine Learning in Geosciences ]

**Department of Applied Geoinformatics and Cartography, Charles University**

*Lukas Brodsky lukas.brodsky@natur.cuni.cz*



## Sklearn API: Linear and Polynomial regression models exercises

**Task**: fit Sklearn models to the simulated sample data. Evaluate the goodness of fit with quantitative metrics. Plot the input training and testing data together with the fitted model and the underlying function.

* Exercise 1: Linear regression 

* Exercise 2: Non-linear (polynomial) regression 


"Workflow": 

1/ Import the needed sklearn classes and functions:
    `LinearRegression` from `sklearn.linear_model`
    `PolynomialFeatures` from `sklearn.preprocessing`
    `mean_squared_error` from `sklearn.metrics` 

and the Matplotlib plotting module (e.g. `matplotlib.pyplot` as `plt`)

2/ Plot data 

3/ Train regression model 

4/ Evaluate the model on the training and testing data sets with MSE (use the model's `predict` method)

5/ Plot the resulting model together with a scatter plot of the samples, and add a plot of the underlying function

6/ Evaluate the result
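
The train and evaluate steps of the workflow can be sketched on a tiny toy data set. This is only an illustration under stated assumptions, not the exercise solution: the arrays `X_toy_train`, `y_toy_train`, `X_toy_test`, `y_toy_test`, the fixed noise offsets, and the name `toy_model` are all made up for the example, and plotting is left out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical toy data: line y = 2x + 1 with small fixed offsets of +/- 0.1
X_toy_train = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y_toy_train = 2.0 * X_toy_train.ravel() + 1.0 + np.array([0.1, -0.1, 0.1, -0.1])
X_toy_test = np.array([0.5, 1.5, 2.5]).reshape(-1, 1)
y_toy_test = 2.0 * X_toy_test.ravel() + 1.0 + np.array([-0.1, 0.1, -0.1])

# 3/ Train: instantiate the model and fit it on the training data
toy_model = LinearRegression()
toy_model.fit(X_toy_train, y_toy_train)

# 4/ Evaluate: predict on each subset and compare with the targets
mse_train = mean_squared_error(y_toy_train, toy_model.predict(X_toy_train))
mse_test = mean_squared_error(y_toy_test, toy_model.predict(X_toy_test))
print(f"train MSE: {mse_train:.4f}, test MSE: {mse_test:.4f}")
```

Comparing the two MSE values is the usual first check: a test error much larger than the training error suggests that the model does not generalize well.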

In [103]:
# imports 
import numpy as np
np.random.seed(0)

# Scikit-learn
pass 

# Plotting 
pass 

### **Exercise 1:** Linear regression in Sklearn
* Use **`LinearRegression`** model from `sklearn.linear_model`


In [104]:
# DATA

In [105]:
# Process function 
def fun(x, noise):
    return .6 *  x + (x ** .8) * noise

In [46]:
# Sample training data (simulated) 
n_samples = 20
noise_factor = 0.9
# features
X_train = np.sort(np.random.rand(n_samples))

# targets (references) 
noise_train = np.random.rand(n_samples) * noise_factor
y_train = fun(X_train, noise_train) 
X_train = X_train.reshape(-1, 1)

# Sample testing data from the same underlying function 
n_samples_test = 20
# testing features
X_test = np.sort(np.random.rand(n_samples_test))

# testing targets 
noise_test = np.random.rand(n_samples_test) * noise_factor
y_test = fun(X_test, noise_test)
X_test = X_test.reshape(-1, 1)

In [106]:
# Plot scatter plots (train and test data in different colors), add a legend and label the axes
pass
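
A scatter plot of two subsets in different colors, with a legend and axis labels, could look like the sketch below. The data (`x_a`, `y_a`, `x_b`, `y_b`) are random placeholders invented for the illustration, not the exercise data, and the `Agg` backend line is only an assumption so that the sketch also runs without a display; it is not needed inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical placeholder data for two subsets
rng = np.random.default_rng(0)
x_a, y_a = rng.random(10), rng.random(10)
x_b, y_b = rng.random(10), rng.random(10)

fig, ax = plt.subplots()
# One scatter call per subset, each with its own color and legend label
ax.scatter(x_a, y_a, color="tab:blue", label="Training data")
ax.scatter(x_b, y_b, color="tab:orange", label="Testing data")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```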

### Train linear regression model

In [107]:
# instantiate model
pass 
# fit model on training data
pass 

In [109]:
# You can check the fitted model parameters
# (fit_intercept is only the constructor option; the fitted value is intercept_)
# model.intercept_
# model.coef_
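
In scikit-learn, `fit_intercept` is a boolean constructor option, while the values learned during fitting are stored in attributes that end with an underscore, `intercept_` and `coef_`. A minimal sketch on noise-free toy data (the names `X_demo`, `y_demo`, and `demo_model` are assumptions made for this example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data from the line y = 3x - 1
X_demo = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y_demo = 3.0 * X_demo.ravel() - 1.0

demo_model = LinearRegression().fit(X_demo, y_demo)

print(demo_model.fit_intercept)  # constructor option (True by default)
print(demo_model.intercept_)     # fitted intercept, close to -1.0
print(demo_model.coef_)          # fitted slope, close to [3.0]
```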

In [110]:
# Evaluate model MSE on training data: use model prediction, print the result
pass 

In [111]:
# Evaluate model MSE on testing data: use model prediction, print the result
pass 

In [115]:
# Plot the linear model with samples scatter plot, add underlying function (X_fun and y_fun)

X_fun = np.linspace(0, 1, 100)
noise_fun = np.ones(100) / 2. * noise_factor
y_fun = fun(X_fun, noise_fun)
# plt.plot(X_fun, y_fun, color="k",  label="Underlying function")

pass 

### **Exercise 2:** Non-linear (polynomial) regression in Sklearn

* Use **`LinearRegression`** model from `sklearn.linear_model`. 
* Transform the input features by **`PolynomialFeatures`** from `sklearn.preprocessing`. Use `fit_transform` method. 
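
How the two pieces fit together can be sketched on toy data. `PolynomialFeatures` expands each input value into its powers, and `LinearRegression` is then fitted on the expanded columns. The data and the names `demo_poly`, `demo_poly_model`, and `X_demo_*` are assumptions made for this illustration; the degree is kept at 2 so the expanded matrix is easy to read.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data from y = x^2
X_demo_train = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y_demo_train = X_demo_train.ravel() ** 2
X_demo_test = np.array([0.5, 2.5]).reshape(-1, 1)

# Expand x into the columns [1, x, x^2]
demo_poly = PolynomialFeatures(degree=2)
X_demo_train_poly = demo_poly.fit_transform(X_demo_train)
# New data only needs transform, because the transformer is already fitted
X_demo_test_poly = demo_poly.transform(X_demo_test)

# Linear regression on the polynomial features
demo_poly_model = LinearRegression().fit(X_demo_train_poly, y_demo_train)
y_demo_pred = demo_poly_model.predict(X_demo_test_poly)

print(X_demo_train_poly)
print(y_demo_pred)  # close to [0.25, 6.25]
```

The model passed to `predict` must always receive features transformed in the same way as the training features, otherwise the number of columns does not match.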


In [116]:
# Process function (non-linear)
def fun(x, noise):
    return .6 * np.sin(x*6) + x + (x ** .8) * noise

In [117]:
# Sample training data (simulated) 
n_samples = 20
noise_factor = 0.7
# features
X_train = np.sort(np.random.rand(n_samples))

# targets (references) 
noise_train = np.random.rand(n_samples) * noise_factor
y_train = fun(X_train, noise_train) 
X_train = X_train.reshape(-1, 1)

# Sample testing data from the same underlying function 
n_samples_test = 20
# testing features
X_test = np.sort(np.random.rand(n_samples_test))

# testing targets 
noise_test = np.random.rand(n_samples_test) * noise_factor
y_test = fun(X_test, noise_test)
X_test = X_test.reshape(-1, 1)

In [118]:
# Plot scatter plot (train and test data in different colors), add a legend
pass 

In [119]:
# Transform input data to polynomials
degree = 10
# instantiate PolynomialFeatures (name it `poly`, it is reused in the plotting cell)
pass 
# fit_transform the training features
pass 

In [120]:
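# instantiate linear regression model (name it `linear_regression`, it is reused in the plotting cell)
pass 
# fit the model on the polynomial-transformed training features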
pass 

In [121]:
# Evaluate the model on the training subset: predict on the polynomial-transformed data and compute MSE
pass

In [122]:
# Evaluate the model on the test subset (transform X_test with the fitted `poly` first)
pass

In [123]:
# Plot the input data together with the model and underlying function 

# Model data
X_plot = np.linspace(0, 1, 100)
# poly is already fitted on the training features, so transform is sufficient here
plot_polynomial_features = poly.transform(X_plot.reshape(-1, 1))
y_plot = linear_regression.predict(plot_polynomial_features)

# Underlying function data
X_fun = np.linspace(0, 1, 100)
noise_fun = np.ones(100) / 2. * noise_factor
y_fun = fun(X_fun, noise_fun)

pass 