**Convention in ML to use X and y** - should match the notation in the theory
Specific letters for scalars, a_1, a_2, etc.

In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("../Data/ISLR/Advertising.csv", index_col=0)

df


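| Unnamed: 0 | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 5 | 180.8 | 10.8 | 58.4 | 12.9 |
| ... | ... | ... | ... | ... |
| 196 | 38.2 | 3.7 | 13.8 | 7.6 |
| 197 | 94.2 | 4.9 | 8.1 | 9.7 |
| 198 | 177.0 | 9.3 | 6.4 | 12.8 |
| 199 | 283.6 | 42.0 | 66.2 | 25.5 |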


In [7]:
number_of_samples, number_of_features = df.shape[0], df.shape[1] - 1  # subtract 1: Sales is the label, not a feature

number_of_samples, number_of_features # 200 samples, 3 features

(200, 3)

In [11]:
# extract X (features) and y (label)
X, y = df.drop("Sales", axis=1), df["Sales"]  # axis=0 refers to rows, axis=1 to columns

X.head()

| Unnamed: 0 | TV | Radio | Newspaper |
|---|---|---|---|
| 1 | 230.1 | 37.8 | 69.2 |
| 2 | 44.5 | 39.3 | 45.1 |
| 3 | 17.2 | 45.9 | 69.3 |
| 4 | 151.5 | 41.3 | 58.5 |
| 5 | 180.8 | 10.8 | 58.4 |


In [12]:
y.head(10)

1     22.1
2     10.4
3      9.3
4     18.5
5     12.9
6      7.2
7     11.8
8     13.2
9      4.8
10    10.6
Name: Sales, dtype: float64

### Sklearn - typical steps

<h5>A recipe that works for many algorithms</h5>

1. Train|test split, sometimes train|val|test split
    - *sometimes* we want a validation set for tuning hyperparameters

2. Scaling sometimes required
    - min-max scaling
    - standardisation
    - ...
    - scale the training data
    - scale the test data with the parameters fitted on the training data --> avoid data leakage

3. Fit algorithm to training data - model training
    - weights & bias

4. Transform training data, transform test data --> predictions

5. Evaluate

In [14]:
from sklearn.model_selection import train_test_split
# also works with ndarrays and lists

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((140, 3), (60, 3), (140,), (60,))
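
Step 1 of the recipe also mentions a train|val|test split. One common way to get it is to call `train_test_split` twice. A minimal sketch, assuming synthetic data in place of the Advertising set and an illustrative 70/15/15 split (all `*_demo` names are made up for this example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic stand-in for the Advertising data (assumption): 200 samples, 3 features
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = rng.uniform(0, 30, size=200)

# first split: 70 % training data, 30 % held out
X_train_demo, X_rest, y_train_demo, y_rest = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# second split: divide the held-out part equally into validation and test
X_val_demo, X_test_demo, y_val_demo, y_test_demo = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42
)

print(X_train_demo.shape, X_val_demo.shape, X_test_demo.shape)
```

The validation set is then used for tuning hyperparameters, and the test set is only touched for the final evaluation.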

## Feature scaling

https://en.wikipedia.org/wiki/Feature_scaling

Normalization (min-max scaling)

<h2>

 $X' = \frac{X-X_{\min}}{X_{\max}-X_{\min}}$

</h2>

```
# for each feature (column):
# take the data, subtract the smallest value,
# then divide by the range (largest value minus smallest value)
```

matrix minus scalar is computed *element-wise*
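
The min-max formula can be tried by hand with NumPy broadcasting. This is a minimal sketch on made-up toy numbers, not the Advertising data. The minimum and maximum are taken from the training rows only and then reused for the test row:

```python
import numpy as np

# toy matrices (made-up numbers): rows = samples, columns = features
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
X_test_toy = np.array([[0.0, 50.0]])

# per-feature minimum and maximum, computed from the training data only
X_min = X_train_toy.min(axis=0)
X_max = X_train_toy.max(axis=0)

# broadcasting applies the subtraction and division element-wise, column by column
scaled_train_toy = (X_train_toy - X_min) / (X_max - X_min)
scaled_test_toy = (X_test_toy - X_min) / (X_max - X_min)

print(scaled_train_toy)
print(scaled_test_toy)
```

The scaled training data lies in [0, 1] by construction. The test row falls outside that interval because its values lie outside the training range.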

### Feature standardization

<h2>

$X' = \frac{X-\mu}{\sigma}$

</h2>

Same as the z-score (z-transform) used with the standard normal table in statistics
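
A minimal NumPy sketch of standardization on made-up toy numbers. After the transform, each feature should have mean 0 and standard deviation 1. Note that `np.std` uses the population standard deviation (`ddof=0`) by default:

```python
import numpy as np

# toy matrix (made-up numbers): rows = samples, columns = features
X_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])

mu = X_toy.mean(axis=0)    # per-feature mean
sigma = X_toy.std(axis=0)  # per-feature population standard deviation (ddof=0)

X_standardized = (X_toy - mu) / sigma

print(X_standardized.mean(axis=0))  # approximately 0 for each feature
print(X_standardized.std(axis=0))   # approximately 1 for each feature
```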


In [26]:
## Feature scaling
from sklearn.preprocessing import MinMaxScaler

# instantiate scaler instance/object
scaler = MinMaxScaler()
scaler.fit(X_train) # important - fit on the training data only

scaled_X_train = scaler.transform(X_train)
scaled_X_test = scaler.transform(X_test)

print(f'{scaled_X_train.min()=}')
print(f'{scaled_X_train.max()=}')
print(f'{scaled_X_test.min()=}')
print(f'{scaled_X_test.max()=}')

# note scaled_X_test.min != 0, scaled_X_test.max != 1
# 0 <= scaled_X_train <= 1
# 0.005964214711729622 <= scaled_X_test <= 1.1302186878727631

scaled_X_train.min()=0.0
scaled_X_train.max()=1.0
scaled_X_test.min()=0.005964214711729622
scaled_X_test.max()=1.1302186878727631


In [28]:
scaled_X_train.shape, scaled_X_test.shape

((140, 3), (60, 3))

## Linear regression

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

Ordinary least squares

In [35]:
from sklearn.linear_model import LinearRegression # OLS, SVD

model_OLS = LinearRegression()
# for OLS linear regression, scaling does not change the predictions, but for gradient descent it matters (the step size depends on the feature scales)
model_OLS.fit(scaled_X_train, y_train)

print(f'Parameters beta_1, beta_2, beta_3 {model_OLS.coef_}')
print(f'Intercept beta_0 {model_OLS.intercept_}')

# OLS analytical solution

Parameters beta_1, beta_2, beta_3 [13.02832938  9.88465985  0.69237469]
Intercept beta_0 2.741855324852814
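
The analytical OLS solution is $\hat{\beta} = (X^TX)^{-1}X^Ty$, where $X$ has a leading column of ones for the intercept. Below is a minimal NumPy sketch on synthetic data with made-up coefficients (not the Advertising data). It solves the normal equation directly and compares the result with `np.linalg.lstsq`:

```python
import numpy as np

# synthetic data with known, made-up coefficients beta_0..beta_3
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 1, size=(100, 3))
true_beta = np.array([2.0, 13.0, 10.0, 0.5])
y_syn = true_beta[0] + X_syn @ true_beta[1:] + rng.normal(0, 0.1, size=100)

# design matrix: a column of ones (intercept) followed by the features
X_design = np.column_stack([np.ones(len(X_syn)), X_syn])

# normal equation: solve (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X_design.T @ X_design, X_design.T @ y_syn)

# least-squares solver, numerically more robust than forming X^T X
beta_lstsq, *_ = np.linalg.lstsq(X_design, y_syn, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

Both estimates should agree with each other and land close to the made-up coefficients, up to the noise added to `y_syn`.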


## Stochastic gradient descent (SGD)

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html


In [37]:
from sklearn.linear_model import SGDRegressor

model_SGD = SGDRegressor(loss="squared_error", max_iter=1000)

model_SGD.fit(scaled_X_train,y_train) # needs scaled X_train

print(f'Parameters {model_SGD.coef_}') # beta_1, beta_2, beta_3 
print(f'Intercept {model_SGD.intercept_}') # beta_0

Parameters [11.95384258  9.00066576  1.33001393]
Intercept [3.58765849]
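
To show the idea behind stochastic gradient descent, here is a minimal NumPy sketch for squared-error linear regression. It is not how `SGDRegressor` is implemented: that class also applies L2 regularization and a decaying learning rate by default, which is one reason its coefficients differ from the OLS ones. The data, learning rate and number of epochs are made up for illustration:

```python
import numpy as np

# synthetic, noise-free data with made-up coefficients; features already in [0, 1]
rng = np.random.default_rng(1)
X_syn = rng.uniform(0, 1, size=(200, 3))
y_syn = 3.0 + X_syn @ np.array([12.0, 9.0, 1.0])

weights = np.zeros(3)  # beta_1, beta_2, beta_3
bias = 0.0             # beta_0
learning_rate = 0.05   # illustrative choice

for epoch in range(200):
    # visit the samples one at a time, in a new random order each epoch
    for i in rng.permutation(len(X_syn)):
        error = weights @ X_syn[i] + bias - y_syn[i]
        # gradient of 0.5 * error**2 with respect to weights and bias
        weights -= learning_rate * error * X_syn[i]
        bias -= learning_rate * error

print(weights, bias)
```

Each update uses a single sample, so features on very different scales would make it hard to pick one learning rate that suits all weights. That is why the data is scaled first.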


## Manual prediction

In [47]:
test_sample_features = scaled_X_test[0].reshape(1,-1)

test_sample_label = y_test.values[0] # label/target
print(test_sample_features, test_sample_label)
print('TV, Radio, Paper, Sales')

[[0.54988164 0.63709677 0.52286282]] 16.9
TV, Radio, Paper, Sales


In [48]:
test_sample_features.shape

(1, 3)

In [50]:
model_OLS.predict(test_sample_features)[0]

16.565396297434837

In [51]:
model_SGD.predict(test_sample_features)[0]

16.590566971608798

In [52]:
X_test.iloc[0].to_numpy()

array([163.3,  31.6,  52.9])
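
The prediction for one sample can also be computed by hand as $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$. The sketch below uses the values printed earlier for the OLS model and for the first scaled test sample. The printed values are rounded, so the result only matches `model_OLS.predict` approximately:

```python
import numpy as np

# values copied from the printed outputs in this notebook
coef = np.array([13.02832938, 9.88465985, 0.69237469])   # beta_1, beta_2, beta_3
intercept = 2.741855324852814                             # beta_0
sample = np.array([0.54988164, 0.63709677, 0.52286282])  # scaled TV, Radio, Newspaper

# linear model: intercept plus dot product of features and coefficients
y_hat = intercept + sample @ coef

print(y_hat)  # should be close to the value from model_OLS.predict, about 16.565
```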

## Evaluation

In [59]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

# 1. predict on test data

y_pred_OLS = model_OLS.predict(scaled_X_test)
y_pred_SGD = model_SGD.predict(scaled_X_test)

# print the first 5 for comparison
print(y_pred_OLS[:5])
print(y_pred_SGD[:5])
print(y_test[:5].values)

[16.5653963  21.18822792 21.55107058 10.88923816 22.20231988]
[16.59056697 20.809815   21.102645   11.32201419 21.3952419 ]
[16.9 22.4 21.4  7.3 24.7]


In [63]:
mean_OLS = mean_absolute_error(y_test, y_pred_OLS)
mean_SGD = mean_absolute_error(y_test, y_pred_SGD)

mse_OLS = mean_squared_error(y_test, y_pred_OLS)
mse_SGD = mean_squared_error(y_test, y_pred_SGD)

rmse_OLS = np.sqrt(mse_OLS)
rmse_SGD = np.sqrt(mse_SGD)

print(f'{mean_OLS=} \t {mse_OLS=} \t {rmse_OLS=}')
print(f'{mean_SGD=} \t {mse_SGD=} \t {rmse_SGD=}')

mean_OLS=1.511669222454909 	 mse_OLS=3.7967972367152223 	 rmse_OLS=1.9485372043446392
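
The three metrics are simple to compute by hand. The sketch below uses only the first five test labels and OLS predictions printed earlier, so the numbers differ from the values for the full test set:

```python
import numpy as np

# first five test labels and OLS predictions, copied from the printed output
y_true = np.array([16.9, 22.4, 21.4, 7.3, 24.7])
y_pred = np.array([16.5653963, 21.18822792, 21.55107058, 10.88923816, 22.20231988])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))  # mean absolute error
mse = np.mean(errors ** 2)     # mean squared error
rmse = np.sqrt(mse)            # root mean squared error, same unit as y

print(f'{mae=} \t {mse=} \t {rmse=}')
```

MSE squares the errors, so a single large miss (like the fourth sample here) weighs more heavily than in MAE.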
mean_SGD=1.5219557644193458 	 mse_SGD=4.08801013916862 	 rmse_SGD=2.021882820335694


OLS has lower MAE, MSE and RMSE than SGD on the test set, so in this case choose OLS