## 1. Answer the questions
- Derive an analytical solution to the regression problem. Use a vector form of the equation.
- What changes in the solution when L1 and L2 regularizations are added to the loss function?
- Explain why L1 regularization is often used to select features. Why are there many weights equal to 0 after the model is fit?
- Explain how you can use the same models (Linear regression, Ridge, etc.) but make it possible to fit nonlinear dependencies.

1. Model: $\hat{y} = Xw$  
We need to find the vector $\hat{y}$ in the subspace spanned by the feature columns $x_1, \dots, x_D$ that is closest to $y$.  
In other words, $\hat{y}$ is the projection of $y$ onto $\mathrm{span}(x_1, \dots, x_D)$.  
Decompose $y = y_{\parallel} + y_{\perp}$:  
$y_{\parallel} = Xw$  
$y_{\perp} = y - Xw$  
By orthogonality, the dot product of $y_{\perp}$ with each feature column is 0:  
$X^T (y - Xw) = 0 \Rightarrow X^T X w = X^T y \Rightarrow$  
$w = (X^T X)^{-1} X^T y$ (assuming $X^T X$ is invertible)


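2. L1 and L2 regularization are penalties added to the loss function to discourage large weights. They are especially useful when the data suffer from multicollinearity.  
For L1 the loss gains the sum of the absolute values of the weights, for L2 the sum of the squared weights, each multiplied by a regularization coefficient $\lambda$. With L2 the analytical solution becomes $w = (X^T X + \lambda I)^{-1} X^T y$; with L1 there is no closed-form solution in general, so the weights are found iteratively.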

3. If you plot the L1 constraint region in the space of weights, you get a diamond-shaped polytope (a rotated square in 2D). Its corners lie on the coordinate axes, where all other coordinates are zero. The level curves of the loss are very likely to first touch this region at one of these corners, so some of the weights become exactly zero.

4. Add new features derived from the existing ones (logarithms, sines, polynomial terms, and so on); the model stays linear in the weights but becomes nonlinear in the original inputs.
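
As a quick numerical check of answers 1 and 2, here is a minimal sketch on synthetic data. The data, the value of `lam`, and all variable names are illustrative assumptions, not part of the assignment. It solves the normal equations, checks that the residual is orthogonal to the feature columns, and computes the closed-form L2 (ridge) solution.

```python
import numpy as np

rng = np.random.default_rng(21)

# Synthetic regression problem (illustrative only)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# OLS: solve (X^T X) w = X^T y instead of forming the inverse explicitly
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The residual is orthogonal to every feature column: X^T (y - X w) = 0
residual = y - X @ w_ols
orthogonality = X.T @ residual

# Ridge (L2): the penalty lam * ||w||^2 adds lam * I to X^T X
lam = 10.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
print("max |X^T r|:  ", np.abs(orthogonality).max())
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice for numerical stability; the result is the same solution of the normal equations.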

## 2. Introduction: do all the preprocessing steps from the previous lesson

- Import libraries.
- Read Train and Test Parts.
- Preprocess "Interest Level" feature.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

In [2]:
train = pd.read_json("data/train.json")
train.head(3)

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,manager_id,photos,price,street_address,interest_level
4,1.0,1,8579a0b0d54db803821a35a4a615e97a,2016-06-16 05:55:27,Spacious 1 Bedroom 1 Bathroom in Williamsburg!...,145 Borinquen Place,"[Dining Room, Pre-War, Laundry in Building, Di...",40.7108,7170325,-73.9539,a10db4590843d78c784171a107bdacb4,[https://photos.renthop.com/2/7170325_3bb5ac84...,2400,145 Borinquen Place,medium
6,1.0,2,b8e75fc949a6cd8225b455648a951712,2016-06-01 05:44:33,BRAND NEW GUT RENOVATED TRUE 2 BEDROOMFind you...,East 44th,"[Doorman, Elevator, Laundry in Building, Dishw...",40.7513,7092344,-73.9722,955db33477af4f40004820b4aed804a0,[https://photos.renthop.com/2/7092344_7663c19a...,3800,230 East 44th,low
9,1.0,2,cd759a988b8f23924b5a2058d5ab2b49,2016-06-14 15:19:59,**FLEX 2 BEDROOM WITH FULL PRESSURIZED WALL**L...,East 56th Street,"[Doorman, Elevator, Laundry in Building, Laund...",40.7575,7158677,-73.9625,c8b10a317b766204f08e613cef4ce7a0,[https://photos.renthop.com/2/7158677_c897a134...,3495,405 East 56th Street,medium


In [3]:
test = pd.read_json("data/test.json")

In [4]:
encoder = OneHotEncoder(handle_unknown='ignore', drop='first', sparse_output=False)
interest_level_encoded = encoder.fit_transform(pd.DataFrame(train['interest_level']))

In [5]:
encoded_data = pd.DataFrame(interest_level_encoded, columns=encoder.get_feature_names_out())
# drop=True avoids carrying two extra 'index' columns into the combined frame
df_encoded = pd.concat([train.reset_index(drop=True), encoded_data], axis=1)
df_encoded = df_encoded.drop(columns=['interest_level'])

In [6]:
pd.get_dummies(train, columns=['interest_level'], drop_first=True).head(1)  # a simpler way

Unnamed: 0,bathrooms,bedrooms,building_id,created,description,display_address,features,latitude,listing_id,longitude,manager_id,photos,price,street_address,interest_level_low,interest_level_medium
4,1.0,1,8579a0b0d54db803821a35a4a615e97a,2016-06-16 05:55:27,Spacious 1 Bedroom 1 Bathroom in Williamsburg!...,145 Borinquen Place,"[Dining Room, Pre-War, Laundry in Building, Di...",40.7108,7170325,-73.9539,a10db4590843d78c784171a107bdacb4,[https://photos.renthop.com/2/7170325_3bb5ac84...,2400,145 Borinquen Place,False,True


## 3. Intro data analysis part 2

- Let's generate additional features for better model quality. Consider a column called "Features". It consists of a list of highlights of the current flat.
- Remove unused symbols ([,], ', ", and space) from the column.
- Get all values in each list and collect the result in one huge list for the whole dataset. You can use DataFrame.iterrows().
- How many unique values does a result list contain?
- Let's get acquainted with the new library — Collections. With this package you could effectively get quantity statistics about your data.
- Count the most popular functions from our huge list and take the top 20 for this moment.
- If everything is correct, you should get the following values: 'Elevator', 'CatsAllowed', 'HardwoodFloors', 'DogsAllowed', 'Doorman', 'Dishwasher', 'NoFee', 'LaundryinBuilding', 'FitnessCenter', 'Pre-War', 'LaundryinUnit', 'RoofDeck', 'OutdoorSpace', 'DiningRoom', 'HighSpeedInternet', 'Balcony', 'SwimmingPool', 'LaundryInBuilding', 'NewConstruction', 'Terrace'.
- Now create 20 new features based on the top 20 values: 1 if the value is in the "Features" column, otherwise 0.
- Extend our feature set with 'bathrooms', 'bedrooms' and create a special variable feature_list with all feature names. Now we have 22 values. All models should be trained on these 22 features.
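
The steps above can be sketched on a toy frame. The three rows are made-up examples, and `clean`, `toy`, and `top` are hypothetical names; the real notebook works with the full `features` column and the top 20 values.

```python
from collections import Counter

import pandas as pd

# Toy stand-in for the "features" column (made-up rows)
toy = pd.DataFrame({
    "features": [
        ["Doorman", "Elevator", "Cats Allowed"],
        ["Elevator", "Dogs Allowed"],
        ["Elevator", "Doorman"],
    ]
})

def clean(items):
    """Return a new list with brackets, quotes and spaces removed from each item."""
    drop = str.maketrans("", "", "[]'\" ")
    return [item.translate(drop) for item in items]

cleaned = toy["features"].apply(clean)

# Flatten all lists and count how often each value occurs
counter = Counter(word for items in cleaned for word in items)
top = [word for word, _ in counter.most_common(2)]

# One indicator column per popular value: 1 if present in the row, else 0
for word in top:
    toy[word] = cleaned.apply(lambda items, w=word: int(w in items))

print(top)
print(toy[top])
```

Returning a new list from `clean` (instead of mutating the input) keeps the original column intact, which makes the cleaning step easier to re-run.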

In [7]:
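def clear_feature(l):
    symbs = set(["]", "[", "'", '"', " "])
    for i, word in enumerate(l):
        for symb in symbs:
            # NOTE: str.replace returns a new string and the result is discarded here,
            # so the list is returned unchanged (the outputs below keep their spaces).
            # Assign it back (l[i] = l[i].replace(symb, "")) to actually strip the symbols.
            l[i].replace(symb, "")
    return l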

In [8]:
res = []
for i, row in df_encoded.iterrows():
    cleared = clear_feature(row['features'])
    res.extend(cleared)

In [9]:
# number of unique values:
len(set(res))

1556

In [10]:
from collections import Counter

res_counter = Counter(res).most_common(20)
res_counter

[('Elevator', 25915),
 ('Cats Allowed', 23540),
 ('Hardwood Floors', 23527),
 ('Dogs Allowed', 22035),
 ('Doorman', 20898),
 ('Dishwasher', 20426),
 ('No Fee', 18062),
 ('Laundry in Building', 16344),
 ('Fitness Center', 13252),
 ('Pre-War', 9148),
 ('Laundry in Unit', 8738),
 ('Roof Deck', 6542),
 ('Outdoor Space', 5268),
 ('Dining Room', 5136),
 ('High Speed Internet', 4299),
 ('Balcony', 2992),
 ('Swimming Pool', 2730),
 ('Laundry In Building', 2593),
 ('New Construction', 2559),
 ('Terrace', 2283)]

In [11]:
res_labels = set(x[0] for x in res_counter)

In [12]:
df = pd.DataFrame(0, index=range(len(df_encoded)), columns=list(res_labels))

# Assign through df.at: writing to the row yielded by iterrows() is not guaranteed to update df
for i, words in enumerate(df_encoded['features']):
    for word in words:
        if word in res_labels:
            df.at[i, word] = 1

In [13]:
df = pd.concat([df_encoded['bathrooms'].reset_index(), df_encoded['bedrooms'].reset_index(), df.reset_index()], axis=1)
df.head()

Unnamed: 0,index,bathrooms,index.1,bedrooms,index.2,Laundry in Building,Dining Room,Outdoor Space,Pre-War,Elevator,...,Laundry in Unit,No Fee,Terrace,Cats Allowed,Balcony,Laundry In Building,Swimming Pool,Roof Deck,Hardwood Floors,New Construction
0,0,1.0,0,1,0,1,1,0,1,0,...,0,0,0,1,0,0,0,0,1,0
1,1,1.0,1,2,1,1,0,0,0,1,...,0,1,0,0,0,0,0,0,1,0
2,2,1.0,2,2,2,1,0,0,0,1,...,1,0,0,0,0,0,0,0,1,0
3,3,1.5,3,3,3,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,1.0,4,0,4,1,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [14]:
df = df.drop(columns=['index'])

In [15]:
feature_list = list(df.columns)

## 4. Models implementation: Linear regression

- Initialize the random number generator with a seed of 21.
- Implement a Python class for a linear regression algorithm with two basic methods — fit and predict. Use stochastic gradient descent (SGD) to find optimal model weights. For better understanding, we recommend implementing separate versions of the algorithm with the analytical solution and non-stochastic gradient descent under the hood.
- What is a deterministic model? Make SGD deterministic.
- Define the R squared (R2) coefficient and implement a function to calculate it.
- Make predictions with your algorithm and estimate the model with MAE, RMSE and R2 metrics.
- Initialize LinearRegression() from sklearn.linear_model, fit the model, and predict the training and test parts as in the previous lesson.
- Compare the quality metrics and make sure the difference is small (between your implementations and sklearn).
- Store the metrics as in the previous lesson in a table with columns model, train, test for MAE table, RMSE table, and R2 coefficient.
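
The task asks for a definition of R2. As a short reference, the standard definition is written out here with $y_i$ the true values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the true values on the evaluated sample (the symbol names are chosen for this sketch):

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A value of 1 means a perfect fit, 0 means the model is no better than predicting $\bar{y}$, and negative values mean it is worse than that constant prediction.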

In [16]:
import random
random.seed(21)
np.random.seed(21)  # the models below draw from NumPy's generator, so seed it as well

In [17]:
# analytical solution

class LinearRegressionAN:
    def __init__(self):
        self.weights = None

    def fit(self, X, y):
        # prepend a column of ones so that weights[0] acts as the bias
        X = np.insert(X, 0, 1, axis=1)
        # normal equations: w = (X^T X)^(-1) X^T y
        gram_inv = np.linalg.inv(X.T.dot(X))
        self.weights = gram_inv.dot(X.T.dot(y))
        
    def predict(self, X):
        X = np.insert(X, 0, 1, axis=1)
        return X @ self.weights

In [18]:
# GD

class LinearRegressionGD:
    def __init__(self, lr=10e-3, max_iter=1000):
        self.weights = None
        self.bias = None
        self.lr = lr
        self.max_iter = max_iter
        
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.bias, self.weights = 0, np.zeros(n_features)

        for _ in range(self.max_iter):
            y_pred = X @ self.weights + self.bias

            err = y_pred - y
            grad_b = 2 / n_samples * np.sum(err)
            grad_w = 2 / n_samples * X.T @ err
            
            self.bias -= self.lr * grad_b
            self.weights -=  self.lr * grad_w
            
    def predict(self, X):
        return X @ self.weights + self.bias

In [19]:
# SGD
# to make SGD return the same result on every run, fix the random seed

class LinearRegressionSGD:
    def __init__(self, lr=10e-3, max_iter=1000, random_state=21):
        self.weights = None
        self.bias = None
        self.lr = lr
        self.max_iter = max_iter
        self.random_state = random_state
        
    def fit(self, X, y):
        np.random.seed(self.random_state)
        
        n_samples, n_features = X.shape
        self.bias, self.weights = 0, np.zeros(n_features)

        for _ in range(self.max_iter):
            i = np.random.randint(n_samples)
            y_pred = X.iloc[i] @ self.weights + self.bias

            err = y_pred - y.iloc[i]
            self.bias -= 2 * self.lr * err
            self.weights -= 2 * self.lr * err * X.iloc[i]

    def predict(self, X):
        return X @ self.weights + self.bias



In [20]:
def mae_my(y_pred, y):
    return 1 / len(y) * sum(np.abs(y - y_pred))

def rmse_my(y_pred, y):
    mse = 1 / len(y) * sum(np.square(y - y_pred))
    return np.sqrt(mse)

def r2_my(y_pred, y):
    return 1 - sum(np.square(y - y_pred))/sum(np.square(y - y.mean()))

In [21]:
X = df
y = df_encoded['price']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=21, test_size=0.2)

In [None]:
linreg_an = LinearRegressionAN()
linreg_an.fit(X_train, y_train)  
y_pred_an = linreg_an.predict(X_test)

mae_my(y_pred_an, y_test), rmse_my(y_pred_an, y_test), r2_my(y_pred_an, y_test)

(1049.5117652386841, np.float64(2193.9422611603177), 0.32585381050787654)

In [23]:
linreg_gd = LinearRegressionGD()
linreg_gd.fit(X_train, y_train)
y_pred_gd = linreg_gd.predict(X_test)

mae_my(y_pred_gd, y_test), rmse_my(y_pred_gd, y_test), r2_my(y_pred_gd, y_test)

(1016.7704625577406, np.float64(2178.233989368011), 0.3354728069519719)

In [24]:
linreg_sgd = LinearRegressionSGD(lr=0.01, max_iter=10000)
linreg_sgd.fit(X_train, y_train)
y_pred_sgd = linreg_sgd.predict(X_test)

mae_my(y_pred_sgd, y_test), rmse_my(y_pred_sgd, y_test), r2_my(y_pred_sgd, y_test)

(1008.4694867236212, np.float64(2173.69782892445), 0.33823767357660206)

In [None]:
from sklearn.linear_model import LinearRegression

linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test) 

mae_my(y_pred, y_test), rmse_my(y_pred, y_test), r2_my(y_pred, y_test)

(1049.5117652386868, np.float64(2193.94226116032), 0.3258538105078752)

In [26]:
res_mae = pd.DataFrame(columns=['model', 'train', 'test'])
res_rmse = pd.DataFrame(columns=['model', 'train', 'test'])
res_r2 = pd.DataFrame(columns=['model', 'train', 'test'])

def append_results(model, model_name, res_mae=res_mae, res_rmse=res_rmse, res_r2=res_r2, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test):
    y_pred_train = model.predict(X_train)
    y_pred_test = model.predict(X_test)

    mae_train = mae_my(y_pred_train, y_train)
    mae_test = mae_my(y_pred_test, y_test)
    rmse_train = rmse_my(y_pred_train, y_train)
    rmse_test = rmse_my(y_pred_test, y_test)
    r2_train = r2_my(y_pred_train, y_train)
    r2_test = r2_my(y_pred_test, y_test)

    mae_row = pd.DataFrame({'model' : [model_name], 'train' : [mae_train], 'test' : [mae_test]})
    rmse_row = pd.DataFrame({'model' : [model_name], 'train' : [rmse_train], 'test' : [rmse_test]})
    r2_row = pd.DataFrame({'model' : [model_name], 'train' : [r2_train], 'test' : [r2_test]})

    res_mae = pd.concat([res_mae, mae_row])
    res_rmse = pd.concat([res_rmse, rmse_row], ignore_index=True)
    res_r2 = pd.concat([res_r2, r2_row], ignore_index=True)

    return res_mae, res_rmse, res_r2
    

In [27]:
res_mae, res_rmse, res_r2 = append_results(linreg_an, "Analytical", res_mae, res_rmse, res_r2, X_train, X_test, y_train, y_test)
res_mae, res_rmse, res_r2 = append_results(linreg_gd, "GD", res_mae, res_rmse, res_r2, X_train, X_test, y_train, y_test)
res_mae, res_rmse, res_r2 = append_results(linreg_sgd, "SGD", res_mae, res_rmse, res_r2, X_train, X_test, y_train, y_test)
res_mae, res_rmse, res_r2 = append_results(linreg, "Sklearn original", res_mae, res_rmse, res_r2, X_train, X_test, y_train, y_test)



## 5. Regularized models implementation: Ridge, Lasso, ElasticNet

- Implement Ridge, Lasso, ElasticNet algorithms: extend the loss function with L2, L1 and both regularizations accordingly.
- Make predictions with your algorithm and estimate the model with MAE, RMSE and R2 metrics.
- Initialize Ridge(), Lasso(), and ElasticNet() from sklearn.linear_model, fit the models, and make predictions for the training and test samples as in the previous lesson.
- Compare quality metrics and make sure the difference is small (between your implementations and sklearn).
- Store the metrics as in the previous lesson in a table.
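
For reference, the per-sample losses that correspond to the SGD updates in the custom classes of this section can be written as follows. This is a reading of the update rules in the code, with $e_i$ introduced here as shorthand for the prediction error on sample $i$; the subgradient $\mathrm{sign}(w)$ is applied elementwise.

```latex
e_i = x_i^\top w + b - y_i

\text{Lasso:}\quad L_i = e_i^2 + \alpha \lVert w \rVert_1,
\qquad \partial_w L_i = 2 e_i x_i + \alpha\,\mathrm{sign}(w)

\text{Ridge:}\quad L_i = e_i^2 + \alpha \lVert w \rVert_2^2,
\qquad \nabla_w L_i = 2 e_i x_i + 2 \alpha w

\text{ElasticNet:}\quad L_i = e_i^2 + \alpha \lVert w \rVert_1 + \alpha \lVert w \rVert_2^2,
\qquad \partial_w L_i = 2 e_i x_i + \alpha\,\mathrm{sign}(w) + 2 \alpha w
```

Note that sklearn scales its objectives differently (for example, Lasso and ElasticNet divide the squared error by $2n$, and ElasticNet splits the penalty with `l1_ratio`), so the same `alpha` does not mean the same regularization strength in the custom and sklearn models.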

In [28]:
class LinearRegressionSGD_Lasso:
    def __init__(self, lr=10e-3, max_iter=1000, random_state=21, alpha=0.1):
        self.weights = None
        self.bias = None
        self.lr = lr
        self.max_iter = max_iter
        self.random_state = random_state
        self.alpha = alpha
        
    def fit(self, X, y):
        np.random.seed(self.random_state)
        
        n_samples, n_features = X.shape
        self.bias, self.weights = 0, np.zeros(n_features)

        for _ in range(self.max_iter):
            i = np.random.randint(n_samples)
            y_pred = X.iloc[i] @ self.weights + self.bias

            err = y_pred - y.iloc[i]
            self.bias -= 2 * self.lr * err
            self.weights -= self.lr * (2 * err * X.iloc[i] + self.alpha * np.sign(self.weights))

    def predict(self, X):
        return X @ self.weights + self.bias

In [29]:
class LinearRegressionSGD_Ridge:
    def __init__(self, lr=10e-3, max_iter=1000, random_state=21, alpha=0.1):
        self.weights = None
        self.bias = None
        self.lr = lr
        self.max_iter = max_iter
        self.random_state = random_state
        self.alpha = alpha
        
    def fit(self, X, y):
        np.random.seed(self.random_state)
        
        n_samples, n_features = X.shape
        self.bias, self.weights = 0, np.zeros(n_features)

        for _ in range(self.max_iter):
            i = np.random.randint(n_samples)
            y_pred = X.iloc[i] @ self.weights + self.bias

            err = y_pred - y.iloc[i]
            self.bias -= 2 * self.lr * err
            self.weights -= 2 * self.lr * (err * X.iloc[i] + self.alpha * self.weights)

    def predict(self, X):
        return X @ self.weights + self.bias

In [30]:
class LinearRegressionSGD_ElasticNet:
    def __init__(self, lr=10e-3, max_iter=1000, random_state=21, alpha=0.1):
        self.weights = None
        self.bias = None
        self.lr = lr
        self.max_iter = max_iter
        self.random_state = random_state
        self.alpha = alpha
        
    def fit(self, X, y):
        np.random.seed(self.random_state)
        
        n_samples, n_features = X.shape
        self.bias, self.weights = 0, np.zeros(n_features)

        for _ in range(self.max_iter):
            i = np.random.randint(n_samples)
            y_pred = X.iloc[i] @ self.weights + self.bias

            err = y_pred - y.iloc[i]
            self.bias -= 2 * self.lr * err
            self.weights -= self.lr * (2 * err * X.iloc[i] + 2 * self.alpha * self.weights + self.alpha * np.sign(self.weights))

    def predict(self, X):
        return X @ self.weights + self.bias

In [31]:
lasso_my = LinearRegressionSGD_Lasso(lr=0.01, max_iter=10000)
lasso_my.fit(X_train, y_train)
y_pred_lasso_my = lasso_my.predict(X_test)

mae_my(y_pred_lasso_my, y_test), rmse_my(y_pred_lasso_my, y_test), r2_my(y_pred_lasso_my, y_test)

(1008.3231222864138, np.float64(2173.613921394713), 0.33828876234307226)

In [32]:
ridge_my = LinearRegressionSGD_Ridge(lr=0.01, max_iter=10000)
ridge_my.fit(X_train, y_train)
y_pred_ridge_my = ridge_my.predict(X_test)

mae_my(y_pred_ridge_my, y_test), rmse_my(y_pred_ridge_my, y_test), r2_my(y_pred_ridge_my, y_test)

(896.7569479244339, np.float64(2163.521577593467), 0.3444193022423313)

In [33]:
elasticnet_my = LinearRegressionSGD_ElasticNet(lr=0.01, max_iter=10000)
elasticnet_my.fit(X_train, y_train)
y_pred_elasticnet_my = elasticnet_my.predict(X_test)

mae_my(y_pred_elasticnet_my, y_test), rmse_my(y_pred_elasticnet_my, y_test), r2_my(y_pred_elasticnet_my, y_test)

(896.7570465518085, np.float64(2163.5397707071947), 0.34440827660296347)

In [34]:
from sklearn.linear_model import Lasso, Ridge, ElasticNet
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_pred_lasso = lasso.predict(X_test)

mae_my(y_pred_lasso, y_test), rmse_my(y_pred_lasso, y_test), r2_my(y_pred_lasso, y_test)

(1049.0602626444872, np.float64(2193.7202212064512), 0.32599025877895327)

In [35]:
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)

mae_my(y_pred_ridge, y_test), rmse_my(y_pred_ridge, y_test), r2_my(y_pred_ridge, y_test)

(1049.5035390907296, np.float64(2193.9381147346944), 0.32585635870109175)

In [36]:
elasticnet = ElasticNet(alpha=0.1)
elasticnet.fit(X_train, y_train)
y_pred_elasticnet = elasticnet.predict(X_test)

mae_my(y_pred_elasticnet, y_test), rmse_my(y_pred_elasticnet, y_test), r2_my(y_pred_elasticnet, y_test)

(966.8898939901201, np.float64(2165.5731695818854), 0.3431753841580778)

In [37]:
res_mae, res_rmse, res_r2 = append_results(lasso_my, "Lasso my", res_mae, res_rmse, res_r2)
res_mae, res_rmse, res_r2 = append_results(lasso, "Lasso original", res_mae, res_rmse, res_r2)
res_mae, res_rmse, res_r2 = append_results(ridge_my, "Ridge my", res_mae, res_rmse, res_r2)
res_mae, res_rmse, res_r2 = append_results(ridge, "Ridge original", res_mae, res_rmse, res_r2)
res_mae, res_rmse, res_r2 = append_results(elasticnet_my, "ElasticNet my", res_mae, res_rmse, res_r2)
res_mae, res_rmse, res_r2 = append_results(elasticnet, "ElasticNet original", res_mae, res_rmse, res_r2)

All of these models use regularization; their metrics are roughly the same.

## 6. Feature normalization

1. First, write several examples of why and where feature normalization is mandatory and vice versa.
2. Let's consider the first of the classical normalization methods — MinMaxScaler. Write a mathematical formula for this method.
3. Implement your own function or class for MinMaxScaler feature normalization.
4. Initialize MinMaxScaler() from sklearn.preprocessing.
5. Compare the feature normalization with your own method and with sklearn.
6. Repeat steps 2 to 5 for another normalization method, StandardScaler.

Normalization is needed so that all features enter the model on a comparable scale and none dominates simply because of its units.  
It is especially important for distance-based algorithms such as KNN.  
In essence, we map the data into a fixed range.  
For linear models trained with gradient descent, normalization speeds up convergence, and it makes regularization penalize all weights on a comparable scale.  
Tree-based models, by contrast, do not require it, since their splits depend only on the ordering of feature values.  

All the formulas are in the functions below.
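
Written out, the two transformations applied per feature column $x$ are the following (the symbols $x'$, $z$, $\mu$, $\sigma$ are chosen for this sketch):

```latex
\text{MinMaxScaler:}\quad x' = \frac{x - \min(x)}{\max(x) - \min(x)}

\text{StandardScaler:}\quad z = \frac{x - \mu}{\sigma},
\qquad \mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
```

Here $\sigma$ is the population standard deviation, which is what sklearn's StandardScaler uses.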

In [39]:
def MinMaxScaler_my(X):
    X = pd.DataFrame(X)
    X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    return X_scaled.reset_index(drop=True)

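def StandardSclaler_my(X):
    X = pd.DataFrame(X)
    # pandas std() defaults to the sample estimate (ddof=1), while sklearn's StandardScaler
    # uses the population estimate (ddof=0); hence the small deviation in the comparison below
    X_scaled = (X - X.mean(axis=0))/(X.std(axis=0))
    return X_scaled.reset_index(drop=True)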


In [40]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train_minmax_my = MinMaxScaler_my(X_train)
X_train_minmax = pd.DataFrame(MinMaxScaler().fit_transform(X_train), columns=X_train.columns)


X_train_standard_my = StandardSclaler_my(X_train)
X_train_standard= pd.DataFrame(StandardScaler().fit_transform(X_train), columns=X_train.columns)

In [41]:
diff_minmax = np.abs(X_train_minmax - X_train_minmax_my)
diff_standard = np.abs(X_train_standard - X_train_standard_my)

print(f"MinMax max deviation: {np.max(diff_minmax)}, mean: {np.mean(diff_minmax)}")
print(f"StandardScaler max deviation: {np.max(diff_standard)}, mean: {np.mean(diff_standard)}")

MinMax max deviation: 1.1102230246251565e-16, mean: 5.950029104483058e-20
StandardScaler max deviation: 0.00022161598935710458, mean: 9.518907874769516e-06


## 7. Fit custom and sklearn models with normalized data

- Fit all models — Linear Regression, Ridge, Lasso, and ElasticNet — with MinMaxScaler.
- Fit all models — Linear Regression, Ridge, Lasso, and ElasticNet — with StandardScaler.
- Add all results to our dataframe with metrics on samples.
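
When fitting on normalized data, the scaling statistics are usually learned on the training part and then reused for the test part. A minimal sketch of that pattern follows; the class names `MinMaxScalerSketch` and `StandardScalerSketch` and the toy frames are hypothetical and are not the functions used in this notebook.

```python
import numpy as np
import pandas as pd

class MinMaxScalerSketch:
    """Hypothetical helper: learns min/max in fit, reuses them in transform."""

    def fit(self, X):
        X = pd.DataFrame(X)
        self.min_ = X.min(axis=0)
        # Guard against constant columns (max == min) to avoid division by zero
        self.range_ = (X.max(axis=0) - self.min_).replace(0, 1)
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        return ((X - self.min_) / self.range_).reset_index(drop=True)

class StandardScalerSketch:
    """Hypothetical helper: learns mean/std in fit, reuses them in transform."""

    def fit(self, X):
        X = pd.DataFrame(X)
        self.mean_ = X.mean(axis=0)
        # ddof=0 (population std) matches sklearn's StandardScaler
        self.std_ = X.std(axis=0, ddof=0).replace(0, 1)
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        return ((X - self.mean_) / self.std_).reset_index(drop=True)

# Toy data (made up for the example)
train_toy = pd.DataFrame({"a": [0.0, 5.0, 10.0], "b": [1.0, 1.0, 1.0]})
test_toy = pd.DataFrame({"a": [20.0], "b": [1.0]})

mm = MinMaxScalerSketch().fit(train_toy)
print(mm.transform(train_toy))
print(mm.transform(test_toy))  # test values may fall outside [0, 1]
```

Splitting `fit` from `transform` keeps the test features on exactly the scale the model saw during training.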

In [42]:
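# NOTE: the test set is scaled here with its own min/max and mean/std;
# reusing the statistics learned on the training set would keep both parts on the same scale
X_test_minmax_my = MinMaxScaler_my(X_test)
X_test_standard_my = StandardSclaler_my(X_test)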

In [43]:
linreg_an_minmax = LinearRegressionAN()
linreg_an_minmax.fit(X_train_minmax_my, y_train)
linreg_an_standard = LinearRegressionAN()
linreg_an_standard.fit(X_train_standard_my, y_train)

lasso_my_minmax = LinearRegressionSGD_Lasso(lr=0.01, max_iter=10000)
lasso_my_minmax.fit(X_train_minmax_my, y_train)
lasso_my_standard = LinearRegressionSGD_Lasso(lr=0.01, max_iter=10000)
lasso_my_standard.fit(X_train_standard_my, y_train)

ridge_my_minmax = LinearRegressionSGD_Ridge(lr=0.01, max_iter=10000)
ridge_my_minmax.fit(X_train_minmax_my, y_train)
ridge_my_standard = LinearRegressionSGD_Ridge(lr=0.01, max_iter=10000)
ridge_my_standard.fit(X_train_standard_my, y_train)

elasticnet_my_minmax = LinearRegressionSGD_ElasticNet(lr=0.01, max_iter=10000)
elasticnet_my_minmax.fit(X_train_minmax_my, y_train)
elasticnet_my_standard = LinearRegressionSGD_ElasticNet(lr=0.01, max_iter=10000)
elasticnet_my_standard.fit(X_train_standard_my, y_train)


In [44]:
res_mae, res_rmse, res_r2 = append_results(linreg_an_minmax, "Linreg MinMax", res_mae, res_rmse, res_r2, X_train_minmax_my, X_test_minmax_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))
res_mae, res_rmse, res_r2 = append_results(linreg_an_standard, "Linreg Standard", res_mae, res_rmse, res_r2, X_train_standard_my, X_test_standard_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))
res_mae, res_rmse, res_r2 = append_results(lasso_my_minmax, "Lasso MinMax", res_mae, res_rmse, res_r2, X_train_minmax_my, X_test_minmax_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))
res_mae, res_rmse, res_r2 = append_results(lasso_my_standard, "Lasso Standard", res_mae, res_rmse, res_r2, X_train_standard_my, X_test_standard_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))
res_mae, res_rmse, res_r2 = append_results(ridge_my_minmax, "Ridge MinMax", res_mae, res_rmse, res_r2, X_train_minmax_my, X_test_minmax_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))
res_mae, res_rmse, res_r2 = append_results(ridge_my_standard, "Ridge Standard", res_mae, res_rmse, res_r2, X_train_standard_my, X_test_standard_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))
res_mae, res_rmse, res_r2 = append_results(elasticnet_my_minmax, "ElasticNet MinMax", res_mae, res_rmse, res_r2, X_train_minmax_my, X_test_minmax_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))
res_mae, res_rmse, res_r2 = append_results(elasticnet_my_standard, "ElasticNet Standard", res_mae, res_rmse, res_r2, X_train_standard_my, X_test_standard_my, y_train.reset_index(drop=True), y_test.reset_index(drop=True))

## 8. Overfit models

- Let's look at an overfitted model in practice. From theory, you know that polynomial regression is easy to overfit. So let's create a toy example and see how regularization works in real life.
- In the previous lesson, we created polynomial features with degree 10. Here we repeat these steps from the previous lesson, remembering that we have only 3 basic features — 'bathrooms', 'bedrooms', 'interest_level'.
- And train and fit all our implemented algorithms — Linear Regression, Ridge, Lasso, and ElasticNet — on a set of polynomial features.
- Store the results of the quality metrics in the result dataframe.
- Analyze the results and select the best model according to your opinion.
- Additionally try different alpha parameters of regularization in algorithms, choose the best one and analyze results.

In [45]:
# the working datasets need to be changed

X_overfit = train[['bathrooms', 'bedrooms', 'interest_level']].reset_index(drop=True)
X_overfit = pd.get_dummies(X_overfit, columns=['interest_level'], dtype=int)

In [46]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(10)
X_overfit = pd.DataFrame(poly.fit_transform(X_overfit))

In [47]:
X_overfit_train, X_overfit_test, y_overfit_train, y_overfit_test = train_test_split(X_overfit, y, random_state=21)

In [48]:
linreg_overfit = LinearRegressionAN()
linreg_overfit.fit(X_overfit_train, y_overfit_train)
y_pred_linreg_overfit = linreg_overfit.predict(X_overfit_test)

mae_my(y_pred_linreg_overfit, y_overfit_test), rmse_my(y_pred_linreg_overfit, y_overfit_test), r2_my(y_pred_linreg_overfit, y_overfit_test)

LinAlgError: Singular matrix

In [None]:
lasso_overfit = LinearRegressionSGD_Lasso(lr=0.1, max_iter=10000)
lasso_overfit.fit(X_overfit_train, y_overfit_train)
y_pred_lasso_overfit = lasso_overfit.predict(X_overfit_test)

mae_my(y_pred_lasso_overfit, y_overfit_test), rmse_my(y_pred_lasso_overfit, y_overfit_test), r2_my(y_pred_lasso_overfit, y_overfit_test)

(nan, np.float64(nan), nan)

In [None]:
ridge_overfit = LinearRegressionSGD_Ridge(lr=0.1, max_iter=10000)
ridge_overfit.fit(X_overfit_train, y_overfit_train)
y_pred_ridge_overfit = ridge_overfit.predict(X_overfit_test)

mae_my(y_pred_ridge_overfit, y_overfit_test), rmse_my(y_pred_ridge_overfit, y_overfit_test), r2_my(y_pred_ridge_overfit, y_overfit_test)

(nan, np.float64(nan), nan)

In [None]:
elasticnet_overfit = LinearRegressionSGD_ElasticNet(lr=0.1, max_iter=10000)
elasticnet_overfit.fit(X_overfit_train, y_overfit_train)
y_pred_elasticnet_overfit = elasticnet_overfit.predict(X_overfit_test)

mae_my(y_pred_elasticnet_overfit, y_overfit_test), rmse_my(y_pred_elasticnet_overfit, y_overfit_test), r2_my(y_pred_elasticnet_overfit, y_overfit_test)

(nan, np.float64(nan), nan)

Overall, the results are very poor :)  
The NaN metrics most likely come from SGD diverging: the degree-10 polynomial features are unscaled and very large, so with lr=0.1 the weights overflow.  
I won't add these results to the table; the takeaway is that linearly dependent features are a problem.  
I tried the original sklearn Lasso and the other models; they are implemented via coordinate descent rather than SGD, and my laptop could not handle the computation.  
Even with a gradient-based solution, linear dependence hurts: the loss contours become more elongated ellipses, so gradient descent converges more slowly.
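
One likely contributor to the `Singular matrix` error above is exact linear dependence between columns, for example an intercept column together with a full one-hot encoding. The toy sketch below (made-up data, hypothetical variable names) shows that such a design matrix is rank-deficient and that a small L2 penalty makes the system solvable.

```python
import numpy as np

# Toy design matrix: intercept column plus a full one-hot encoding (no dropped level)
one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
], dtype=float)
X = np.hstack([np.ones((5, 1)), one_hot])
y = np.array([10.0, 20.0, 30.0, 12.0, 18.0])

# The intercept equals the sum of the one-hot columns, so X^T X is rank-deficient
gram = X.T @ X
n_cols = X.shape[1]
rank = np.linalg.matrix_rank(gram)
print("columns:", n_cols, "rank:", rank)

# A small L2 penalty makes the matrix positive definite, so the system can be solved
alpha = 1e-3
w_ridge = np.linalg.solve(gram + alpha * np.eye(n_cols), X.T @ y)
print("ridge predictions:", X @ w_ridge)
```

Dropping one dummy level (as `drop_first=True` does) or using a pseudo-inverse are alternative ways to handle the same dependence.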

## 9. Naive models

- Calculate the mean and median metrics from the previous lesson and add the results to the final dataframe.

In [49]:
# Naive baselines: constants estimated on the training set only
train_mean = y_train.mean()
train_median = y_train.median()

In [50]:
for name, const in [('mean', train_mean), ('median', train_median)]:
    # predict the same constant for every row of both splits
    pred_train = np.full(len(y_train), const)
    pred_test = np.full(len(y_test), const)

    mae_row = pd.DataFrame({'model' : [name], 'train' : [mae_my(pred_train, y_train)], 'test' : [mae_my(pred_test, y_test)]})
    rmse_row = pd.DataFrame({'model' : [name], 'train' : [rmse_my(pred_train, y_train)], 'test' : [rmse_my(pred_test, y_test)]})
    r2_row = pd.DataFrame({'model' : [name], 'train' : [r2_my(pred_train, y_train)], 'test' : [r2_my(pred_test, y_test)]})

    res_mae = pd.concat([res_mae, mae_row], ignore_index=True)
    res_rmse = pd.concat([res_rmse, rmse_row], ignore_index=True)
    res_r2 = pd.concat([res_r2, r2_row], ignore_index=True)

## 10. Compare results

- Print your final tables
- What is the best model?
- Which is the most stable model?

In [51]:
res_mae

| model | train | test |
|---|---|---|
| Analytical | 1247.439141 | 1049.511765 |
| GD | 1218.760602 | 1016.770463 |
| SGD | 1212.55339 | 1008.469487 |
| Sklearn original | 1247.439141 | 1049.511765 |
| Lasso my | 1212.413633 | 1008.323122 |
| Lasso original | 1247.034238 | 1049.060263 |
| Ridge my | 1104.233309 | 896.756948 |
| Ridge original | 1247.4317 | 1049.503539 |
| ElasticNet my | 1104.235982 | 896.757047 |
| ElasticNet original | 1173.153651 | 966.889894 |


In [52]:
res_rmse

| model | train | test |
|---|---|---|
| Analytical | 24567.312851 | 2193.942261 |
| GD | 24567.802283 | 2178.233989 |
| SGD | 24579.023175 | 2173.697829 |
| Sklearn original | 24567.312851 | 2193.942261 |
| Lasso my | 24579.020471 | 2173.613921 |
| Lasso original | 24567.312936 | 2193.720221 |
| Ridge my | 24581.351393 | 2163.521578 |
| Ridge original | 24567.312851 | 2193.938115 |
| ElasticNet my | 24581.355007 | 2163.539771 |
| ElasticNet original | 24569.338881 | 2165.57317 |


In [54]:
res_r2

| model | train | test |
|---|---|---|
| Analytical | 0.005495 | 0.325854 |
| GD | 0.005456 | 0.335473 |
| SGD | 0.004547 | 0.338238 |
| Sklearn original | 0.005495 | 0.325854 |
| Lasso my | 0.004547 | 0.338289 |
| Lasso original | 0.005495 | 0.32599 |
| Ridge my | 0.004358 | 0.344419 |
| Ridge original | 0.005495 | 0.325856 |
| ElasticNet my | 0.004358 | 0.344408 |
| ElasticNet original | 0.005331 | 0.343175 |


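The best models are the regularized ones: on the test set, the custom Ridge and ElasticNet have the lowest MAE and RMSE and the highest R2.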

## 11. Addition task

- There are some tricks with the target variable for better model quality. If we have a distribution with a heavy tail, you can use a monotone function to "improve" the distribution. In practice, you can use logarithmic functions. We recommend that you do this exercise and compare the results. But don't forget to do the inverse transformation if you want to compare metrics.
- The next trick is outliers. The angle of the linear regression line depends strongly on outliers. Often you should remove these points, but only from the training data. You should explain why they were removed from the training sample only. We recommend that you do this exercise and compare the results.
- It will also be a useful exercise to implement a linear regression algorithm with batch and mini-batch training or analytical solution (as mentioned in 4.1).
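
The first two tricks can be sketched on synthetic data before applying them to the real target. Everything below (the data-generating process, the split, and the names `w_log`, `w_clean`, `mae`) is an illustrative assumption; it only shows the mechanics of the log/inverse transform and of filtering the training part alone.

```python
import numpy as np

rng = np.random.default_rng(21)

# Synthetic heavy-tailed target: log1p(y) is linear in one feature plus noise
n = 1000
x = rng.uniform(0, 3, size=n)
y = np.expm1(7.0 + 0.4 * x + 0.3 * rng.normal(size=n))

X = np.column_stack([np.ones(n), x])
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Trick 1: fit on log1p(y), then invert the predictions with expm1
w_log, *_ = np.linalg.lstsq(X_train, np.log1p(y_train), rcond=None)
pred_log = np.expm1(X_test @ w_log)

# Trick 2: drop extreme targets from the training part only; the test part stays intact
low, high = np.quantile(y_train, [0.01, 0.99])
keep = (y_train > low) & (y_train < high)
w_clean, *_ = np.linalg.lstsq(X_train[keep], y_train[keep], rcond=None)
pred_clean = X_test @ w_clean

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

print("MAE, log target:   ", mae(y_test, pred_log))
print("MAE, trimmed train:", mae(y_test, pred_clean))
```

Both metrics are computed on the untransformed, unfiltered test target, so they stay comparable with the other models.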

In [55]:
y_train_log = np.log1p(y_train)
y_test_log = np.log1p(y_test)

In [56]:
# use ridge, one of the best models, to test the tricks

ridge = LinearRegressionSGD_Ridge(lr=0.01, max_iter=10000)
ridge.fit(X_train, y_train_log)
y_pred = np.expm1(ridge.predict(X_test)) 

mae_my(y_pred, y_test), rmse_my(y_pred, y_test), r2_my(y_pred, y_test)

(891.4170558913837, np.float64(2077.9029727644142), 0.39528014929718625)

In [57]:
q_low = y_train.quantile(0.01)
q_high = y_train.quantile(0.99)

mask = (y_train > q_low) & (y_train < q_high)

X_train_clean, y_train_clean = X_train[mask.values], y_train[mask.values]

In [58]:
ridge = LinearRegressionSGD_Ridge(lr=0.001, max_iter=10000, alpha=0.001)
ridge.fit(X_train_clean, y_train_clean)
y_pred = ridge.predict(X_test)

mae_my(y_pred, y_test), rmse_my(y_pred, y_test), r2_my(y_pred, y_test)

(818.6127042536398, np.float64(2155.1699909430804), 0.3494708548075838)

If we removed outliers from the test set, we would not get a full picture of how the model behaves at inference.

In [59]:
class LinearRegressionMiniBatchGD:
    def __init__(self, lr=10e-3, max_iter=1000, random_state=21, batch_size=32):
        self.weights = None
        self.bias = None
        self.lr = lr
        self.max_iter = max_iter
        self.random_state = random_state
        self.batch_size = batch_size
        
    def fit(self, X, y):
        np.random.seed(self.random_state)
        
        n_samples, n_features = X.shape
        self.bias, self.weights = 0, np.zeros(n_features)

        for _ in range(self.max_iter):
            idx = np.random.choice(n_samples, self.batch_size, replace=False)
            X_batch, y_batch = X.iloc[idx], y.iloc[idx]

            y_pred = X_batch @ self.weights + self.bias

            err = y_pred - y_batch
            grad_b = 2 / self.batch_size * np.sum(err)
            grad_w = 2 / self.batch_size * X_batch.T @ err
            
            self.bias -= self.lr * grad_b
            self.weights -=  self.lr * grad_w

    def predict(self, X):
        return X @ self.weights + self.bias


In [60]:
linreg_batch = LinearRegressionMiniBatchGD(lr=0.005, max_iter=1000, batch_size=32)
linreg_batch.fit(X_train, y_train)
y_pred = linreg_batch.predict(X_test)

mae_my(y_pred, y_test), rmse_my(y_pred, y_test), r2_my(y_pred, y_test)

(891.0189444646536, np.float64(2125.385698234495), 0.36732714104647524)