# Ridge Regression

Ridge regression is a regularized version of linear regression: a regularization term equal to $\lambda \sum_{i=1}^{n} w_i^2$ is added to the cost function, where the $w_i$ are the model weights and $\lambda$ controls the strength of the regularization. This forces the learning algorithm not only to fit the data but also to keep the model weights as small as possible. Note that the regularization term should only be added to the cost function during training. Once the model is trained, you want to evaluate its performance using the unregularized performance measure.
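To make the distinction between the training cost and the evaluation measure concrete, here is a minimal sketch. It assumes the common form of the ridge penalty (a strength `lam` times the sum of squared weights, with the intercept left unpenalized) and uses the sum of squared errors as the unregularized measure. The function names `sum_squared_error` and `ridge_training_cost` are made up for this illustration and are not part of scikit-learn.

```python
import numpy as np

def sum_squared_error(X, y, w, b):
    """Unregularized measure: sum of squared residuals."""
    residuals = X @ w + b - y
    return float(residuals @ residuals)

def ridge_training_cost(X, y, w, b, lam):
    """Training cost: squared error plus the penalty lam * sum(w_i^2).

    The intercept b is not penalized (an assumption of this sketch).
    """
    return sum_squared_error(X, y, w, b) + lam * float(w @ w)

X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 2.0], [2.0, 3.0]])
y = 3 * (X @ np.array([1.0, 2.0]))

# Weights that fit y exactly, so the unregularized error is zero
w = np.array([3.0, 6.0])
b = 0.0

evaluation_error = sum_squared_error(X, y, w, b)
training_cost = ridge_training_cost(X, y, w, b, lam=1.0)
print("Unregularized error:", evaluation_error)
print("Ridge training cost:", training_cost)
```

Even a perfect fit carries a nonzero training cost once the penalty is included, which is why the optimizer is pushed toward smaller weights. The evaluation measure ignores the penalty entirely.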

In [14]:
from sklearn.linear_model import Ridge
import numpy as np

# Example data
x = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])

# Target values
y = np.dot(x, np.array([1, 2])) * 3

# Ridge regression model
ridge_reg = Ridge(alpha=1.0)  # alpha is the equivalent of lambda in the formula
ridge_reg.fit(x, y)

# coefficients
print("Coefficient:", ridge_reg.coef_)
# intercept
print("Intercept:", ridge_reg.intercept_)

Coefficient: [2.4 4.2]
Intercept: 4.5
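As a cross-check, the printed coefficients can be reproduced by hand. The sketch below assumes the standard closed-form ridge solution on centered data, $w = (X_c^T X_c + \alpha I)^{-1} X_c^T y_c$, with the intercept recovered from the means and left unpenalized. It uses only NumPy. The variable names are chosen for this illustration.

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = np.dot(X, np.array([1, 2])) * 3
alpha = 1.0

# Center the features and the target so the intercept is not penalized
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean

# Solve (Xc^T Xc + alpha * I) w = Xc^T yc
coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)

# Recover the intercept from the means
intercept = y_mean - X_mean @ coef

print("Coefficient:", coef)
print("Intercept:", intercept)
```

Up to floating-point rounding, this should match the values printed by `ridge_reg` above. Note that the unregularized solution would be `[3, 6]` with intercept `0`, so the penalty has shrunk both weights.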


## Comparing Simple Linear Regression vs. Ridge Regression

### Import Libraries and Load the Dataset

In [15]:
import seaborn as sns
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# load the titanic dataset
df = sns.load_dataset("titanic")

### Preprocessing

In [17]:
df.columns

Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
       'alive', 'alone'],
      dtype='object')

In [18]:
# Select a subset of columns for simplicity; copy so later edits do not touch a view of the original frame
columns_to_use = ['survived', 'pclass', 'sex', 'age', 'fare']
df = df[columns_to_use].copy()

# Handle missing values: fill missing ages with the mean age
# (plain assignment avoids the chained inplace fillna, which triggers pandas warnings)
df['age'] = df['age'].fillna(df['age'].mean())

# Define features and target variable
x = df.drop('survived', axis=1)
y = df['survived']

# Split the dataset into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

### Creating a pipeline

In [19]:
# Define which columns get one-hot encoded and which get scaled
categorical_features = ['sex']
numerical_features = ['pclass', 'age', 'fare']

# Preprocessor
preprocessor = ColumnTransformer(
    transformers= [
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(), categorical_features)])

# Linear Regression Pipeline
lr_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('regressor', LinearRegression())])

# Ridge Regression Pipeline
ridge_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                                 ('regressor', Ridge(alpha=1.0))])

### Train and Evaluate the Models

In [20]:
# Train and evaluate Linear Regression
lr_pipeline.fit(x_train, y_train)
lr_pred = lr_pipeline.predict(x_test)
lr_mse = mean_squared_error(y_test, lr_pred)

# Train and evaluate Ridge Regression
ridge_pipeline.fit(x_train, y_train)
ridge_pred = ridge_pipeline.predict(x_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)

print("Linear Regression MSE: ", lr_mse)
print("Ridge Regression MSE: ", ridge_mse)

Linear Regression MSE:  0.13721032187197305
Ridge Regression MSE:  0.1372271421724039
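The two MSE values above differ only in the fifth decimal place, so at `alpha=1.0` the ridge penalty barely changes the predictions on this dataset. To see the shrinkage effect more clearly, the sketch below refits `Ridge` on the toy data from the first example for several values of `alpha`. The grid of values is an arbitrary choice made for this illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data from the first example
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) * 3

# Hypothetical grid of regularization strengths, chosen only for illustration
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    # The L2 norm of the weight vector summarizes how large the weights are
    norm = float(np.linalg.norm(model.coef_))
    norms.append(norm)
    print(f"alpha={alpha:>6}: coef={model.coef_}, L2 norm={norm:.3f}")
```

The norm of the coefficient vector should decrease as `alpha` grows, which is the "keep the weights small" behavior described at the start of this section. On a real dataset, `alpha` would normally be chosen by cross-validation rather than fixed at `1.0`.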
