# Elastic Net Regression
Elastic Net Regression is a type of linear regression that combines the properties of both Lasso (L1) and Ridge (L2) regularization methods. It aims to overcome the limitations of Lasso, especially when dealing with highly correlated features.
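
For reference, scikit-learn's `ElasticNet` (the estimator used later in this notebook) minimizes the objective below. Here $n$ is the number of samples, $w$ the coefficient vector, $\alpha$ corresponds to the `alpha` parameter and $\rho$ to `l1_ratio`; the symbol names are chosen here for illustration.

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

With $\rho = 1$ the penalty reduces to the Lasso (L1) term, and with $\rho = 0$ only the Ridge-style (L2) term remains.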

## Advantages:
- Feature Selection and Shrinkage: Combines the benefits of both Lasso and Ridge, performing feature selection and coefficient shrinkage.
- Handles Multicollinearity: Effective in cases where the independent variables are highly collinear.
- Improved Prediction Accuracy: Can result in more accurate predictions by balancing the trade-off between bias and variance.
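
The multicollinearity point above can be illustrated with a small sketch on synthetic data (the data, the names `X_corr`, `y_corr`, and the chosen `alpha` values are assumptions for illustration, not part of the startup dataset). Two features are near-copies of each other; Lasso often concentrates the weight on one of them, while Elastic Net tends to spread it across both.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n_samples = 200
signal = rng.normal(size=n_samples)

# Two nearly identical (highly correlated) features plus one unrelated feature
X_corr = np.column_stack([
    signal + 0.01 * rng.normal(size=n_samples),
    signal + 0.01 * rng.normal(size=n_samples),
    rng.normal(size=n_samples),
])
y_corr = 3.0 * signal + 0.1 * rng.normal(size=n_samples)

lasso_demo = Lasso(alpha=0.1).fit(X_corr, y_corr)
enet_demo = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_corr, y_corr)

print("Lasso coefficients:      ", lasso_demo.coef_)
print("Elastic Net coefficients:", enet_demo.coef_)
```

Comparing the first two coefficients of each model shows how the L2 part of the penalty encourages correlated features to share weight.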

## Disadvantages:
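- Complexity: Harder to tune than plain linear regression because two hyperparameters (`alpha` and `l1_ratio`) control the two penalty terms.
- Bias Introduction: Regularization shrinks coefficients toward zero, which biases the estimates and makes their magnitudes harder to interpret directly.
- Dependency on Scaling: The penalty acts on coefficient magnitudes, so results change with the units of each feature unless the features are standardized.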

## Use Cases:
- High-dimensional Data: Effective for datasets with a large number of features where feature selection is necessary.
- Genomics: Identifying significant genes related to specific conditions.
- Finance: Predicting stock prices where features might be highly correlated.
- Marketing Analysis: Sales prediction models where multiple marketing channels are interdependent.

## Scaling (necessary)
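Yes, scaling is necessary for Elastic Net because the L1 and L2 penalties act on coefficient magnitudes, and those magnitudes depend on the units of each feature. Without scaling, features measured on large scales are penalized less than features measured on small scales.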
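
A minimal sketch of the effect, using synthetic data (the arrays and the factor of 1000 are assumptions for illustration): the same feature expressed in a different unit changes the fit of a raw `ElasticNet`, while a pipeline that standardizes first gives the same predictions either way.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

# Same information, but the first feature is expressed in a unit 1000x smaller
X_rescaled = X_demo.copy()
X_rescaled[:, 0] *= 1000.0

# Without scaling, the penalty treats the two versions differently
raw_a = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_demo, y_demo)
raw_b = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X_rescaled, y_demo)

# With standardization inside a pipeline, the unit no longer matters
pipe_a = make_pipeline(StandardScaler(), ElasticNet(alpha=1.0, l1_ratio=0.5)).fit(X_demo, y_demo)
pipe_b = make_pipeline(StandardScaler(), ElasticNet(alpha=1.0, l1_ratio=0.5)).fit(X_rescaled, y_demo)

raw_gap = np.abs(raw_a.predict(X_demo) - raw_b.predict(X_rescaled)).max()
pipe_gap = np.abs(pipe_a.predict(X_demo) - pipe_b.predict(X_rescaled)).max()
print("max prediction gap without scaling:", raw_gap)
print("max prediction gap with scaling:   ", pipe_gap)
```

Wrapping the scaler and the model in one pipeline also keeps the scaling step inside each cross-validation fold when the pipeline is passed to a search.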

## Encoding (necessary)
Yes, categorical features need to be encoded into numerical values.
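
The dataset loaded below already contains `Florida` and `New York` indicator columns, which suggests a categorical column was one-hot encoded beforehand with one category dropped. A sketch of how such columns could be produced with `pd.get_dummies` follows; the `State` column name, the `California` baseline, and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical raw rows; 'State' and its values are assumed, not taken from the file
raw = pd.DataFrame({
    'R&D Spend': [165349.20, 162597.70, 153441.51],
    'State': ['New York', 'California', 'Florida'],
})

# One-hot encode 'State'; drop_first removes the first category (alphabetically)
# so the remaining indicator columns are not perfectly collinear
encoded = pd.get_dummies(
    raw, columns=['State'], prefix='', prefix_sep='', drop_first=True, dtype=float
)
print(encoded)
```

A row with zeros in both indicator columns then represents the dropped baseline category.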

# Import Libraries

In [20]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from scipy.stats import uniform, loguniform

# Read Dataset

In [21]:
df = pd.read_csv('50_StartUp_dataset.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,R&D Spend,Administration,Marketing Spend,Profit,Florida,New York
0,0,165349.2,136897.8,471784.1,192261.83,0.0,1.0
1,1,162597.7,151377.59,443898.53,191792.06,0.0,0.0
2,2,153441.51,101145.55,407934.54,191050.39,1.0,0.0
3,3,144372.41,118671.85,383199.62,182901.99,0.0,1.0
4,4,142107.34,91391.77,366168.42,166187.94,1.0,0.0


# Get X, y

In [22]:
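# Note: df still contains the leftover index columns 'Unnamed: 0.1' and 'Unnamed: 0'.
# They are row indices rather than real predictors, yet they stay in x as features;
# consider dropping them together with 'Profit'.
x=df.drop('Profit',axis=1)
y=df['Profit']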

## Get train, test and valid data

In [23]:
from sklearn.model_selection import train_test_split
# Hold out 10% of the rows for testing, then 10% of the remaining rows for validation
x_train, x_test, y_train, y_test=train_test_split(x,y,test_size=.1, random_state=42)
x_train, x_valid, y_train, y_valid=train_test_split(x_train,y_train,test_size=.1, random_state=42)

In [24]:
print('x_train shape =',x_train.shape)
print('x_test shape =',x_test.shape)
print('x_valid shape =',x_valid.shape)
print('y_train shape =',y_train.shape)
print('y_test shape =',y_test.shape)
print('y_valid shape =',y_valid.shape)

x_train shape = (40, 6)
x_test shape = (5, 6)
x_valid shape = (5, 6)
y_train shape = (40,)
y_test shape = (5,)
y_valid shape = (5,)
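
The two-step split can be checked on a synthetic stand-in with the same number of rows (the `demo` frame and its column names are assumptions for illustration). Because the second split takes 10% of the remaining 45 rows and scikit-learn rounds the held-out size up, the result is 40 training, 5 validation, and 5 test rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 50 rows, matching the dataset size used in this notebook
demo = pd.DataFrame({'feature': np.arange(50), 'target': np.arange(50) * 2.0})

# Step 1: hold out 10% for testing; step 2: hold out 10% of the rest for validation
rest_part, test_part = train_test_split(demo, test_size=0.1, random_state=42)
train_part, valid_part = train_test_split(rest_part, test_size=0.1, random_state=42)

for name, part in [('train', train_part), ('valid', valid_part), ('test', test_part)]:
    print(name, len(part), f"{len(part) / len(demo):.0%}")
```

With only 5 rows each in the validation and test sets, scores computed on them can vary a lot from one random split to another.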


# Scaling

In [25]:
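from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
# Fit the scaler on the training data only, then reuse its mean and std
# for the validation and test sets to avoid leaking information from them
x_train=scaler.fit_transform(x_train)
x_valid=scaler.transform(x_valid)
x_test=scaler.transform(x_test)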

# Train

## Grid Search

In [26]:
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

elastic_net = ElasticNet()

# Grid used by the search below: 5 * 3 * 2 * 3 = 90 candidate settings
params = {
    'alpha': [0.01, 0.1, 1.0, 10.0, 100.0],
    'l1_ratio': [0.1, 0.5, 0.9],
    'fit_intercept': [True, False],
    'max_iter': [1000, 5000, 10000]
}

# Wider alternative grid (not used below); pass it instead of `params` to try it
param_grid = {
    'alpha': [1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100, 1000],
    'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9],
    'fit_intercept': [True, False],
    'max_iter': [1000, 5000, 10000],
    'tol': [1e-3, 1e-2, 1e-1]
}

# 5-fold cross-validated search scored by R², using all available CPU cores
grid_search = GridSearchCV(elastic_net, params, scoring='r2', cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(x_train, y_train)

In [27]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

Best Hyperparameter Index: 24
Best Hyperparameters: {'alpha': 0.1, 'fit_intercept': True, 'l1_ratio': 0.9, 'max_iter': 1000}
Best Cross-Validated Score: 0.9511716848010809
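
Beyond the single best setting, the full cross-validation table can show how sensitive the score is to each hyperparameter. A minimal sketch on synthetic data follows (the data from `make_regression`, the reduced grid, and the names `demo_search` and `results` are assumptions for illustration); the same `cv_results_` attribute is available on a fitted `GridSearchCV` object.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem standing in for the scaled training data
X_demo, y_demo = make_regression(n_samples=60, n_features=4, noise=10.0, random_state=0)

demo_grid = {'alpha': [0.01, 0.1, 1.0], 'l1_ratio': [0.1, 0.5, 0.9]}
demo_search = GridSearchCV(ElasticNet(max_iter=10000), demo_grid, scoring='r2', cv=5)
demo_search.fit(X_demo, y_demo)

# One row per candidate setting, ordered from best to worst mean CV score
results = pd.DataFrame(demo_search.cv_results_)
columns = ['param_alpha', 'param_l1_ratio', 'mean_test_score', 'std_test_score']
top = results.sort_values('rank_test_score')[columns].head()
print(top)
```

Looking at `std_test_score` next to `mean_test_score` helps judge whether the gap between the top candidates is meaningful on a small dataset.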


In [28]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
y_pred = model.predict(x_test)

## Randomized Search

In [29]:
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

elastic_net = ElasticNet()

# Candidate values sampled by the search below
params = {
    'alpha': np.logspace(-4, 4, 50),
    'l1_ratio': np.linspace(0.01, 1.0, 50),
    'fit_intercept': [True, False],
    'max_iter': np.arange(1000, 10001, 500),
}

# Wider alternative (not used below) that also varies `tol`
param_dist = {
    'alpha': np.logspace(-6, 6, 100),
    'l1_ratio': np.linspace(0.01, 1.0, 50),
    'fit_intercept': [True, False],
    'max_iter': np.arange(1000, 10001, 500),
    'tol': np.logspace(-4, -1, 50)
}

# Sample 10 random settings from `params`, each scored with 5-fold cross-validated R²
random_search = RandomizedSearchCV(elastic_net, params, scoring='r2', n_iter=10, cv=5, n_jobs=-1, random_state=42)

# Train the random search
random_search.fit(x_train, y_train)

In [30]:
# print("Best Hyperparameter Index:", random_search.best_index_)
# print("Best Hyperparameters:", random_search.best_params_)
# print("Best Cross-Validated Score:", random_search.best_score_)

In [31]:
# model = random_search.best_estimator_
# y_pred = model.predict(x_test)
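
`RandomizedSearchCV` also accepts continuous distributions instead of fixed lists, which is what the `uniform` and `loguniform` imports at the top of the notebook are suited for. A minimal sketch on synthetic data (the data, ranges, and the names `demo_distributions` and `demo_random` are assumptions for illustration):

```python
from scipy.stats import loguniform, uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression problem standing in for the scaled training data
X_demo, y_demo = make_regression(n_samples=60, n_features=4, noise=10.0, random_state=0)

demo_distributions = {
    'alpha': loguniform(1e-4, 1e2),             # log-uniform over [1e-4, 1e2]
    'l1_ratio': uniform(loc=0.01, scale=0.99),  # uniform over [0.01, 1.0]
}

demo_random = RandomizedSearchCV(
    ElasticNet(max_iter=10000), demo_distributions,
    n_iter=20, scoring='r2', cv=5, random_state=42,
)
demo_random.fit(X_demo, y_demo)

print("Best hyperparameters:", demo_random.best_params_)
print("Best cross-validated R²:", demo_random.best_score_)
```

Sampling `alpha` on a log scale spreads the draws evenly across orders of magnitude, which usually suits regularization strengths better than a linear range.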

## Train ElasticNet without search

In [32]:
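from sklearn.linear_model import ElasticNet
# Fixed hyperparameters; this reassigns `model`, so the checks and evaluation
# below use this model rather than the grid-search result
model=ElasticNet(alpha=1, l1_ratio=0.5, fit_intercept=True, max_iter=1000)
model.fit(x_train, y_train)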

# Check overfitting

In [33]:
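y_train_pred=model.predict(x_train)
# Note: r2_score expects (y_true, y_pred) and is not symmetric; the arguments here
# are swapped, so the value below is not the usual training R².
# The conventional call is r2_score(y_train, y_train_pred).
r2_score(y_train_pred , y_train)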

0.8907683068758714

In [34]:
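y_valid_pred=model.predict(x_valid)
# Note: arguments are swapped here as well; the conventional call is
# r2_score(y_valid, y_valid_pred), which generally gives a different value.
r2_score(y_valid_pred , y_valid)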

0.6725405655651447
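
When reading these two scores, keep in mind that `r2_score` takes its arguments in the order `(y_true, y_pred)` and that the metric is not symmetric. A minimal sketch with made-up numbers (not taken from the dataset) shows how much the order can matter:

```python
from sklearn.metrics import r2_score

# Toy values chosen for illustration only
toy_true = [1.0, 2.0, 3.0, 4.0]
toy_pred = [2.0, 2.0, 3.0, 3.0]

r2_correct = r2_score(toy_true, toy_pred)  # documented order: (y_true, y_pred)
r2_swapped = r2_score(toy_pred, toy_true)  # arguments swapped

print("R² with (y_true, y_pred):", r2_correct)  # about 0.6
print("R² with swapped order:   ", r2_swapped)  # about -1.0
```

The residuals are identical in both calls; only the total variance used as the baseline changes, because it is computed from whichever array is passed first.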

# Evaluate model

In [35]:
y_pred = model.predict(x_test)

## r2_score

In [36]:
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
r2

0.9663949497823119

## mean_squared_error

In [37]:
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
mse

22888135.40416471

## mean_absolute_error

In [38]:
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
mae

4443.142645253909
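
Since MSE is expressed in squared units of `Profit`, taking its square root gives an RMSE that can be compared directly with the MAE. The sketch below reuses the two values printed above as constants (the variable names `reported_mse`, `reported_mae`, and `rmse` are chosen here for illustration):

```python
import numpy as np

reported_mse = 22888135.40416471  # test MSE printed above
reported_mae = 4443.142645253909  # test MAE printed above

# RMSE is in the same units as Profit, unlike MSE
rmse = np.sqrt(reported_mse)
print(f"RMSE = {rmse:.2f}, MAE = {reported_mae:.2f}")
```

RMSE is never smaller than MAE, and a large gap between the two points to a few errors that are much bigger than the rest; with only 5 test rows, both numbers should be read as rough estimates.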