# REGRESSION MODELS CHEAT SHEET
This notebook contains:
- 9 of the most common regression models
- When to use them
- Their syntax (importing, fitting, testing, evaluating)

## For the sake of this cheat sheet
We'll load the California housing sample data into the notebook to see each model in action.



In [None]:
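import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score  # used by every evaluation cell below

# California housing sample files that ship with Google Colab
df_train = pd.read_csv('/content/sample_data/california_housing_train.csv')
df_test = pd.read_csv('/content/sample_data/california_housing_test.csv')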

df_train.columns

Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
       'total_bedrooms', 'population', 'households', 'median_income',
       'median_house_value'],
      dtype='object')

In [None]:
# Numeric features used by every model (longitude/latitude are left out)
features = ['housing_median_age', 'total_rooms', 'total_bedrooms', 'population', 'households', 'median_income']
target = 'median_house_value'

X_train, y_train = df_train[features], df_train[target]
X_test, y_test = df_test[features], df_test[target]
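
Every model section below repeats the same fit / predict / evaluate steps. As a minimal sketch, that pattern can be wrapped in a small helper; `fit_and_score` is a hypothetical name (not a scikit-learn function), and the data here is synthetic so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def fit_and_score(model, X_train, y_train, X_test, y_test):
    """Fit `model`, predict on the test split, and return (MSE, R2).

    Hypothetical helper for this sketch; not part of scikit-learn.
    """
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred)


# Synthetic stand-in for the housing splits: y depends linearly on 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

mse, r2 = fit_and_score(LinearRegression(), X[:150], y[:150], X[150:], y[150:])
print(mse, r2)
```

With the housing splits above, the equivalent call would be `fit_and_score(LinearRegression(), X_train, y_train, X_test, y_test)`.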

## **1. Linear Regression**

To be used:
- When the relationship between the features and the target variable is (approximately) linear.
- When you need a simple and interpretable model for regression tasks.


In [None]:
# Importing
from sklearn.linear_model import LinearRegression

# Fitting
model = LinearRegression()
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)

5808966246.7101965 0.5458835346746116
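
The "interpretable" part of linear regression comes from its coefficients: each one estimates how much the prediction changes per one-unit increase in that feature, with the other features held fixed. The sketch below uses synthetic data with known coefficients (an assumption, chosen so the recovered values are easy to check).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with known structure: y = 1 + 2*a + 3*b (+ tiny noise)
rng = np.random.default_rng(42)
X = pd.DataFrame({'a': rng.normal(size=300), 'b': rng.normal(size=300)})
y = 1.0 + 2.0 * X['a'] + 3.0 * X['b'] + rng.normal(scale=0.01, size=300)

model = LinearRegression().fit(X, y)

# Pair each learned coefficient with its feature name
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs)
print('intercept:', model.intercept_)
```

On the housing data, `pd.Series(model.coef_, index=X_train.columns)` gives the same view. Because those features are unscaled, coefficient magnitudes are not directly comparable across features.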


## **2. Ridge Regression (L2 Regularization)**

To be used:
- When features are highly correlated with each other (multicollinearity), or to reduce overfitting.
- When you have a large number of features.

In [None]:
# Importing
from sklearn.linear_model import Ridge

# Fitting
model = Ridge(alpha=1.0)  # alpha is the regularization strength
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)

5808960076.618068 0.5458840170221257
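
To see what "dealing with multicollinearity" means in practice, here is a small synthetic sketch (assumed data, not the housing set) with two nearly identical features. Plain least squares may split the weight between them erratically, while Ridge's L2 penalty pushes it to share the weight roughly equally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two almost identical (highly collinear) features; the target depends on x1 only
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Compare how each model distributes the total weight (about 3) across the twins
print('OLS:  ', ols.coef_)
print('Ridge:', ridge.coef_)
```

Ridge barely changes the housing results above, likely because `alpha=1.0` is tiny relative to the magnitudes of the unscaled features; standardizing the features or trying larger `alpha` values would make the penalty matter.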


## **3. Lasso Regression (L1 Regularization)**
To be used:
- When you want a sparse model with some feature coefficients exactly equal to zero.
- Useful for feature selection and when dealing with high-dimensional data.

In [None]:
# Importing
from sklearn.linear_model import Lasso

# Fitting
model = Lasso(alpha=1.0)  # alpha is the regularization strength
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)

5808964096.664563 0.5458837027546248
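
The feature-selection behaviour is easiest to see on data where only a few features matter. This sketch uses synthetic data (an assumption): 10 features, of which only the first two drive the target, so Lasso should set the other coefficients to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# 10 features, but only features 0 and 1 influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)

# The L1 penalty drives the coefficients of irrelevant features to exactly 0
print(np.round(lasso.coef_, 3))
print('selected features:', np.flatnonzero(lasso.coef_))
```

Larger `alpha` values zero out more coefficients; on real data the right value is usually chosen by cross-validation (for example with `LassoCV`).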


## **4. ElasticNet Regression**
To be used:
- When you want a combination of L1 and L2 regularization (both Ridge and Lasso benefits).
- Useful for dealing with datasets with multicollinearity and a large number of features.

In [None]:
# Importing
from sklearn.linear_model import ElasticNet

# Fitting
model = ElasticNet(alpha=1.0, l1_ratio=0.5)  # alpha is the regularization strength, l1_ratio controls the balance between L1 and L2 regularization
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)


5916733328.35446 0.5374588332533492
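
In scikit-learn, ElasticNet minimizes `1/(2*n_samples) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2`, so `l1_ratio=1` is pure Lasso and smaller values move towards Ridge. Rather than fixing `alpha` and `l1_ratio` by hand, both can be tuned by cross-validation with `ElasticNetCV`. The sketch below runs on synthetic data (an assumption) that includes one collinear pair.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
X[:, 1] = X[:, 0] + rng.normal(scale=0.05, size=300)  # a collinear pair
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=300)

# For each candidate l1_ratio, search a path of alpha values with 5-fold CV
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5)
model.fit(X, y)

print('best alpha:', model.alpha_)
print('best l1_ratio:', model.l1_ratio_)
print('non-zero coefficients:', np.count_nonzero(model.coef_))
```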


## **5. Decision Tree Regression**
To be used:
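- When the relationship between the features and the target variable is non-linear and can be captured by a hierarchy of if/else splits on the features.
- Note: by default the tree grows until every leaf is pure (or too small to split), so it tends to overfit; limit it with `max_depth` or `min_samples_leaf`.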

In [None]:
# Importing
from sklearn.tree import DecisionTreeRegressor

# Fitting
model = DecisionTreeRegressor()
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)


9280312694.58 0.27451070999702964
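
The low test R² above is consistent with an overfitted, fully grown tree. A quick way to check is to compare train and test scores with and without a depth limit. This sketch uses synthetic data (a noisy sine curve, an assumption) and an arbitrary cap of `max_depth=4`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy 1-D non-linear target
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for depth in [None, 4]:  # None = grow until leaves are pure (the default)
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    scores[depth] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print(f'max_depth={depth}: train R2={scores[depth][0]:.3f}, test R2={scores[depth][1]:.3f}')
```

A near-perfect train score paired with a noticeably lower test score is the usual sign of overfitting.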


## **6. Random Forest Regression**
To be used:
- When you need better predictive accuracy and reduced overfitting compared to a single Decision Tree.

In [None]:
# Importing
from sklearn.ensemble import RandomForestRegressor

# Fitting
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)


4602571636.797053 0.6401935431639223
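
A fitted forest also reports which features it relied on (`feature_importances_`) and, with `oob_score=True`, an out-of-bag R² estimate computed from the samples each tree did not see in its bootstrap sample. The sketch below uses synthetic data (an assumption) where only one column carries signal.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(400, 4)), columns=['signal', 'noise_1', 'noise_2', 'noise_3'])
y = 5.0 * X['signal'] + rng.normal(scale=0.5, size=400)

forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)

# Impurity-based importances, normalized to sum to 1
importances = pd.Series(forest.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
print('OOB R2:', forest.oob_score_)
```

Impurity-based importances can overstate features with many distinct values; `sklearn.inspection.permutation_importance` is a common cross-check.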


## **7. Gradient Boosting Regression**
To be used:
- When high predictive accuracy is required and the data contains complex relationships.

In [None]:
# Importing
from sklearn.ensemble import GradientBoostingRegressor

# Fitting
model = GradientBoostingRegressor()
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)


4735855249.383914 0.6297740846125799
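
The defaults used above (100 stages, `learning_rate=0.1`, `max_depth=3`) are only a starting point. A common pattern is a smaller learning rate with many stages plus early stopping on a held-out validation slice. This sketch runs on synthetic data (an assumption), and the specific hyperparameter values are illustrative, not tuned.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 2))
y = 3.0 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=1000,        # upper bound on boosting stages
    learning_rate=0.05,       # shrinks each tree's contribution
    max_depth=3,              # depth of each individual tree
    validation_fraction=0.1,  # share of X_tr held out to monitor the loss
    n_iter_no_change=10,      # stop once the validation loss stops improving
    random_state=0,
)
model.fit(X_tr, y_tr)

print('stages actually fitted:', model.n_estimators_)
print('test R2:', model.score(X_te, y_te))
```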


## **8. Support Vector Regression (SVR)**
To be used:
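- When you have non-linear data and need a flexible model that also copes with high-dimensional data.
- Note: SVR is sensitive to the scale of the features (and, with the default `C` and `epsilon`, of the target), so standardize them first; the unscaled run below, with its negative R², likely suffers from this.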


In [None]:
# Importing
from sklearn.svm import SVR

# Fitting
model = SVR(kernel='rbf')  # rbf is a popular kernel for non-linear data
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)

13427202411.670599 -0.04967277126969005
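
One way to fix the scaling problem is to standardize the features inside a `Pipeline` and the target with `TransformedTargetRegressor`. This sketch uses synthetic data (an assumption) that loosely mimics the housing columns: one informative feature in [0, 1], one large-range irrelevant feature, and a target in the hundreds of thousands.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 500), rng.uniform(0, 10_000, 500)])
y = 50_000 + 200_000 * np.sin(3 * X[:, 0]) + rng.normal(scale=1_000, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unscaled: the large feature dominates the RBF distance and the target dwarfs C
raw = SVR(kernel='rbf').fit(X_tr, y_tr)

# Scaled: standardize X in a pipeline and y via TransformedTargetRegressor;
# predictions are transformed back to the original target units
scaled = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel='rbf')),
    transformer=StandardScaler(),
)
scaled.fit(X_tr, y_tr)

raw_r2 = raw.score(X_te, y_te)
scaled_r2 = scaled.score(X_te, y_te)
print('unscaled test R2:', raw_r2)
print('scaled   test R2:', scaled_r2)
```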


## **9. K-Nearest Neighbors (KNN) Regression**
To be used:
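- When local patterns are important and you want a simple non-linear regression approach.
- Note: neighbours are found by distance on the raw features, so features with large ranges (such as `total_rooms`) dominate unless you standardize first.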

In [None]:
# Importing
from sklearn.neighbors import KNeighborsRegressor

# Fitting
model = KNeighborsRegressor(n_neighbors=5)  # n_neighbors is the number of neighbors to consider
model.fit(X_train, y_train)

# Testing
y_pred = model.predict(X_test)

# Evaluating
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(mse, r2)

9544759257.744932 0.25383757583999844
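
Because KNN relies on distances, it usually pays to scale the features and to choose `n_neighbors` by cross-validation instead of fixing it at 5. The sketch below combines both with `GridSearchCV` over a pipeline; the data is synthetic (an assumption) and the grid values are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 400), rng.uniform(0, 1_000, 400)])
y = np.sin(6 * X[:, 0]) + X[:, 1] / 1_000 + rng.normal(scale=0.1, size=400)

# Scale first so no feature dominates the Euclidean distance,
# then pick k and the neighbour weighting by 5-fold cross-validation
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())
grid = GridSearchCV(
    pipe,
    param_grid={
        'kneighborsregressor__n_neighbors': [3, 5, 10, 20],
        'kneighborsregressor__weights': ['uniform', 'distance'],
    },
    cv=5,
)
grid.fit(X, y)

print('best params:', grid.best_params_)
print('best CV R2:', grid.best_score_)
```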


# MORE CASE STUDIES
### Scenarios in which each regression model is most suitable:

1. **Linear Regression**:
   - Scenario: When the relationship between the input features and the target variable is approximately linear.
   - Use Case: Predicting housing prices based on features like area, number of rooms, etc., where the relationship is expected to be linear.

2. **Ridge Regression (L2 Regularization)**:
   - Scenario: When dealing with multicollinearity (high correlation between predictor variables) in the data, or to prevent overfitting when there are many features.
   - Use Case: Predicting stock prices using multiple technical indicators that might be highly correlated.

3. **Lasso Regression (L1 Regularization)**:
   - Scenario: When you have a high-dimensional dataset with many features, and you want to perform feature selection to identify the most relevant ones.
   - Use Case: Analyzing gene expression data to identify the most important genes in predicting a disease outcome.

4. **ElasticNet Regression**:
   - Scenario: When you want to balance the benefits of both L1 and L2 regularization and handle datasets with multicollinearity and a large number of features.
   - Use Case: Predicting customer churn in a telecommunications company where there are both highly correlated features and many potential predictor variables.

5. **Decision Tree Regression**:
   - Scenario: When the relationship between features and the target variable is non-linear and can be represented as a hierarchy of decisions.
   - Use Case: Predicting the price of a used car based on attributes such as age, mileage, and make, where the relationships might be non-linear.

6. **Random Forest Regression**:
   - Scenario: When you need better predictive accuracy and want to reduce overfitting compared to a single Decision Tree.
   - Use Case: Predicting the demand for a product based on multiple factors, where there could be complex interactions between the predictors.

7. **Gradient Boosting Regression**:
   - Scenario: When high predictive accuracy is crucial, and there are complex relationships in the data.
   - Use Case: Predicting the risk of credit default based on historical financial data and other relevant factors.

8. **Support Vector Regression (SVR)**:
   - Scenario: When dealing with non-linear data and you need a powerful model capable of handling high-dimensional data.
   - Use Case: Predicting the price of a house based on features like location, square footage, and amenities, where there might not be a linear relationship.

9. **K-Nearest Neighbors (KNN) Regression**:
   - Scenario: When local patterns are important, and you want a simple non-linear regression approach.
   - Use Case: Predicting daily temperatures based on historical weather data from neighboring locations.
