## Backward Elimination: Advanced Feature Selection Techniques

Backward elimination is a feature selection technique used in statistical modeling, particularly in linear regression, to identify the most significant predictors for a given model. The goal of backward elimination is to improve model performance by removing features that do not significantly contribute to predicting the target variable.

##### Step 1: Data Loading and Preprocessing

We start by loading the dataset and checking its structure.

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing

# Load the dataset
data = pd.read_csv('data.csv')
data.head(5)

   R&D Spend  Administration  Marketing Spend       State     Profit
0  165349.20       136897.80        471784.10    New York  192261.83
1  162597.70       151377.59        443898.53  California  191792.06
2  153441.51       101145.55        407934.54     Florida  191050.39
3  144372.41       118671.85        383199.62    New York  182901.99
4  123456.78       134567.89        378123.45  California  170123.45


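State is the only categorical variable, and it has more than two unique labels (New York, California, and Florida in the rows shown). We therefore use One-Hot Encoding for this feature; Label Encoding is better suited to binary categories, because integer codes would impose an arbitrary order on three or more unordered labels.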

#### Checking for Missing Values

Before proceeding, it's essential to ensure that the dataset doesn't have any missing values.

In [2]:
# Check for missing values
data.isnull().sum()

R&D Spend          0
Administration     0
Marketing Spend    0
State              0
Profit             0
dtype: int64

In [3]:
# Apply One-Hot Encoding for the 'State' column
data = pd.get_dummies(data, drop_first=True)  # Drop the first dummy to avoid multicollinearity
data.head(5)

   R&D Spend  Administration  Marketing Spend     Profit  State_Florida  State_New York
0  165349.20       136897.80        471784.10  192261.83          False            True
1  162597.70       151377.59        443898.53  191792.06          False           False
2  153441.51       101145.55        407934.54  191050.39           True           False
3  144372.41       118671.85        383199.62  182901.99          False            True
4  123456.78       134567.89        378123.45  170123.45          False           False
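The reason for `drop_first=True` can be checked numerically. The sketch below uses a small hypothetical array of state labels (not the contents of data.csv) and NumPy only. With an intercept column plus one dummy per category, the dummy columns sum to the intercept column, so the design matrix loses full column rank; dropping one dummy restores it.

```python
import numpy as np

# Hypothetical labels for illustration; not taken from data.csv
states = np.array(["New York", "California", "Florida",
                   "New York", "California", "Florida"])
categories = np.unique(states)  # sorted: California, Florida, New York

# One dummy column per category (no column dropped)
dummies_all = (states[:, None] == categories[None, :]).astype(float)
# Same encoding with the first category dropped, as drop_first=True does
dummies_dropped = dummies_all[:, 1:]

intercept = np.ones((len(states), 1))
X_all = np.hstack([intercept, dummies_all])          # 4 columns
X_dropped = np.hstack([intercept, dummies_dropped])  # 3 columns

rank_all = np.linalg.matrix_rank(X_all)
rank_dropped = np.linalg.matrix_rank(X_dropped)
print(X_all.shape[1], rank_all)          # 4 columns but rank 3: perfectly collinear
print(X_dropped.shape[1], rank_dropped)  # 3 columns and rank 3: full column rank
```

The dropped category becomes the baseline, and the remaining dummy coefficients are read as differences from it.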


##### Step 2: Splitting the Dataset

We split the dataset into independent variables X and the dependent variable Y (Profit), and then into training and test sets.



In [4]:
X = data.drop(['Profit'], axis=1)  # All columns except the target (State is already one-hot encoded)
Y = data['Profit']

# Splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)


##### Step 3: Fitting the Initial Linear Regression Model

We train the initial model using Linear Regression and evaluate the performance based on Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from math import sqrt

# Fit the linear regression model
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# Make predictions on the test set
Y_pred = regressor.predict(X_test)

# Calculate MSE and RMSE (RMSE is the square root of MSE, in the units of Profit)
mse = mean_squared_error(Y_test, Y_pred)
rmse = sqrt(mse)

round(mse), round(rmse, 2)


(34967811, 5913.36)
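As a reference for the two metrics, here is a minimal sketch that computes them by hand on made-up numbers (the values are hypothetical, not from the dataset). MSE is the mean of the squared residuals, and RMSE is its square root, which puts the error back in the units of the target.

```python
from math import sqrt

# Hypothetical true and predicted values, for illustration only
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

# Squared residuals: 4, 4, 9, 1
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

mse = sum(squared_errors) / len(squared_errors)  # (4 + 4 + 9 + 1) / 4 = 4.5
rmse = sqrt(mse)                                 # same units as the target

print(mse, round(rmse, 4))  # 4.5 2.1213
```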

##### Step 4: Backward Elimination Process

In Backward Elimination, we begin with all features and iteratively remove the least significant feature based on its p-value. The goal is to retain only features that contribute meaningfully to the prediction.

##### Step 4.1: Adding Constant to Features

The OLS (Ordinary Least Squares) class in statsmodels does not add an intercept automatically, so to implement backward elimination with it we first add a constant column to our feature set X.

In [6]:
import statsmodels.api as sm

# Add constant column to the features
X = sm.add_constant(X)


##### Step 4.2: Initial Model Fit

We fit the initial model and obtain the summary, which provides p-values for each feature. These p-values will guide our feature elimination.



In [7]:
# OLS needs numeric input, so cast the boolean dummy columns (and everything else) to float64
X = X.astype(float)
Y = Y.astype(float)

# Fit the OLS model
model = sm.OLS(Y, X).fit()

# Display the summary
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:                 Profit   R-squared:                       0.657
Model:                            OLS   Adj. R-squared:                  0.571
Method:                 Least Squares   F-statistic:                     7.647
Date:                Thu, 05 Dec 2024   Prob (F-statistic):           0.000369
Time:                        15:34:54   Log-Likelihood:                -253.49
No. Observations:                  26   AIC:                             519.0
Df Residuals:                      20   BIC:                             526.5
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const            8.889e+04   1.85e+04     

##### Step 4.3: Removing Features with High p-Values

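At each step we remove the feature with the highest p-value, provided it exceeds the chosen significance level (commonly 0.05), and then refit the model. The process repeats until every remaining feature has a p-value below the threshold. The cell below performs the first two removals by hand; in the resulting summary, State_Florida (p = 0.971) and State_New York (p = 0.844) are still above 0.05, so a complete run would continue.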

In [8]:
# Remove the feature with the highest p-value in the previous summary (e.g., 'Administration') and refit
X = X.drop(['Administration'], axis=1)
model = sm.OLS(Y, X).fit()

# Repeat with the next least significant feature (e.g., 'Marketing Spend')
X = X.drop(['Marketing Spend'], axis=1)
model = sm.OLS(Y, X).fit()
model.summary()


                            OLS Regression Results
Dep. Variable:                 Profit   R-squared:                       0.180
Model:                            OLS   Adj. R-squared:                  0.068
Method:                 Least Squares   F-statistic:                     1.612
Date:                Thu, 05 Dec 2024   Prob (F-statistic):              0.215
Time:                        15:34:54   Log-Likelihood:                -264.80
No. Observations:                  26   AIC:                             537.6
Df Residuals:                      22   BIC:                             542.6
Df Model:                           3
Covariance Type:            nonrobust
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const            1.568e+05   1.46e+04     10.725      0.000    1.26e+05    1.87e+05
R&D Spend           0.2335      0.110      2.132      0.044       0.006       0.461
State_Florida    -124.1711   3388.024     -0.037      0.971   -7150.503    6902.161
State_New York    657.7685   3314.027      0.198      0.844   -6215.102    7530.639
-----------------------------------------------------------------------------------
Omnibus:                        0.381   Durbin-Watson:                   0.907
Prob(Omnibus):                  0.826   Jarque-Bera (JB):                0.024
Skew:                          -0.072   Prob(JB):                        0.988
Kurtosis:                       3.037   Cond. No.                     1.43e+06


##### Step 5: Model Evaluation After Feature Selection

After eliminating non-significant features, we re-fit the model and evaluate it again using MSE and RMSE.

In [9]:
# Train the regression model again with reduced features
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# Make predictions on the test set
Y_pred = regressor.predict(X_test)

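# Calculate MSE and RMSE (RMSE is the square root of MSE, in the units of Profit)
mse = mean_squared_error(Y_test, Y_pred)
rmse = sqrt(mse)

round(mse), round(rmse, 2)


(39085661, 6251.85)

The test error is higher than for the full model in Step 3, so on this split removing the two features did not improve predictive performance.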

### Advanced Techniques for Feature Selection

While backward elimination is a powerful technique, there are more advanced methods for feature selection. Let's explore some of these techniques:

#### 1. Stepwise Regression

Stepwise regression generalizes backward elimination: variables are added or removed one at a time based on their p-values. Two one-directional variants are common, and a bidirectional procedure combines them by re-checking the predictors already in the model after each step:

- Forward Stepwise Selection: starts with no predictors and adds the most significant predictor at each step.
- Backward Stepwise Selection: starts with all predictors and removes the least significant predictor at each step.

#### 2. Using AIC/BIC for Feature Selection

Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are statistical measures that help determine the best model by penalizing complexity. Lower values of AIC/BIC indicate better models. These can be used to select features by comparing the models' AIC/BIC values as you remove features.

In [10]:
# AIC and BIC of the reduced model fitted in Step 4.3
print("AIC:", model.aic)
print("BIC:", model.bic)


AIC: 537.6054486199483
BIC: 542.6378347720342
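These values follow from AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k counts the fitted coefficients including the intercept. For the reduced model, k = 4, n = 26, and the log-likelihood is -264.8, so AIC = 8 + 529.6 = 537.6 and BIC = 4 ln(26) + 529.6, which is about 542.6.

One way to turn the criterion into a selection rule is to drop a feature only when doing so lowers the AIC. The sketch below is an illustration using NumPy only, with synthetic data and hypothetical helper names (`gaussian_aic`, `aic_backward`). It computes the AIC from the Gaussian log-likelihood of a least-squares fit.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC of a least-squares fit; X must already include an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    log_likelihood = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * log_likelihood


def aic_backward(X, y, names):
    """Hypothetical helper: drop one column at a time while that lowers the AIC."""
    keep = list(range(X.shape[1]))
    best = gaussian_aic(X[:, keep], y)
    while len(keep) > 1:
        # Try removing each non-intercept column (column 0 is the intercept)
        trials = [(gaussian_aic(X[:, [c for c in keep if c != j]], y), j)
                  for j in keep[1:]]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break  # no single removal improves the AIC
        best = trial_aic
        keep.remove(drop)
    return [names[c] for c in keep], best


# Synthetic data (an assumption for the demo): two informative and two noise columns
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), signal, noise])
y = 1.0 + 3.0 * signal[:, 0] - 2.0 * signal[:, 1] + rng.normal(size=n)

kept, aic = aic_backward(X, y, ["const", "x1", "x2", "noise1", "noise2"])
print(kept, round(aic, 1))
```

Swapping the `2 * k` penalty for `k * np.log(n)` would give the BIC version, which penalizes extra features more heavily once n exceeds about 7.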


#### 3. Recursive Feature Elimination (RFE)

RFE is a feature selection method in which a model is trained repeatedly and the least important feature is eliminated at each round, until the requested number of features (n_features_to_select) remains. It is available in scikit-learn.

In [11]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

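# Use RFE to keep 3 features. Note: X_train comes from Step 5 and still contains the
# 'const' column added in Step 4.1, so keeping 3 of its 4 columns only removes the constant
selector = RFE(estimator=LinearRegression(), n_features_to_select=3)
selector = selector.fit(X_train, Y_train)

# Get selected features
selected_features = X_train.columns[selector.support_]
selected_features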


Index(['R&D Spend', 'State_Florida', 'State_New York'], dtype='object')
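With a linear estimator, RFE ranks features by the magnitude of their coefficients, so features on very different scales (spend in dollars versus 0/1 dummies) are not directly comparable. A common precaution is to standardize the features first. The sketch below illustrates this on synthetic data (the column meanings, coefficients, and scales are assumptions for the demo, not the notebook's dataset) and also fits RFE on the raw features for comparison.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# Synthetic features on very different scales (assumption for the demo)
big = rng.normal(loc=100_000, scale=20_000, size=n)  # informative, dollar-like scale
flag = rng.integers(0, 2, size=n).astype(float)      # uninformative 0/1 dummy
small = rng.normal(size=n)                           # informative, unit scale
junk = rng.normal(size=n)                            # uninformative
X = np.column_stack([big, flag, small, junk])
y = 0.8 * big + 5_000 * small + rng.normal(scale=1_000, size=n)

# Standardize so that coefficient magnitudes are comparable across features
X_scaled = StandardScaler().fit_transform(X)
selector = RFE(estimator=LinearRegression(), n_features_to_select=2).fit(X_scaled, y)

# Same selection on the raw features, where the ranking can be driven by scale
raw_selector = RFE(estimator=LinearRegression(), n_features_to_select=2).fit(X, y)

print("scaled:", selector.support_, selector.ranking_)
print("raw:   ", raw_selector.support_, raw_selector.ranking_)
```

On the standardized data the two informative columns are kept. On the raw data the `big` column has a tiny per-dollar coefficient, so it is at risk of being eliminated despite explaining most of the variance.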