## Forward Elimination: Advanced Feature Selection Techniques

Forward Elimination (more commonly called *forward selection*) is a stepwise feature selection technique used in statistical modeling to identify the most significant predictors for a model. Backward elimination starts from the full model and removes predictors. Forward elimination works the other way: it starts with no predictors and adds the most significant one at a time, until no remaining variable improves the model significantly.

##### Step 1: Understanding Forward Elimination
###### The Process

1. **Start with an empty model:** no independent variables (features) are included initially.
2. **Iteratively add features:**
   - For each feature not yet in the model, fit a new model containing the already-selected features plus that candidate, and record the candidate's p-value.
   - Add the candidate with the lowest p-value, provided it is below the significance threshold (e.g., 0.05).
3. **Stop when no feature meets the inclusion criterion:** once adding any remaining feature no longer improves the model significantly, the process stops.

##### Step 2: Data Loading and Preprocessing

Let's load the dataset and preprocess it for analysis.

```python
# Import libraries
import numpy as np
import pandas as pd

# Load the dataset
data = pd.read_csv('data.csv')

# Display the first few rows of the dataset
data.head(4)
```


| | R&D Spend | Administration | Marketing Spend | State | Profit |
|---|---|---|---|---|---|
| 0 | 165349.2 | 136897.8 | 471784.1 | New York | 192261.83 |
| 1 | 162597.7 | 151377.59 | 443898.53 | California | 191792.06 |
| 2 | 153441.51 | 101145.55 | 407934.54 | Florida | 191050.39 |
| 3 | 144372.41 | 118671.85 | 383199.62 | New York | 182901.99 |


###### Handling Categorical Variables

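The dataset contains one categorical feature, `State`, which we encode with one-hot encoding. With `drop_first=True`, `get_dummies` creates a 0/1 column for every category except the first in sorted order (California), which becomes the baseline. Dropping one column avoids the dummy variable trap: otherwise the dummies would always sum to 1, duplicating the intercept column and making the design matrix perfectly collinear.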

```python
# Apply One-Hot Encoding for the 'State' column
data = pd.get_dummies(data, drop_first=True)

# Check the updated dataset
data.head(4)
```


| | R&D Spend | Administration | Marketing Spend | Profit | State_Florida | State_New York |
|---|---|---|---|---|---|---|
| 0 | 165349.2 | 136897.8 | 471784.1 | 192261.83 | False | True |
| 1 | 162597.7 | 151377.59 | 443898.53 | 191792.06 | False | False |
| 2 | 153441.51 | 101145.55 | 407934.54 | 191050.39 | True | False |
| 3 | 144372.41 | 118671.85 | 383199.62 | 182901.99 | False | True |
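A tiny sketch makes it visible which category `drop_first=True` removes. It uses a hypothetical three-row frame holding the dataset's three State values. It also passes `dtype=float` so the dummies come out as 0.0/1.0 instead of booleans, which is one alternative to casting the columns to float later.

```python
import pandas as pd

# Hypothetical three-row frame with the three State values from the dataset
states = pd.DataFrame({"State": ["New York", "California", "Florida"]})

# drop_first=True drops the first category in sorted order (California);
# California rows are then encoded as all zeros (the baseline)
dummies = pd.get_dummies(states, drop_first=True, dtype=float)
print(dummies)
```

Each remaining coefficient in a regression is then read as the difference from California, holding the other features fixed.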


##### Step 3: Splitting the Data

Separate the dataset into independent (X) and dependent (Y) variables.



```python
# Separate features (X) and target variable (Y)
X = data.drop(['Profit'], axis=1)
Y = data['Profit']
```


We also split the data into training and test sets for validation.

```python
# Import train_test_split
from sklearn.model_selection import train_test_split

# Split the dataset (80% training, 20% test)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
```


##### Step 4: Implementing Forward Elimination

We implement Forward Elimination using the statsmodels library to iteratively add features to the model based on their p-values.
##### Step 4.1: Define the Threshold for Inclusion

A commonly used significance threshold is 0.05.

```python
# Significance level for feature inclusion
SL = 0.05
```


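##### Step 4.2: Define the Forward Elimination Function

The function starts with an empty set of selected features. At each step it fits one model per remaining candidate and adds the candidate with the smallest p-value, as long as that p-value is below `SL`.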

```python
import statsmodels.api as sm

def forward_elimination(X, Y, SL):
    initial_features = X.columns.tolist()  # Candidates not yet in the model
    selected_features = []                 # Features selected so far

    for _ in range(len(initial_features)):
        p_values = []
        for feature in initial_features:
            # Fit a model with the selected features plus this candidate
            temp_features = selected_features + [feature]
            X_temp = sm.add_constant(X[temp_features])  # Add intercept for OLS
            model = sm.OLS(Y, X_temp).fit()
            p_values.append((feature, model.pvalues[feature]))

        # Pick the candidate with the smallest p-value
        feature, p_value = min(p_values, key=lambda x: x[1])

        if p_value < SL:
            selected_features.append(feature)
            initial_features.remove(feature)
        else:
            break  # Stop if no candidate meets the threshold

    return selected_features
```


##### Step 4.3: Apply Forward Elimination

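Use the function to select features. The dummy columns produced by `get_dummies` are boolean, so we first cast all columns to `float64`. pandas converts a DataFrame that mixes boolean and float columns to an `object` array, which statsmodels cannot fit.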

```python
# Ensure all data types are float64 (the dummy columns are boolean)
X = X.astype(float)
Y = Y.astype(float)

selected_features = forward_elimination(X, Y, SL)
selected_features
```


```
['Marketing Spend']
```

##### Step 4.4: Train the Final Model

After selecting the significant features, train the model again using only these features.



```python
# Use the selected features
X_selected = X[selected_features]

# Train the model on the training split only
X_train, X_test, Y_train, Y_test = train_test_split(X_selected, Y, test_size=0.2, random_state=0)
regressor = sm.OLS(Y_train, sm.add_constant(X_train)).fit()

# Evaluate the final model
regressor.summary()
```


| | | | |
|---|---|---|---|
| Dep. Variable: | Profit | R-squared: | 0.600 |
| Model: | OLS | Adj. R-squared: | 0.578 |
| Method: | Least Squares | F-statistic: | 27.01 |
| Date: | Thu, 05 Dec 2024 | Prob (F-statistic): | 6.07e-05 |
| Time: | 15:34:50 | Log-Likelihood: | -196.24 |
| No. Observations: | 20 | AIC: | 396.5 |
| Df Residuals: | 18 | BIC: | 398.5 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 9.672e+04 | 1.76e+04 | 5.490 | 0.000 | 5.97e+04 | 1.34e+05 |
| Marketing Spend | 0.2111 | 0.041 | 5.197 | 0.000 | 0.126 | 0.296 |

| | | | |
|---|---|---|---|
| Omnibus: | 2.559 | Durbin-Watson: | 1.724 |
| Prob(Omnibus): | 0.278 | Jarque-Bera (JB): | 1.389 |
| Skew: | -0.327 | Prob(JB): | 0.499 |
| Kurtosis: | 1.887 | Cond. No. | 7.34e+06 |
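The summary can be checked for internal consistency. The fit has n = 20 training observations and p = 1 predictor. Adjusted R-squared penalizes R-squared for the number of predictors. With a single predictor, the overall F-test and the coefficient t-test are the same test, so F = t²:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - (1 - 0.600)\cdot\frac{19}{18} \approx 0.578
$$

$$
F = t^2 = 5.197^2 \approx 27.01
$$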


##### Step 5: Advanced Tools for Forward Elimination
###### 1. Automated Feature Selection with AIC/BIC

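Instead of p-values, forward elimination can use an information criterion such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). Both trade goodness of fit (the log-likelihood) against model complexity (the number of estimated parameters). A candidate feature is added only if it lowers the criterion, and the process stops when no remaining feature does.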
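For reference, let k be the number of estimated regression coefficients (statsmodels counts only these), n the number of observations, and $\ln\hat{L}$ the maximized log-likelihood:

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

Plugging in the Step 4.4 summary reproduces the reported values. That model has k = 2 (the constant and Marketing Spend), n = 20, and $\ln\hat{L} = -196.24$:

$$
\mathrm{AIC} = 2(2) + 392.48 = 396.48 \approx 396.5, \qquad \mathrm{BIC} = 2\ln 20 + 392.48 \approx 5.99 + 392.48 = 398.47 \approx 398.5
$$

Since $\ln n > 2$ once $n \ge 8$, BIC charges more per added feature than AIC and therefore tends to stop earlier.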

```python
def forward_elimination_aic(X, Y):
    initial_features = X.columns.tolist()
    selected_features = []
    # Baseline: AIC of the intercept-only model, so the first feature
    # is added only if it actually improves on it
    current_aic = sm.OLS(Y, np.ones(len(Y))).fit().aic

    for _ in range(len(initial_features)):
        aic_values = []
        for feature in initial_features:
            # Add the candidate feature to the current model
            temp_features = selected_features + [feature]
            X_temp = sm.add_constant(X[temp_features])
            model = sm.OLS(Y, X_temp).fit()
            aic_values.append((feature, model.aic))

        # Select the candidate giving the lowest AIC
        feature, aic = min(aic_values, key=lambda x: x[1])

        if aic < current_aic:
            current_aic = aic
            selected_features.append(feature)
            initial_features.remove(feature)
        else:
            break  # Stop if no candidate lowers the AIC

    return selected_features
```


```python
# Apply AIC-based feature selection
selected_features_aic = forward_elimination_aic(X, Y)
selected_features_aic
```


```
['Marketing Spend', 'R&D Spend']
```

###### 2. Recursive Feature Elimination (RFE)

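RFE works in the opposite direction from forward elimination. It fits the model on all features, removes the feature with the smallest importance (for a linear model, the smallest absolute coefficient), and repeats until `n_features_to_select` features remain. It does not find an optimal subset on its own, because the number of features is fixed in advance; scikit-learn's `RFECV` can choose it by cross-validation instead. Raw coefficient sizes depend on each feature's units, so features should be put on a comparable scale first. In the run below they are not. The 0/1 State dummies naturally get much larger coefficients than the dollar-valued spend columns, which likely explains why RFE keeps them.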

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Initialize RFE
rfe_selector = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe_selector.fit(X, Y)

# Get selected features
rfe_selected_features = X.columns[rfe_selector.support_]
rfe_selected_features
```


```
Index(['Marketing Spend', 'State_Florida', 'State_New York'], dtype='object')
```
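The scale sensitivity of RFE is easy to reproduce. The sketch below uses synthetic data (an assumption): a dollar-valued `spend` column that dominates `y`, and a 0/1 `dummy` with a much smaller effect. It runs RFE down to one feature twice, once on raw columns and once after `StandardScaler`. The helper name `rfe_keep_one` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): "spend" is in dollars and drives y strongly;
# "dummy" is a 0/1 flag with a much smaller effect on y
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "spend": rng.normal(100_000, 20_000, n),
    "dummy": rng.integers(0, 2, n).astype(float),
})
y_demo = 0.8 * X_demo["spend"] + 1_000 * X_demo["dummy"] + rng.normal(0, 1_000, n)


def rfe_keep_one(X_values, y):
    """Run RFE down to a single feature and return the kept column name."""
    rfe = RFE(estimator=LinearRegression(), n_features_to_select=1).fit(X_values, y)
    return list(X_demo.columns[rfe.support_])


kept_raw = rfe_keep_one(X_demo, y_demo)
kept_scaled = rfe_keep_one(StandardScaler().fit_transform(X_demo), y_demo)
print(kept_raw, kept_scaled)  # raw units favor the dummy; standardized features favor spend
```

On raw columns, the dummy's coefficient (about 1000) dwarfs spend's (about 0.8 per dollar), so RFE discards the far more important feature. After standardization, each coefficient measures the effect of a one-standard-deviation change, and spend wins. For the notebook's data, the analogous step would be to standardize `X` before running RFE.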

###### 3. Lasso Regression for Automatic Feature Selection

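Lasso regression (L1 regularization) shrinks the coefficients of less important features and sets some exactly to zero, which effectively performs feature selection. `LassoCV` picks the regularization strength `alpha` by cross-validation (here 5-fold). Like RFE, the L1 penalty is sensitive to feature scale, so features are normally standardized before fitting.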

```python
from sklearn.linear_model import LassoCV

# Use Lasso for feature selection
lasso = LassoCV(cv=5, random_state=0).fit(X, Y)

# Identify selected features (non-zero coefficients)
lasso_selected_features = X.columns[(lasso.coef_ != 0)]
lasso_selected_features
```


```
Index(['R&D Spend', 'Marketing Spend'], dtype='object')
```
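A minimal sketch of the standardized variant puts `StandardScaler` and `LassoCV` in one scikit-learn pipeline. The scaler is then fit only on the data passed to `fit`, and the L1 penalty treats all features equally. The synthetic data, with columns on deliberately different scales, is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): x1 and x2 matter, x3 is noise,
# and the columns live on very different scales
rng = np.random.default_rng(0)
n = 200
X_demo = pd.DataFrame({
    "x1": rng.normal(0, 1, n),
    "x2": rng.normal(0, 1_000, n),
    "x3": rng.normal(0, 10, n),
})
y_demo = 3 * X_demo["x1"] + 0.002 * X_demo["x2"] + rng.normal(0, 1, n)

# Standardize inside the pipeline, then let LassoCV choose alpha
pipe = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
pipe.fit(X_demo, y_demo)

# Coefficients are on the standardized scale (effect per standard deviation)
lasso_coefs = pd.Series(pipe[-1].coef_, index=X_demo.columns)
print(lasso_coefs)
print(list(lasso_coefs.index[lasso_coefs != 0]))
```

The noise column may keep a small non-zero coefficient, because cross-validated Lasso tends to choose a fairly small `alpha`. Exact zeros are likely, not guaranteed.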

##### Step 6: Evaluation of the Final Model

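Compare the models produced by the different feature selection methods using metrics such as R-squared, adjusted R-squared, or mean squared error (MSE) on the held-out test set. The selection steps above were run on the full dataset, test rows included, so the resulting test scores are somewhat optimistic. For an unbiased estimate, run the selection on the training split only.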

```python
from sklearn.metrics import mean_squared_error, r2_score

# Evaluate a final model on one of the selected feature sets.
# Here: the p-value-based selection; substitute selected_features_aic,
# rfe_selected_features, or lasso_selected_features to compare methods.
X_final = X[selected_features]
X_train, X_test, Y_train, Y_test = train_test_split(X_final, Y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# Make predictions on the test set
Y_pred = regressor.predict(X_test)

# Calculate performance metrics
mse = mean_squared_error(Y_test, Y_pred)
r2 = r2_score(Y_test, Y_pred)

mse, r2
```


```
(24093801.235591393, 0.5599364532356763)
```
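To compare several feature sets side by side, a small helper can fit one linear model per set on the same train/test split. For each set it reports the training adjusted R-squared together with the test R-squared and MSE. The helper name `compare_feature_sets` and the synthetic demo data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_feature_sets(X, y, feature_sets, test_size=0.2, random_state=0):
    """Hypothetical helper: fit one linear model per feature set on the same
    train/test split; report training adjusted R² plus test R² and MSE."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    rows = []
    for name, feats in feature_sets.items():
        feats = list(feats)
        model = LinearRegression().fit(X_tr[feats], y_tr)
        # Adjusted R² on the training fit: penalizes extra predictors
        n, p = len(y_tr), len(feats)
        r2_tr = model.score(X_tr[feats], y_tr)
        adj_r2_tr = 1 - (1 - r2_tr) * (n - 1) / (n - p - 1)
        pred = model.predict(X_te[feats])
        rows.append({
            "method": name,
            "n_features": p,
            "train_adj_r2": adj_r2_tr,
            "test_r2": r2_score(y_te, pred),
            "test_mse": mean_squared_error(y_te, pred),
        })
    return pd.DataFrame(rows).set_index("method")


# Synthetic usage (an assumption): x1 and x2 drive y, x3 is noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
y_demo = 5 * X_demo["x1"] + 2 * X_demo["x2"] + rng.normal(size=200)

results = compare_feature_sets(X_demo, y_demo, {
    "x1 only": ["x1"],
    "x1 + x2": ["x1", "x2"],
    "all": ["x1", "x2", "x3"],
})
print(results)
```

With the notebook's variables, the call would be `compare_feature_sets(X, Y, {"p-value": selected_features, "AIC": selected_features_aic, "RFE": rfe_selected_features, "Lasso": lasso_selected_features})`. Adjusted R-squared is computed on the training fit because on a test set as small as this one, `n - p - 1` can reach zero.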