# Wassim - Lab 4
# Neural Network (Regression)

Objective:
In this lab, you will use a neural network and a gradient boosting model to predict student performance from the Kaggle dataset "Student Performance - Multiple Linear Regression". The goal is to practice applying neural network and gradient boosting regression models and comparing their performance.

# 0 Loading Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

# 1 Data Preparation

In [2]:
# 1.1 Load Dataset
df = pd.read_csv('Student_Performance.csv')

# 1.2 Handling Missing Values
# .copy() gives an independent frame, so the assignment below does not trigger SettingWithCopyWarning
clean_df = df.dropna().copy()

# 1.3 Encoding Categorical Variables
# Map the two labels to 0/1 in one step; any unexpected label would become NaN
clean_df['Extracurricular Activities'] = clean_df['Extracurricular Activities'].map({'No': 0, 'Yes': 1})

# 1.4 Data Splitting
# Cast to float rather than int so that any fractional values are not silently truncated
y = np.array(clean_df['Performance Index'], dtype=float)
X = np.array(clean_df.drop(columns='Performance Index'), dtype=float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
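
As a quick illustration of what the split above does, here is a minimal sketch on a made-up array (the names `X_demo` and `y_demo` are hypothetical and not part of the lab). It shows that `test_size=0.2` holds out 20% of the rows and that a fixed `random_state` yields the same split on every run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each (illustrative only)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)

# Repeating the call with the same seed reproduces the same test rows
X_te_again = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)[1]
print(np.array_equal(X_te, X_te_again))  # True
```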

# 2 Modeling

In [3]:
# 2.1 Neural Network Model
# hidden_layer_sizes defines the number of hidden layers and the number of neurons in each of them
# max_iter is the maximum number of training iterations (epochs for the default 'adam' solver)
# random_state is the seed that makes the outcome reproducible from run to run
mlp_reg = MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=1000, random_state=0)
mlp_reg.fit(X_train, y_train)

y_pred_neural = mlp_reg.predict(X_test)

# 2.2 Gradient Boosting Model
# n_estimators is the number of boosting stages, here the number of trees
# learning_rate shrinks the contribution of each tree when it corrects the errors of the previous ones
gb_reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
gb_reg.fit(X_train, y_train)

y_pred_gradient = gb_reg.predict(X_test)
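
The models above are trained on unscaled features. Neural networks are generally sensitive to feature scale, so one possible refinement is to standardize the inputs before the MLP. The sketch below shows this with a scikit-learn pipeline on synthetic data from `make_regression` (an assumption made so the example is self-contained; it is not the lab's dataset, and the names `X_syn`, `y_syn`, and `scaled_mlp` are hypothetical).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the Student Performance dataset)
X_syn, y_syn = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

# The scaler is fitted on the training data only, then applied to the test data at predict time
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=1000, random_state=0),
)
scaled_mlp.fit(X_tr, y_tr)

y_pred_scaled = scaled_mlp.predict(X_te)
print(y_pred_scaled.shape)
```

Gradient boosting is tree-based and does not need this step, which is why only the MLP is wrapped here.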

# 3 Evaluation Metrics

In [4]:
# 3.1 Metrics on Test Data
MSE_N = mean_squared_error(y_test, y_pred_neural)
MSE_G = mean_squared_error(y_test, y_pred_gradient)
MAE_N = mean_absolute_error(y_test, y_pred_neural)
MAE_G = mean_absolute_error(y_test, y_pred_gradient)

if MSE_N < MSE_G:
    print(f"MSE : Neural ({MSE_N}) < Gradient ({MSE_G})")
else:
    print(f"MSE : Gradient ({MSE_G}) < Neural ({MSE_N})")

if MAE_N < MAE_G:
    print(f"MAE : Neural ({MAE_N}) < Gradient ({MAE_G})")
else:
    print(f"MAE : Gradient ({MAE_G}) < Neural ({MAE_N})")

# 3.2 Comparison and Analysis
# On this test split, the Neural Network model has a lower MSE and a lower MAE than the Gradient Boosting model.
# The gap is small (MSE 4.31 vs 4.53, MAE 1.65 vs 1.71), so further checks, such as cross-validation or
# hyperparameter tuning, would be needed before concluding that it is the better model.

MSE : Neural (4.309024000856876) < Gradient (4.528888593475044)
MAE : Neural (1.6490915797344512) < Gradient (1.706846016050602)
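
Because the two test-set scores are close, a single train/test split may not be enough to pick a model. One way to get a more stable comparison is k-fold cross-validation. The sketch below uses `cross_val_score` with 5 folds on synthetic data from `make_regression` (an assumption made so the example runs on its own; the names `X_syn`, `y_syn`, `models`, and `cv_mse` are hypothetical). With the real data, `X` and `y` from the data preparation step would take their place.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption: not the Student Performance dataset)
X_syn, y_syn = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

# Same hyperparameters as in the modeling section
models = {
    "Neural": MLPRegressor(hidden_layer_sizes=(50, 30), max_iter=1000, random_state=0),
    "Gradient": GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0),
}

cv_mse = {}
for name, model in models.items():
    # scikit-learn returns the negated MSE so that higher is always better; flip the sign back
    scores = cross_val_score(model, X_syn, y_syn, cv=5, scoring="neg_mean_squared_error")
    cv_mse[name] = -scores
    print(f"{name}: mean MSE = {cv_mse[name].mean():.2f} (std {cv_mse[name].std():.2f})")
```

If the difference between the mean scores is smaller than their spread across folds, the two models are hard to tell apart on that data.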
