# Gaming Mental Health Prediction with Neural Networks
## Predicting GAD_T, SWL_T, and SPIN_T Scores from Gaming Behavior

This notebook demonstrates how to build a neural network to predict mental health scores
(GAD_T = Anxiety, SWL_T = Life Satisfaction, SPIN_T = Social Phobia) based on gaming habits.

**Dataset:** Online Gaming Anxiety Data from Kaggle
**Target Variables:** GAD_T, SWL_T, SPIN_T


First, we import the libraries used throughout the notebook.

In [9]:
# Import necessary libraries
import sys
import os

# Add parent directory to path to import our functions
# This line ensures Python can find our custom 'functions.py' file.
sys.path.append('..')

# Core data manipulation and visualisation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning and preprocessing
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder, MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.neural_network import MLPRegressor
from sklearn.multioutput import MultiOutputRegressor

# System and warnings
import warnings
warnings.filterwarnings('ignore')

Next, we import the helper functions used in this notebook from our `functions.py` file.

In [14]:
from functions import clean_data, encode_features

Now we load the data into a pandas DataFrame so that it is easier to manipulate.

In [11]:
# Load the dataset from a CSV file into a pandas DataFrame.
# The 'encoding' parameter is specified to handle potential character encoding issues in the file.
try:
    df = pd.read_csv('gaming_anxiety_data.csv', encoding='ISO-8859-1')
except FileNotFoundError:
    print("File not found! Please upload your dataset first.")
    # Re-raise so that later cells do not run with 'df' undefined.
    raise

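To illustrate why the `encoding` argument matters, the minimal sketch below writes a tiny ISO-8859-1 file to a temporary directory and reads it back. The file name `sample.csv` and its contents are made up for the example and are not part of the gaming dataset.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-column CSV containing a non-ASCII character.
raw_bytes = "name\nJosé\n".encode("ISO-8859-1")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "wb") as handle:
        handle.write(raw_bytes)

    # Reading with the default UTF-8 codec fails on the byte 0xE9 ("é").
    try:
        pd.read_csv(path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Supplying the encoding the file was written in decodes it correctly.
    sample = pd.read_csv(path, encoding="ISO-8859-1")

print(utf8_failed)
print(sample["name"].tolist())
```

If a file's encoding is unknown, trying UTF-8 first and falling back to ISO-8859-1 is a common approach, although ISO-8859-1 will decode any byte sequence and can therefore hide a wrong guess.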

In [12]:
# Define the columns that will be used as input features for our Neural Network.
# These are the variables that the network will use to make predictions.
feature_columns = ['GADE', 'Game', 'Hours', 'earnings', 'whyplay',
                   'streams', 'Narcissism', 'Gender',
                   'Age', 'Work', 'Playstyle']

# Define the target columns. These are the psychological metrics we want to predict.
# Our network will try to learn the relationship between 'feature_columns' and 'target_columns'.
target_columns = ['GAD_T', 'SWL_T', 'SPIN_T']

# Create a new DataFrame 'df' containing only the selected feature and target columns.
# .copy() is used to ensure we are working with a separate copy of the data, preventing unintended modifications to the original DataFrame.
df = df[feature_columns + target_columns].copy()
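Selecting columns by name fails if the CSV header differs from what the notebook expects. As a rough sketch, the check below lists any missing names before subsetting. The helper `select_columns` and the toy DataFrame are hypothetical and only stand in for the real dataset.

```python
import pandas as pd

def select_columns(df, features, targets):
    """Return a copy of df restricted to features + targets, or raise if any are absent."""
    wanted = list(features) + list(targets)
    missing = [col for col in wanted if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in DataFrame: {missing}")
    return df[wanted].copy()

# Toy data standing in for the real dataset.
toy = pd.DataFrame({"Hours": [10, 20], "Age": [21, 30], "GAD_T": [3, 7], "extra": [0, 1]})
subset = select_columns(toy, ["Hours", "Age"], ["GAD_T"])
print(subset.columns.tolist())
```

The returned copy keeps the features first and the targets last, which matches the order used in `feature_columns + target_columns`.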

In [15]:
# Call the 'clean_data' function to handle initial data cleaning steps, such as removing duplicates
# and displaying missing values. The result is stored in 'df_cleaned_initial'.
print("Cleaning dataset...")
df_cleaned_initial = clean_data(df)

Cleaning dataset...
Missing values per column:
GADE          649
Game            0
Hours          30
earnings        0
whyplay         0
streams       100
Narcissism     23
Gender          0
Age             0
Work           38
Playstyle       0
GAD_T           0
SWL_T           0
SPIN_T        650
dtype: int64
Removed 0 rows with missing values
Removed 51 duplicate rows
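The source of `clean_data` lives in `functions.py` and is not shown in this notebook. The sketch below is only an assumption about two of the steps visible in the output above (reporting missing values and dropping duplicate rows); it does not attempt to reproduce the "rows with missing values" step, whose logic is unknown. The name `clean_data_sketch` and the toy data are hypothetical.

```python
import pandas as pd

def clean_data_sketch(df):
    """Hypothetical stand-in for clean_data: report missing values, drop duplicate rows."""
    print("Missing values per column:")
    print(df.isnull().sum())

    deduplicated = df.drop_duplicates()
    print(f"Removed {len(df) - len(deduplicated)} duplicate rows")
    return deduplicated.reset_index(drop=True)

# Toy data: the first two rows are exact duplicates, the third has a missing value.
toy = pd.DataFrame({
    "Hours": [10.0, 10.0, None, 5.0],
    "GAD_T": [3, 3, 7, 1],
})
cleaned = clean_data_sketch(toy)
```

Note that dropping duplicates leaves missing values in place, which is consistent with the notebook removing the remaining NaN rows in a later step.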


In [16]:
# Encode the string values as numbers using 'encode_features',
# because neural networks require numerical inputs.
df_encoded = encode_features(df_cleaned_initial)

# Then remove any remaining missing values by dropping rows that contain NaN (Not a Number).
# Together with the encoding step, this leaves 'df_clean' entirely numerical and free of missing data.
df_clean = df_encoded.dropna()

# Print a summary of missing values to confirm the cleaning process was successful.
print(f"\n Original Missing values: {df.isnull().sum().sum()}")
print(f" Missing values after cleaning and encoding: {df_clean.isnull().sum().sum()}")
if df_clean.isnull().sum().sum() > 0:
    print("Columns with missing values in df_clean:")
    print(df_clean.isnull().sum()[df_clean.isnull().sum() > 0])


 Original Missing values: 1490
 Missing values after cleaning and encoding: 0
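The implementation of `encode_features` is also in `functions.py` and is not shown here. As a minimal sketch of the encode-then-drop pattern, and assuming a simple integer encoding of text columns, the example below uses a hypothetical `encode_features_sketch` on toy data. The real function may use a different scheme, such as `LabelEncoder` or one-hot encoding.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def encode_features_sketch(df):
    """Hypothetical stand-in for encode_features: map each text column to integer codes."""
    encoded = df.copy()
    for col in encoded.columns:
        if is_numeric_dtype(encoded[col]):
            continue
        # Category codes are assigned in sorted order; missing values get -1.
        codes = encoded[col].astype("category").cat.codes.astype("float64")
        # Turn the -1 placeholder back into NaN so dropna() can still see it.
        encoded[col] = codes.where(codes >= 0)
    return encoded

# Toy data with one missing text value and one missing numeric value.
toy = pd.DataFrame({
    "Game": ["CSGO", "LoL", None, "LoL"],
    "Hours": [10.0, 5.0, 3.0, np.nan],
})
toy_clean = encode_features_sketch(toy).dropna()
print(toy_clean)
```

Keeping missing values as NaN during encoding matters here: if they were silently given their own code, `dropna()` would no longer remove those rows.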
