

Question 1: Traditional Machine Learning Approach
Predicting GAD_T, SWL_T, and SPIN_T from Online Gaming Anxiety Data

 **Dataset:** Online Gaming Anxiety Data from Kaggle


Introduction
 This notebook explores traditional (non-neural network) machine learning approaches for predicting three psychological metrics from online gaming data:
 1. **GAD_T**: Generalized Anxiety Disorder score
 2. **SWL_T**: Satisfaction With Life score  
 3. **SPIN_T**: Social Phobia Inventory score

We'll compare multiple traditional ML algorithms to establish a performance baseline before moving to neural networks in Q2.

Objectives
 - Load and explore the gaming anxiety dataset
 - Preprocess data for machine learning
 - Implement and compare traditional ML models
 - Evaluate model performance using appropriate metrics
 - Interpret results and draw conclusions
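The comparison step in these objectives can be sketched with scikit-learn's `cross_val_score`. This is a minimal illustration under stated assumptions, not the notebook's own method. It uses synthetic data from `make_regression` in place of the gaming dataset and a single target. The hyperparameters (`alpha`, `C`, `n_estimators`) are arbitrary choices. Each model is wrapped in a pipeline with `StandardScaler`, so scaling is refit inside every cross-validation fold rather than on the full dataset.

```python
# Sketch: compare traditional regressors with 5-fold cross-validated R^2.
# ASSUMPTION: synthetic data stands in for the real features and one target.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=12, noise=10.0, random_state=42)

# Hyperparameters below are illustrative defaults, not tuned values
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=42),
    "GradientBoosting": GradientBoostingRegressor(random_state=42),
    "SVR": SVR(C=10.0),
}

results = {}
for name, model in models.items():
    # Scaling lives inside the pipeline, so each fold is scaled on its own training split
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()

# Print models from best to worst mean R^2
for name, r2 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>16}: mean R^2 = {r2:.3f}")
```

R² is used here for brevity. The same loop works with `scoring="neg_mean_absolute_error"` or `"neg_root_mean_squared_error"` to report MAE or RMSE instead.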


In [1]:
# Import necessary libraries
import sys
import os

# Add parent directory to path to import our functions
sys.path.append('..')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import warnings
warnings.filterwarnings('ignore')

# Set style for better visualisations
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")


Next, we import the helper functions `clean_data` and `encode_features` from our `functions.py` file. Each function is documented in that file, so look there to see exactly what it does.

In [2]:
from functions import clean_data, encode_features

Next, we load the dataset. The CSV is read with ISO-8859-1 (Latin-1) encoding rather than pandas' default of UTF-8.

In [3]:
df = pd.read_csv('gaming_anxiety_data.csv', encoding='ISO-8859-1')
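The notebook does not say why `encoding='ISO-8859-1'` is passed. A common reason is that a file contains bytes that are not valid UTF-8. ISO-8859-1 maps every byte value to a character, so decoding never fails. The sketch below uses a tiny made-up CSV, not the Kaggle file, to show the difference.

```python
import io
import pandas as pd

# Made-up CSV bytes: "é" stored as the single Latin-1 byte 0xE9,
# which is not a valid UTF-8 sequence in this position
raw = "Game,Gender\nPokémon,Male\n".encode("ISO-8859-1")

try:
    pd.read_csv(io.BytesIO(raw))  # pandas decodes as UTF-8 by default
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 assigns a character to every byte 0x00-0xFF, so this always decodes
sample = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print("UTF-8 read succeeded:", utf8_ok)
print(sample)
```

Latin-1 always decodes, but it can decode *wrongly*. If a file is actually UTF-8, a character such as "é" comes out as two characters ("Ã©"). It is therefore worth spot-checking text columns after loading.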

Next, we clean the data so that no missing values remain. `clean_data` also prints the number of missing values in each column (the printed output below is truncated).

In [4]:
# Clean the data
print("Cleaning dataset...")
df_clean = clean_data(df)

Cleaning dataset...
Missing values per column:
S. No.                 0
Timestamp              0
GAD1                   0
GAD2                   0
GAD3                   0
GAD4                   0
GAD5                   0
GAD6                   0
GAD7                   0
GADE                 649
SWL1                   0
SWL2                   0
SWL3                   0
SWL4                   0
SWL5                   0
Game                   0
Platform               0
Hours                 30
earnings               0
whyplay                0
League              1852
highestleague      13464
streams              100
SPIN1                124
SPIN2                154
SPIN3                140
SPIN4                159
SPIN5                166
SPIN6                156
SPIN7                138
SPIN8                144
SPIN9                158
SPIN10               160
SPIN11               187
SPIN12               168
SPIN13               187
SPIN14               156
SPIN15               147
SPI
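Because `functions.py` is not shown here, the following is a hypothetical sketch of one way a `clean_data`-style function could print output like the above and remove missing values. It prints `isnull().sum()`, drops columns with no observed values, imputes numeric gaps with the median, and imputes text gaps with the most frequent value. The name `clean_data_sketch`, the imputation strategy and the toy data are all assumptions, not the notebook's implementation.

```python
import numpy as np
import pandas as pd

def clean_data_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """HYPOTHETICAL stand-in for functions.clean_data (not the real implementation)."""
    print("Missing values per column:")
    print(df.isnull().sum())

    # Columns with no observed values at all cannot be imputed, so drop them
    out = df.dropna(axis=1, how="all").copy()

    for col in out.columns:
        if not out[col].isnull().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric gaps: fill with the median (less sensitive to outliers than the mean)
            out[col] = out[col].fillna(out[col].median())
        else:
            # Text/categorical gaps: fill with the most frequent value
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "Hours": [10.0, np.nan, 30.0],
    "League": ["Gold", None, "Gold"],
    "empty_col": [np.nan, np.nan, np.nan],
    "GAD_T": [3, 5, 7],
})
cleaned = clean_data_sketch(toy)
print(cleaned)
```

Imputing on the whole dataset before the train/test split lets test rows influence the fill values. Fitting the imputation on the training split only, for example with `sklearn.impute.SimpleImputer` inside a pipeline, avoids this leakage.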

Next, we select the feature columns the models will learn from and the target columns we want to predict.

In [5]:
feature_columns = ['GADE', 'Game', 'Hours', 'earnings', 'whyplay',
                   'League', 'streams', 'Narcissism', 'Gender',
                   'Age', 'Work', 'Playstyle']
target_columns = ['GAD_T', 'SWL_T', 'SPIN_T']
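With the columns chosen, a typical next step is to separate features from targets and hold out a test set. Below is a minimal sketch. It assumes a synthetic all-numeric DataFrame with the same column names stands in for the encoded data, and it uses an 80/20 split with `random_state=42`. Neither choice comes from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

feature_columns = ['GADE', 'Game', 'Hours', 'earnings', 'whyplay',
                   'League', 'streams', 'Narcissism', 'Gender',
                   'Age', 'Work', 'Playstyle']
target_columns = ['GAD_T', 'SWL_T', 'SPIN_T']

# ASSUMPTION: 100 rows of random numbers stand in for the encoded dataset
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(100, len(feature_columns) + len(target_columns))),
    columns=feature_columns + target_columns,
)

X = data[feature_columns]   # model inputs
y = data[target_columns]    # three targets, one column each

# Hold out 20% of rows for final evaluation; fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

`LinearRegression`, `Ridge`, `Lasso` and `RandomForestRegressor` accept a 2-D `y` like this directly. `SVR` and `GradientBoostingRegressor` predict a single target, so they would need one model per column, for example via `sklearn.multioutput.MultiOutputRegressor`.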

Next, we encode any non-numerical data in the selected columns (for example, text categories such as `Game` or `Gender`) as corresponding numbers, because these machine learning models can only learn from numeric input.

In [7]:
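# Select from the cleaned DataFrame (df_clean), not the raw df, so the cleaning step is kept
data = df_clean[feature_columns + target_columns].copy()
# Convert non-numeric columns to numbers; the result replaces df for later cells
df = encode_features(data)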
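As an illustration of what this encoding step involves, here is a hypothetical sketch using scikit-learn's `LabelEncoder` (already imported in the first cell). The function name `encode_features_sketch` and the toy rows are assumptions. It is not the code in `functions.py`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_features_sketch(data: pd.DataFrame) -> pd.DataFrame:
    """HYPOTHETICAL stand-in for functions.encode_features (not the real code)."""
    out = data.copy()
    # Treat every non-numeric column as categorical text
    text_cols = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    for col in text_cols:
        # LabelEncoder sorts the distinct values and maps them to 0..k-1;
        # astype(str) would turn any remaining NaN into the category "nan"
        out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    return out

# Made-up toy rows for illustration only
toy = pd.DataFrame({
    "Game": ["Skyrim", "League of Legends", "Skyrim"],
    "Gender": ["Male", "Female", "Male"],
    "Hours": [20, 35, 10],
    "GAD_T": [4, 9, 1],
})
encoded = encode_features_sketch(toy)
print(encoded)
```

Label encoding imposes an arbitrary order (here "League of Legends" becomes 0 and "Skyrim" becomes 1). Linear models and SVR treat that order as a meaningful magnitude, whereas tree ensembles are less affected. The scikit-learn documentation describes `LabelEncoder` as intended for target labels. For input features, `OrdinalEncoder` or one-hot encoding (`OneHotEncoder` or `pd.get_dummies`) is the usual choice.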