# Final Test-Set Evaluation

This notebook evaluates our selected models on the held-out test set.
The SVM model (for Anxiety Score) and the ensemble model (a random forest, for Depression Score) were chosen based on cross-validated performance during optimization.
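
As a rough illustration of what selection by cross-validated performance can look like, the sketch below compares a random forest and an SVR by 5-fold cross-validated RMSE. It runs on synthetic data from `make_regression`, and the names `candidates`, `cv_rmse`, and `best_name` are hypothetical. It is not the optimization procedure actually used for this project, only a minimal example of the idea.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): the project's train.csv is not used here.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical candidate models, loosely mirroring the two model families in this notebook.
candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=0.1, gamma="scale")),
}

cv_rmse = {}
for name, model in candidates.items():
    # scikit-learn returns negated RMSE so that larger is better; flip the sign back.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()

# The candidate with the lowest mean cross-validated RMSE would be carried forward.
best_name = min(cv_rmse, key=cv_rmse.get)
print(cv_rmse)
print("Selected:", best_name)
```

Because the comparison uses only training folds, the test set stays untouched until the single final evaluation.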


Load train and test data

In [2]:
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score

train = pd.read_csv("../Data/train.csv")
test = pd.read_csv("../Data/test.csv")

exclude_cols = ["Medication_Use", "Substance_Use"]
targets = ["Depression_Score", "Anxiety_Score"]

Features for Depression

In [3]:
X_train_dep = train.drop(columns=targets + exclude_cols)
y_train_dep = train["Depression_Score"]

X_test_dep = test.drop(columns=targets + exclude_cols)
y_test_dep = test["Depression_Score"]

Features for Anxiety

In [4]:
X_train_anx = train.drop(columns=targets + exclude_cols)
y_train_anx = train["Anxiety_Score"]

X_test_anx = test.drop(columns=targets + exclude_cols)
y_test_anx = test["Anxiety_Score"]

In [6]:
# scikit-learn imports for preprocessing and the two final models
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

numeric_features = X_train_dep.select_dtypes(include=["int64", "float64"]).columns
categorical_features = X_train_dep.select_dtypes(include=["object"]).columns

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_features),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ]
)
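
To make the behaviour of this preprocessing step concrete, here is a minimal sketch on a toy frame. The column names `age` and `group` and the frames `toy_train` and `toy_test` are hypothetical and do not come from the project data. It shows that numeric columns are standardized with statistics learned from the training rows, and that `handle_unknown="ignore"` encodes a category unseen during fitting as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; "C" appears only in the test frame.
toy_train = pd.DataFrame({"age": [20.0, 30.0, 40.0], "group": ["A", "B", "A"]})
toy_test = pd.DataFrame({"age": [30.0, 50.0], "group": ["B", "C"]})

# Same transformer layout as above, with the columns listed explicitly.
toy_preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["group"]),
    ]
)

# Fit on the training rows only, then apply the learned transform to the test rows.
toy_preprocess.fit(toy_train)
encoded = toy_preprocess.transform(toy_test)

# The result may be sparse or dense depending on its density; make it dense for display.
if hasattr(encoded, "toarray"):
    encoded = encoded.toarray()

# Columns: scaled age, group == "A", group == "B".
print(encoded)
```

The first test row has the training-mean age, so its scaled value is zero. The second row's unseen category `"C"` leaves both one-hot columns at zero.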

Final Models with Best Parameters

In [8]:
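from sklearn.base import clone

# "Ensemble" model for Depression_Score: a random forest with 200 trees.
# clone() gives each pipeline its own unfitted copy of the preprocessor,
# so fitting one pipeline cannot overwrite the transformer used by the other.
best_ensemble = Pipeline(
    steps=[
        ("preprocess", clone(preprocess)),
        ("model", RandomForestRegressor(
            n_estimators=200,
            random_state=42))
    ]
)

# SVM model for Anxiety_Score: RBF-kernel support vector regression.
best_svm = Pipeline(
    steps=[
        ("preprocess", clone(preprocess)),
        ("model", SVR(
            kernel="rbf",
            C=0.1,
            gamma="scale"))
    ]
)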


Fit each model on the training set

In [10]:

best_ensemble.fit(X_train_dep, y_train_dep)
best_svm.fit(X_train_anx, y_train_anx)


Predict on Test Set

In [11]:
# depression
y_pred_dep = best_ensemble.predict(X_test_dep)

# anxiety
y_pred_anx = best_svm.predict(X_test_anx)


Final Metrics

In [12]:
# depression model metrics
rmse_dep = (mean_squared_error(y_test_dep, y_pred_dep) ** 0.5)
r2_dep = r2_score(y_test_dep, y_pred_dep)

# anxiety model metrics
rmse_anx = (mean_squared_error(y_test_anx, y_pred_anx) ** 0.5)
r2_anx = r2_score(y_test_anx, y_pred_anx)

print("Final Test Performance")
print("----------------------")
print("Depression - Ensemble")
print("RMSE:", rmse_dep)
print("R2:", r2_dep)

print("\nAnxiety - SVM")
print("RMSE:", rmse_anx)
print("R2:", r2_anx)


Final Test Performance
----------------------
Depression - Ensemble
RMSE: 5.410779489823994
R2: -0.022584900465955515

Anxiety - SVM
RMSE: 5.835068476200827
R2: -0.002403412188760834


## Interpretation

The ensemble model for depression and the SVM model for anxiety were each evaluated once on the held-out test set. The test RMSE was about 5.41 points for depression and 5.84 points for anxiety, and both R-squared values were slightly negative (-0.023 and -0.002). A negative R-squared means the model's squared error on the test set is larger than that of a constant baseline that always predicts the mean test score, so neither model improves on simply predicting the average. These results suggest that the demographic and lifestyle features used here carry little predictive signal for depression and anxiety scores.
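
The link between a negative R-squared and the mean baseline can be checked by hand. The sketch below uses plain Python and made-up toy scores, not the study data. The helper functions `rmse` and `r2` are hypothetical illustrations of the standard definitions, with R-squared computed as one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy scores (assumption, for illustration only); their mean is 11.0.
y_true = [2.0, 8.0, 14.0, 20.0]

# Baseline: always predict the mean of the true values.
mean_pred = [11.0] * len(y_true)

# A constant prediction that is slightly off the mean.
off_pred = [12.0] * len(y_true)

r2_mean = r2(y_true, mean_pred)
r2_off = r2(y_true, off_pred)

print("Mean baseline: RMSE", rmse(y_true, mean_pred), "R2", r2_mean)
print("Off-mean guess: RMSE", rmse(y_true, off_pred), "R2", r2_off)
```

Predicting the mean gives an R-squared of exactly zero, and any prediction with a larger squared error gives a negative value. Because the reference mean is taken over the scores being evaluated, even a model that predicts the training-set mean can score slightly below zero on a test set whose mean differs.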

In [13]:
results_table = pd.DataFrame({
    "Model": ["Ensemble", "SVM"],
    "Target": ["Depression_Score", "Anxiety_Score"],
    "RMSE": [rmse_dep, rmse_anx],
    "R2": [r2_dep, r2_anx]
})

results_table

|   | Model | Target | RMSE | R2 |
|---|-------|--------|------|----|
| 0 | Ensemble | Depression_Score | 5.410779 | -0.022585 |
| 1 | SVM | Anxiety_Score | 5.835068 | -0.002403 |
