<center><h1>Modelling and Evaluation</h1></center>

In [42]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, RFECV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

<h2>Importing Datasets</h2>

<h4>Dataset with all features</h4>

In [23]:
X_train = pd.read_csv("../Processed Data/All Features/X_train.csv")
X_test = pd.read_csv("../Processed Data/All Features/X_test.csv")
Y_train = pd.read_csv("../Processed Data/All Features/Y_train.csv")
Y_test = pd.read_csv("../Processed Data/All Features/Y_test.csv")

The models need a single target column, so we take the activity_code column and use it as y_train and y_test. Subtracting 1 makes the class labels zero-based, which XGBoost's classifier expects.

In [26]:
y_train = Y_train["activity_code"] - 1
y_test = Y_test["activity_code"] - 1
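The shift by one can be illustrated on a toy label vector. This is a minimal sketch, assuming the activity codes start at 1 and run to 6 (the range is not stated in this notebook); the `codes` array is made-up data, not the real labels.

```python
import numpy as np

# Hypothetical activity codes, assumed to start at 1 as in the Y_train / Y_test files
codes = np.array([1, 3, 2, 6, 5, 4, 1])

# Shift to zero-based class labels (0 .. n_classes - 1)
labels = codes - 1

# Model predictions come back zero-based; add 1 to recover the original codes
recovered = labels + 1

print(labels.min(), labels.max())
```

The same `+ 1` shift would be needed on any predictions before mapping them back to activity names.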

<h4>Dataset with features that are mean values</h4>

In [7]:
X_train_mean = pd.read_csv("../Processed Data/Mean Features/feature_reduced_X_train.csv")
X_test_mean = pd.read_csv("../Processed Data/Mean Features/feature_reduced_X_test.csv")

<h4>Dataset with PCA components that explain 95% variance</h4>

In [15]:
X_train_pca = pd.read_csv("../Processed Data/PCA sets/X_train_pca_95.csv")
X_test_pca = pd.read_csv("../Processed Data/PCA sets/X_test_pca_95.csv")
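The PCA files loaded above are described as keeping the components that explain 95% of the variance. How those files were produced is not shown in this notebook; as a rough sketch of the idea, the NumPy snippet below picks the smallest number of components whose cumulative explained-variance ratio reaches a target. The function name `n_components_for_variance` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def n_components_for_variance(X, target=0.95):
    """Return the smallest number of principal components whose
    cumulative explained-variance ratio reaches `target`."""
    X_centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    explained = singular_values ** 2
    ratio = np.cumsum(explained) / explained.sum()
    # First position where the cumulative ratio reaches the target, as a count
    return int(np.searchsorted(ratio, target) + 1)

rng = np.random.default_rng(0)
# Toy data (not the real dataset): 2 high-variance columns plus 8 low-variance ones
X_toy = np.hstack([
    rng.normal(scale=10.0, size=(200, 2)),
    rng.normal(scale=0.1, size=(200, 8)),
])
k = n_components_for_variance(X_toy, target=0.95)
print(k)
```

In scikit-learn, passing a float such as `PCA(n_components=0.95)` selects the component count by a similar cumulative-variance rule, which is probably how a set like this would be built in practice.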

<h4>Dataset based on top 50 percent of ANOVA F-test results, further filtered based on correlation</h4>

In [19]:
X_train_anova = pd.read_csv("../Processed Data/ANOVA hybrid set/X_train_anova_filtered.csv")
X_test_anova = pd.read_csv("../Processed Data/ANOVA hybrid set/X_test_anova_filtered.csv")
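The ANOVA hybrid set is described as the top 50 percent of features by F-test score, further filtered by correlation. The exact procedure is not shown here, so the following is only a sketch under stated assumptions: the F statistic is the standard one-way ANOVA ratio, the correlation filter is a simple greedy pass, and the 0.9 threshold and the function names are hypothetical.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F statistic of each column of X against class labels y."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_features(X, y, keep_fraction=0.5, corr_threshold=0.9):
    """Keep the top `keep_fraction` of features by F score, then greedily
    drop any feature too correlated with one that was already kept."""
    f = anova_f_scores(X, y)
    n_keep = max(1, int(X.shape[1] * keep_fraction))
    ranked = np.argsort(f)[::-1][:n_keep]  # best F score first
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = []
    for j in ranked:
        if all(corr[j, s] < corr_threshold for s in selected):
            selected.append(int(j))
    return selected

# Toy data (made up): one informative column, a near-duplicate of it, two noise columns
rng = np.random.default_rng(0)
y_toy = np.repeat([0, 1, 2], 50)
informative = y_toy + rng.normal(scale=0.1, size=150)
X_toy = np.column_stack([
    informative,
    informative + rng.normal(scale=0.001, size=150),
    rng.normal(size=150),
    rng.normal(size=150),
])
selected_toy = select_features(X_toy, y_toy)
print(selected_toy)
```

On the toy data the two informative columns rank highest, and the correlation pass keeps only one of them. With scikit-learn, `SelectPercentile(f_classif, percentile=50)` would cover the first stage.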

<h2>Training and Evaluation function</h2>

In [4]:
def evaluate_model(model, X_train, y_train, X_test, y_test, model_name, feature_set_name):
    """
    Trains, evaluates, and prints comprehensive metrics for a model.
    Returns a dictionary with all results.
    """
    # Train model
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test) if hasattr(model, 'predict_proba') else None
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted', zero_division=0)
    recall = recall_score(y_test, y_pred, average='weighted', zero_division=0)
    f1 = f1_score(y_test, y_pred, average='weighted', zero_division=0)
    
    # Cross-validation for more robust estimate
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    cv_mean = cv_scores.mean()
    cv_std = cv_scores.std()
    
    # Create results dictionary
    results = {
        'model': model_name,
        'feature_set': feature_set_name,
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'cv_mean': cv_mean,
        'cv_std': cv_std,
        'num_features': X_train.shape[1]
    }
    
    # Print results
    print(f"\n{'='*50}")
    print(f"Results for {model_name} on {feature_set_name}:")
    print(f"{'='*50}")
    print(f"Accuracy:  {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall:    {recall:.4f}")
    print(f"F1-Score:  {f1:.4f}")
    print(f"CV Accuracy: {cv_mean:.4f} (±{cv_std:.4f})")
    print(f"Number of features: {X_train.shape[1]}")
    
    return results

# Initialize list to store all results
all_results = []

The <b>evaluate_model</b> function trains a single model on a single feature set, then reports accuracy, weighted precision, recall and F1 on the test set, together with 5-fold cross-validated accuracy on the training set.

We now pass each model (Random Forest and XGBoost) and each of the 4 feature sets to evaluate_model.
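The precision, recall and F1 values in evaluate_model use `average='weighted'`. As a small illustration of what that averaging does, the sketch below computes weighted precision by hand on made-up labels; the function name `weighted_precision` and the toy lists are assumptions for this example only.

```python
from collections import Counter

def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with weights equal to each
    class's share of the true labels (its support)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        predicted = sum(1 for p in y_pred if p == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == cls)
        # A class that is never predicted contributes 0, mirroring zero_division=0
        precision = correct / predicted if predicted else 0.0
        score += (n_true / total) * precision
    return score

y_true_toy = [0, 0, 0, 1, 1, 2]
y_pred_toy = [0, 0, 1, 1, 1, 2]
print(round(weighted_precision(y_true_toy, y_pred_toy), 4))  # 0.8889
```

Applying the same weighting to recall gives the fraction of correctly classified samples, which is why the Recall line equals the Accuracy line in every result block printed by evaluate_model.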

In [28]:
feature_sets = {
    'A_All_Features': (X_train, X_test),
    'B_Mean_Features': (X_train_mean, X_test_mean),
    'C_PCA_Features': (X_train_pca, X_test_pca),
    'D_ANOVA_Filtered': (X_train_anova, X_test_anova)
}

rf_model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
xgb_model = XGBClassifier(n_estimators=100, random_state=42, n_jobs=-1, eval_metric='mlogloss')

models = {
    'Random_Forest': rf_model,
    'XGBoost': xgb_model
}

# Initialize list to store all results
all_results = []

print("Starting experiments...")
print(f"Total combinations: {len(models)} models × {len(feature_sets)} feature sets = {len(models) * len(feature_sets)} experiments")

# Run all combinations
for model_name, model in models.items():
    for feature_name, (X_train_set, X_test_set) in feature_sets.items():
        # Use the evaluate_model function we already created
        results = evaluate_model(
            model=model, 
            X_train=X_train_set, 
            y_train=y_train,  # Should be the same for all
            X_test=X_test_set, 
            y_test=y_test,    # Should be the same for all
            model_name=model_name, 
            feature_set_name=feature_name
        )
        all_results.append(results)

print("\n" + "="*60)
print("ALL EXPERIMENTS COMPLETED!")
print("="*60)

# Convert results to DataFrame for easy analysis
results_df = pd.DataFrame(all_results)
print("\nSummary of all results:")
print(results_df[['model', 'feature_set', 'accuracy', 'f1_score', 'num_features']].to_string(index=False))

Starting experiments...
Total combinations: 2 models × 4 feature sets = 8 experiments

Results for Random_Forest on A_All_Features:
Accuracy:  0.9257
Precision: 0.9270
Recall:    0.9257
F1-Score:  0.9255
CV Accuracy: 0.9172 (±0.0164)
Number of features: 561

Results for Random_Forest on B_Mean_Features:
Accuracy:  0.9270
Precision: 0.9283
Recall:    0.9270
F1-Score:  0.9269
CV Accuracy: 0.9227 (±0.0171)
Number of features: 66

Results for Random_Forest on C_PCA_Features:
Accuracy:  0.8880
Precision: 0.8925
Recall:    0.8880
F1-Score:  0.8868
CV Accuracy: 0.8598 (±0.0171)
Number of features: 102

Results for Random_Forest on D_ANOVA_Filtered:
Accuracy:  0.9046
Precision: 0.9077
Recall:    0.9046
F1-Score:  0.9048
CV Accuracy: 0.9094 (±0.0211)
Number of features: 86

Results for XGBoost on A_All_Features:
Accuracy:  0.9393
Precision: 0.9402
Recall:    0.9393
F1-Score:  0.9391
CV Accuracy: 0.9214 (±0.0205)
Number of features: 561

Results for XGBoost on B_Mean_Features:
Accuracy:  0.9074


<h3>Observations</h3>

<ul>
    <li>Since our original dataset is not skewed towards any single activity, accuracy is a good metric for evaluating our models' performance.</li>
    <li>XGBoost trained on all 561 features is the best model we have (93.93 percent accuracy). For Random Forest, the full feature set scores 92.57 percent, marginally below the mean-feature set (92.70 percent). The models trained on the full feature set are our <b>baseline models</b> for comparison with the other feature sets.</li>
    <li>The best-performing model on a reduced feature set is Random Forest with only the mean features (66 features, about 12 percent of the original 561), which is on par with its baseline. With XGBoost the same feature set reaches 90.74 percent, roughly 3 points below the XGBoost baseline.</li>
    <li>The PCA components also performed well, with accuracy of almost 89 percent and 91 percent for the Random Forest and XGBoost models respectively.</li>
    <li>The feature set built from ANOVA F-test scores and then filtered by correlation performs well, with just over 90 percent accuracy for both models.</li>
</ul>
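The feature-count share and the accuracy gaps discussed in these observations can be checked directly from the printed results. The snippet below only restates figures already shown in the output above; the variable names are introduced here for illustration.

```python
# Figures copied from the printed results above
n_all, n_mean = 561, 66
rf_all_acc, rf_mean_acc = 0.9257, 0.9270
xgb_all_acc, xgb_mean_acc = 0.9393, 0.9074

# Share of the original features kept by the mean-feature set
share = n_mean / n_all

print(f"Mean features are {share:.1%} of the full set")
print(f"Random Forest accuracy change: {rf_mean_acc - rf_all_acc:+.4f}")
print(f"XGBoost accuracy change:       {xgb_mean_acc - xgb_all_acc:+.4f}")
```

The Random Forest difference is far smaller than its cross-validation standard deviation (about 0.017), so the two Random Forest feature sets are best read as equivalent rather than one beating the other.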

<h2>Recursive Feature Elimination</h2>

In [49]:
xgb_model = XGBClassifier(n_estimators=100, random_state=42, eval_metric='mlogloss')

# Create and fit RFECV
min_features_to_select = 20
cv = StratifiedKFold(5)

rfecv = RFECV(
    estimator=xgb_model,
    step=10,
    cv=cv,
    scoring='accuracy',
    min_features_to_select=min_features_to_select,
    n_jobs=-1
)

print("Fitting RFECV (this may take a while...)")
rfecv.fit(X_train_anova, y_train)
print("RFECV fitting completed!")

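# Plot the RFECV results.
# With step=10 the CV scores are not one feature apart, so rebuild the feature
# count evaluated at each step (scores are ordered from fewest to most features).
n_total_features = X_train_anova.shape[1]
n_scores = len(rfecv.cv_results_['mean_test_score'])
x_values = sorted(
    max(n_total_features - i * rfecv.step, min_features_to_select)
    for i in range(n_scores)
)
plt.figure(figsize=(10, 6))
plt.plot(x_values, rfecv.cv_results_['mean_test_score'], marker='o')
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (accuracy)")
plt.title("RFECV Performance vs Number of Features")
plt.grid(True)
plt.show()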

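# Find the number of features with the best mean CV score.
# Caveat: "index + min_features_to_select" equals the feature count only when step=1.
# With step=10 the scores are about 10 features apart, so this value can disagree
# with rfecv.n_features_ and should be checked before it is trusted.
cv_scores = rfecv.cv_results_['mean_test_score']
optimal_num_features = np.argmax(cv_scores) + min_features_to_select
max_cv_score = np.max(cv_scores)

print(f"RFECV reported optimal features: {rfecv.n_features_}")
print(f"True optimal number of features from CV: {optimal_num_features}")
print(f"Maximum CV accuracy: {max_cv_score:.4f}")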

# Create both feature sets for comparison
print("\n" + "="*50)
print("CREATING AND EVALUATING BOTH FEATURE SETS")
print("="*50)

# Set 1: RFECV-reported features (86 features)
X_train_reported = X_train_anova.iloc[:, rfecv.support_]
X_test_reported = X_test_anova.iloc[:, rfecv.support_]

# Set 2: True optimal features (27 features)
optimal_rfe = RFE(estimator=xgb_model, n_features_to_select=optimal_num_features)
optimal_rfe.fit(X_train_anova, y_train)
X_train_optimal = X_train_anova.iloc[:, optimal_rfe.support_]
X_test_optimal = X_test_anova.iloc[:, optimal_rfe.support_]

true_optimal_features = X_train_anova.columns[optimal_rfe.support_].tolist()
print(f"True optimal features ({len(true_optimal_features)}):")
for i, feat in enumerate(true_optimal_features[:15]):
    print(f"  {i+1}. {feat}")
if len(true_optimal_features) > 15:
    print(f"  ... and {len(true_optimal_features) - 15} more")

# Evaluate both feature sets
new_results = []

# 1. Evaluate RFECV-reported features
print("\n1. Evaluating RFECV-reported features...")
reported_results = evaluate_model(
    model=XGBClassifier(n_estimators=100, random_state=42, eval_metric='mlogloss'),
    X_train=X_train_reported,
    y_train=y_train,
    X_test=X_test_reported,
    y_test=y_test,
    model_name='XGBoost',
    feature_set_name='E_RFE_Reported'
)
new_results.append(reported_results)

# 2. Evaluate truly optimal features
print("\n2. Evaluating truly optimal features...")
optimal_results = evaluate_model(
    model=XGBClassifier(n_estimators=100, random_state=42, eval_metric='mlogloss'),
    X_train=X_train_optimal,
    y_train=y_train,
    X_test=X_test_optimal,
    y_test=y_test,
    model_name='XGBoost',
    feature_set_name='E_RFE_Optimal_True'
)
new_results.append(optimal_results)

# Compare results
print("\n" + "="*60)
print("COMPARISON RESULTS")
print("="*60)
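print(f"RFECV Reported: {rfecv.n_features_} features, Accuracy: {reported_results['accuracy']:.4f}")
print(f"True Optimal: {optimal_num_features} features, Accuracy: {optimal_results['accuracy']:.4f}")
reduction_pct = (1 - optimal_num_features / rfecv.n_features_) * 100
print(f"Feature reduction: {rfecv.n_features_} → {optimal_num_features} features ({reduction_pct:.1f}% reduction)")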

# Add to main results and display final summary
all_results.extend(new_results)
results_df = pd.DataFrame(all_results)

print("\nFinal summary of all results (without duplication):")
# Column names must match the keys of the dictionaries returned by evaluate_model
final_summary = results_df[['model', 'feature_set', 'accuracy', 'f1_score', 'num_features']].drop_duplicates()
print(final_summary.to_string(index=False))

1. Evaluating RFECV-selected features...

Results for XGBoost on E_RFE_Reported:
Accuracy:  0.9063
Precision: 0.9080
Recall:    0.9063
F1-Score:  0.9063
CV Accuracy: 0.9127 (±0.0189)
Number of features: 86

2. Evaluating truly optimal features...

Results for XGBoost on E_RFE_Optimal_True:
Accuracy:  0.8996
Precision: 0.9008
Recall:    0.8996
F1-Score:  0.8994
CV Accuracy: 0.9071 (±0.0227)
Number of features: 27

COMPARISON RESULTS
RFECV Reported: 86 features, Accuracy: 0.9063
True Optimal: 27 features, Accuracy: 0.8996
Feature reduction: 86 → 27 features (68.6% reduction)
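
The gap between the two feature counts (86 and 27) is worth a closer look, because `optimal_num_features` is computed as `np.argmax(cv_scores) + min_features_to_select`. The sketch below is a pure-Python illustration, not scikit-learn's own code. It lists the feature counts a stepwise elimination would evaluate with the settings used above (86 features, `step=10`, `min_features_to_select=20`), assuming `step` features are removed per round and the scores are stored from the fewest features to the most. The helper name `rfe_feature_counts` is hypothetical.

```python
def rfe_feature_counts(n_features, step, min_features_to_select):
    """Feature counts evaluated by a stepwise elimination that removes `step`
    features per round and stops at `min_features_to_select`, ascending."""
    counts = [n_features]
    while counts[-1] > min_features_to_select:
        # The final round may remove fewer than `step` features
        removed = min(step, counts[-1] - min_features_to_select)
        counts.append(counts[-1] - removed)
    return counts[::-1]

counts = rfe_feature_counts(n_features=86, step=10, min_features_to_select=20)
print(counts)  # [20, 26, 36, 46, 56, 66, 76, 86]

# An argmax index of 7 is what yields 27 in the notebook (7 + 20)
best_index = 7
print(counts[best_index], best_index + 20)  # 86 27
```

Under these assumptions there are 8 evaluated subsets, and index 7 corresponds to all 86 features rather than 27. If that matches the installed scikit-learn version, the best CV score belongs to the full 86-feature set, which agrees with `rfecv.n_features_`, and the 27-feature set is simply a smaller subset picked by RFE. Newer scikit-learn releases also appear to store the evaluated counts in `rfecv.cv_results_['n_features']`, which would avoid the manual mapping.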

Final summary of all results (without duplication):
        model        feature_set  accuracy  f1_score  num_features
Random_Forest     A_All_Features  0.925687  0.925544           561
Random_Forest    B_Mean_Features  0.927044  0.926894            66
Random_Forest     C_PCA_Features  0.888022  0.886814           102
Random_Forest   D_ANOVA_Filtered  0.904649  0.904786            86
      XGBoost     A_All_Features