# Stratified K-Fold Cross-Validation for Model Training
**Table of Contents**
1. [Setup](#1-setup)
2. [Dataset Preparation](#2-dataset-preparation)
3. [K-Fold CV](#3-k-fold-cv)


## 1. Setup

In [1]:
# Automatic reloading
%load_ext autoreload
%autoreload 2

In [None]:
####################
# Required Modules #
####################

# Generic/Built-in
import os
import random
import sys

# Libs
import torch
import numpy as np

In [None]:
# Add the project root directory to the system path to enable imports from the '/src' folder.

# Get the project directory 
current_dir = os.path.abspath('') # Current '\notebooks' directory
project_dir = os.path.abspath(os.path.join(current_dir, '..')) # Move up one level to project root directory

# Add the project directory to sys.path
sys.path.append(project_dir)

# Change the working directory to the project root so relative paths (e.g. "data") resolve
os.chdir(project_dir)

# Import custom modules
from src.data_preparation import *
from src.models import *
from src.train_eval import *



In [4]:
# Seeding
SEED = 42

# Seed every random number generator in use (Python, NumPy, PyTorch CPU and CUDA)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)  # If using CUDA
np.random.seed(SEED)
random.seed(SEED)
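
Seeding fixes the random streams, but some cuDNN kernels can still behave nondeterministically on a GPU. The following is a minimal sketch of the extra switches PyTorch exposes for that case. The helper name `seed_everything` is hypothetical and is not part of this project's `src` package.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy and PyTorch, and request deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Prefer deterministic cuDNN algorithms and disable the autotuner,
    # which may otherwise pick different kernels from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random tensor.
seed_everything(42)
first = torch.rand(3)
seed_everything(42)
second = torch.rand(3)
print(torch.equal(first, second))
```

Deterministic kernels can be slower, so these flags are a trade-off rather than a default.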

## 2. Dataset Preparation

In [14]:
data_dir = "data"
sequence_size = 250
stride = 25
gap_threshold = 0.05

# 0. Download the data
download_har70plus_dataset(base_dir=data_dir)

# 1. Load the downloaded data
df = load_har70_csv_files(base_dir=data_dir)

# 2. Create the HARDataset object
dataset = HARDataset(df, sequence_size, stride, gap_threshold)

📂 Dataset already downloaded: data\har70.zip
📂 Dataset already extracted in data\har70plus
✅ Successfully loaded HAR70+ dataset (2259597 timestep samples).
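
The `sequence_size` and `stride` arguments suggest that `HARDataset` cuts the recordings into overlapping windows. Its implementation lives in `src.data_preparation` and is not shown in this notebook, so the block below is only a sketch of the general sliding-window technique under that assumption. `make_windows` and the toy array shape are hypothetical, and gap handling (`gap_threshold`) is left out.

```python
import numpy as np


def make_windows(signal: np.ndarray, sequence_size: int, stride: int) -> np.ndarray:
    """Cut a (timesteps, channels) array into overlapping windows.

    Returns an array of shape (num_windows, sequence_size, channels).
    """
    num_steps = signal.shape[0]
    if num_steps < sequence_size:
        # Recording too short to yield a single full window.
        return np.empty((0, sequence_size, signal.shape[1]), dtype=signal.dtype)
    # Start index of every window that still fits inside the recording.
    starts = np.arange(0, num_steps - sequence_size + 1, stride)
    return np.stack([signal[s:s + sequence_size] for s in starts])


# Toy recording: 1000 timesteps, 6 channels (hypothetical shape).
toy = np.arange(1000 * 6, dtype=np.float32).reshape(1000, 6)
windows = make_windows(toy, sequence_size=250, stride=25)
print(windows.shape)  # (31, 250, 6)
```

With `sequence_size = 250` and `stride = 25`, consecutive windows share 225 timesteps, which is 90% overlap.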


## 3. K-Fold CV

In [16]:
# Hyperparameters
num_folds = 5
batch_size = 128
learning_rate = 0.001
num_epochs = 10
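
Stratified k-fold splitting keeps the class proportions of every fold close to those of the full dataset, which matters when classes are imbalanced. The internals of `kfold_stratified_cv_for_har` are not shown in this notebook, so the sketch below only illustrates the splitting idea in plain NumPy. `stratified_folds` and the toy labels are hypothetical. scikit-learn's `StratifiedKFold` provides a maintained implementation of the same idea.

```python
import numpy as np


def stratified_folds(labels: np.ndarray, k: int, seed: int) -> list:
    """Assign sample indices to k folds, splitting each class evenly across them."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Split this class's shuffled indices into k near-equal chunks,
        # one chunk per fold.
        for i, part in enumerate(np.array_split(idx, k)):
            folds[i].extend(part.tolist())
    return [np.sort(np.array(f, dtype=int)) for f in folds]


# Toy, imbalanced labels: 80 / 15 / 5 samples for three classes.
labels = np.array([0] * 80 + [1] * 15 + [2] * 5)
folds = stratified_folds(labels, k=5, seed=42)

for i, val_idx in enumerate(folds):
    # Each fold serves once as validation set; the rest form the training set.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    counts = np.bincount(labels[val_idx], minlength=3).tolist()
    print(f"Fold [{i + 1}/5] train={len(train_idx)} val={len(val_idx)} val class counts={counts}")
```

Every validation fold in this toy example holds 16, 3 and 1 samples of the three classes, mirroring the 80/15/5 ratio of the full label set.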

In [17]:
mean_accuracy, mean_f1, mean_precision, mean_recall = kfold_stratified_cv_for_har(
    dataset=dataset, 
    model_class=HarTransformer,
    optimizer_class=torch.optim.Adam,
    k=num_folds,
    batch_size=batch_size,
    num_epochs=num_epochs,
    model_kwargs=None,
    optimizer_kwargs={"lr": learning_rate},
    random_state=SEED,
)

Fold [1/5]
HarTransformer model loaded on cuda.
Accuracy: 0.6402, F1: 0.6749, Precision: 0.7840, Recall: 0.6402
Fold [2/5]
HarTransformer model loaded on cuda.
Accuracy: 0.6062, F1: 0.6342, Precision: 0.7530, Recall: 0.6062
Fold [3/5]
HarTransformer model loaded on cuda.
Accuracy: 0.6648, F1: 0.6932, Precision: 0.7527, Recall: 0.6648
Fold [4/5]
HarTransformer model loaded on cuda.
Accuracy: 0.6289, F1: 0.6415, Precision: 0.7470, Recall: 0.6289
Fold [5/5]
HarTransformer model loaded on cuda.
Accuracy: 0.6411, F1: 0.6678, Precision: 0.7123, Recall: 0.6411
Final results (mean over 5 folds)
Accuracy: 0.6363, F1: 0.6623, Precision: 0.7498, Recall: 0.6363
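
In every fold above, Recall is identical to Accuracy. That is the behavior of support-weighted averaging: weighting each class's recall by its share of the samples collapses to the overall fraction of correct predictions. Whether `kfold_stratified_cv_for_har` actually uses weighted averaging is an assumption here. The sketch below checks the identity on made-up labels with plain NumPy.

```python
import numpy as np

# Made-up labels and predictions for three classes (not taken from HAR70+).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 0, 1, 2, 1, 1, 0, 2])

# Accuracy: fraction of samples predicted correctly.
accuracy = np.mean(y_true == y_pred)

# Per-class recall: fraction of each class's samples predicted correctly.
classes, support = np.unique(y_true, return_counts=True)
per_class_recall = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])

# Weighted average uses class frequencies; macro average treats classes equally.
weighted_recall = np.sum(support / support.sum() * per_class_recall)
macro_recall = per_class_recall.mean()

print(f"accuracy={accuracy:.4f}")
print(f"weighted recall={weighted_recall:.4f}")
print(f"macro recall={macro_recall:.4f}")
```

If the metrics are indeed weighted, a macro-averaged recall would be a useful complement, since it is not dominated by the most frequent class.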
