# Emotion Classification with ML Time Series - WESAD Dataset

## Analysis plan

This notebook runs a complete emotion-classification analysis on the WESAD data using a **time series model (LSTM/GRU)** instead of traditional ML models.

1. **Library imports** - numpy, pandas, matplotlib, seaborn, scikit-learn, imblearn, tensorflow/keras
2. **Data loading** - CSV/PKL from WESAD, class distribution check
3. **Data segmentation** - sliding windows with feature extraction (mean, std, min, max, range, RMS, kurtosis, skewness, RMSSD, slope, respiration rate)
4. **Encoding and scaling** - LabelEncoder for the target, StandardScaler for the features
5. **Train/test split** - **subject-wise split** (whole subjects go to train or test; windows are not split)
6. **Balancing** - SMOTE on train, balance verification
7. **Preparing data for time series** - building sequences from windows
8. **Training the time series model** - LSTM/GRU
9. **Evaluation** - confusion matrix, accuracy, balanced accuracy, macro F1, per-class metrics
10. **Conclusions and report** - analysis of the time series model results
11. **Visualizations** - plots of class distribution, metrics, confusion matrices
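
The evaluation metrics named in step 9 can all be derived from a confusion matrix. The sketch below is a NumPy-only illustration, not the notebook's evaluation code (which imports the scikit-learn metric functions in Step 1). The helper `summarize_confusion` and the toy counts are hypothetical.

```python
import numpy as np

def summarize_confusion(cm) -> dict:
    """Compute accuracy, balanced accuracy and macro F1 from a confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)      # number of true samples per class
    predicted = cm.sum(axis=0)    # number of predicted samples per class
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "balanced_accuracy": recall.mean(),  # mean of per-class recall
        "macro_f1": f1.mean(),               # unweighted mean of per-class F1
    }

# Toy two-class example (baseline vs. emotion) with made-up counts
toy_cm = np.array([[80, 20],
                   [10, 40]])
metrics = summarize_confusion(toy_cm)
print(metrics)
```

Balanced accuracy and macro F1 weight every class equally, which is why they are reported next to plain accuracy when the classes are imbalanced.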

## ⚠️ IMPORTANT: Subject-wise Split

- **All data from one subject** goes either to train or to test
- Windows from the same subject are **never split** between train and test
- This gives a **realistic estimate of generalization** to new subjects
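
The rule above can be sketched with plain NumPy. This is only an illustration under assumptions: `subject_wise_split`, the 30% test fraction and the seed are hypothetical. scikit-learn's `GroupShuffleSplit`, imported in Step 1, serves the same purpose.

```python
import numpy as np

def subject_wise_split(groups, test_fraction=0.3, seed=42):
    """Return boolean train/test masks so that no subject appears in both."""
    groups = np.asarray(groups)
    subjects = np.unique(groups)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(subjects)
    # Hold out whole subjects, at least one
    n_test = max(1, int(round(test_fraction * len(subjects))))
    test_subjects = shuffled[:n_test]
    test_mask = np.isin(groups, test_subjects)
    return ~test_mask, test_mask

# Toy data: 10 windows for each of the six subjects used in this notebook
groups = np.repeat(["S2", "S3", "S4", "S5", "S6", "S7"], 10)
train_mask, test_mask = subject_wise_split(groups)
train_subjects = set(groups[train_mask])
test_subjects = set(groups[test_mask])
print("train:", sorted(train_subjects), "test:", sorted(test_subjects))
```

Because the split is drawn over subject IDs rather than over windows, overlapping windows from one recording can never end up on both sides.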

## 🆕 Differences from 06_klasyfikacja_emocji_smote.ipynb

- We use **only one model** - ML Time Series (LSTM/GRU)
- The data is converted into **time sequences** (each window is one timestep)
- The model learns **temporal dependencies** between windows
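
A minimal sketch of the "each window is one timestep" idea, assuming the per-window features are already in a 2-D array. The helper `make_sequences`, the sequence length of 4 and the choice to label a sequence by its last window are illustrative assumptions, not the notebook's own implementation.

```python
import numpy as np

def make_sequences(features, labels, subjects, seq_len=4):
    """Stack consecutive windows of one subject into (seq_len, n_features) sequences.

    Each sequence takes the label of its last window (an assumption of this sketch).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    subjects = np.asarray(subjects)
    X, y = [], []
    for subject in np.unique(subjects):
        idx = np.flatnonzero(subjects == subject)  # windows keep their time order
        for end in range(seq_len, len(idx) + 1):
            chunk = idx[end - seq_len:end]
            X.append(features[chunk])
            y.append(labels[chunk[-1]])
    return np.stack(X), np.array(y)

# Toy data: 20 windows with 3 features each, from two subjects
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 3))
labs = np.array([0] * 10 + [1] * 10)
subs = np.array(["S2"] * 12 + ["S3"] * 8)
X_seq, y_seq = make_sequences(feats, labs, subs, seq_len=4)
print(X_seq.shape, y_seq.shape)
```

The resulting array has the `(samples, timesteps, features)` layout that Keras recurrent layers such as `LSTM` and `GRU` take as input. Sequences are built per subject so that no sequence spans two recordings.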


In [6]:
# ============================================================================
# STEP 1: LIBRARY IMPORTS
# ============================================================================

# Core imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import pickle
import warnings
warnings.filterwarnings('ignore')

print("✅ Core libraries imported (numpy, pandas, matplotlib, seaborn)")

# Scikit-learn
try:
    from sklearn.model_selection import train_test_split, GroupShuffleSplit
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    from sklearn.metrics import (
        confusion_matrix, classification_report, accuracy_score,
        balanced_accuracy_score, f1_score, precision_score, recall_score
    )
    from sklearn.dummy import DummyClassifier
    print("✅ Scikit-learn imported")
except ImportError as e:
    print(f"❌ Scikit-learn import error: {e}")
    raise

# Imbalanced-learn (SMOTE)
# Option: set SKIP_IMBLEARN = True if imbalanced-learn crashes the kernel
# Defaults to True to avoid the crash - change to False if you want to use SMOTE
SKIP_IMBLEARN = True  # Change to False to use SMOTE (requires imbalanced-learn)

IMBLEARN_AVAILABLE = False

if not SKIP_IMBLEARN:
    print("📦 Trying to import imbalanced-learn...")
    try:
        # Silence imbalanced-learn warnings in case they cause problems
        import warnings
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            from imblearn.over_sampling import SMOTE

        IMBLEARN_AVAILABLE = True
        print("✅ imbalanced-learn imported (SMOTE available)")
    except ImportError as e:
        IMBLEARN_AVAILABLE = False
        print("⚠️ imbalanced-learn unavailable - install it with: pip install imbalanced-learn")
        print(f"   Error: {e}")
    except Exception as e:
        IMBLEARN_AVAILABLE = False
        print(f"⚠️ Error while importing imbalanced-learn: {type(e).__name__}: {e}")
        print("   Continuing without SMOTE...")
        print("   If imbalanced-learn crashes the kernel, set SKIP_IMBLEARN = True at the top of this section")
else:
    print("⚠️ imbalanced-learn skipped (SKIP_IMBLEARN = True)")
    print("   SMOTE will not be available - data balancing may not work")

# Scipy for signal processing
# Option: set SKIP_SCIPY = True if scipy crashes the kernel
SKIP_SCIPY = True  # Change to False to use scipy (may crash the kernel)

SCIPY_AVAILABLE = False

if not SKIP_SCIPY:
    print("📦 Trying to import scipy...")
    try:
        from scipy.signal import resample
        from scipy import stats
        SCIPY_AVAILABLE = True
        print("✅ Scipy imported")
    except ImportError as e:
        SCIPY_AVAILABLE = False
        print("⚠️ Scipy unavailable - install it with: pip install scipy")
        print(f"   Error: {e}")
        print("   Continuing without scipy - the resample function may not work")
    except Exception as e:
        SCIPY_AVAILABLE = False
        print(f"⚠️ Error while importing scipy: {type(e).__name__}: {e}")
        print("   Continuing without scipy - the resample function may not work")
        print("   If scipy crashes the kernel, set SKIP_SCIPY = True at the top of this section")
else:
    print("⚠️ Scipy skipped (SKIP_SCIPY = True)")
    print("   scipy.signal.resample will not be available - a numpy fallback is used where one exists")

# TensorFlow/Keras for time series models (LSTM/GRU)
# Note: importing TensorFlow can crash the kernel - we handle that safely
TENSORFLOW_AVAILABLE = False

# Option: set SKIP_TENSORFLOW = True if TensorFlow crashes the kernel
# Defaults to False so that TensorFlow is used - change to True if TensorFlow causes problems
SKIP_TENSORFLOW = False  # Change to True if TensorFlow crashes the kernel

if not SKIP_TENSORFLOW:
    print("📦 Trying to import TensorFlow/Keras...")
    try:
        # Reduce TensorFlow logging to avoid problems
        import os
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # Hide warnings

        # DISABLE GPU - prevents the kernel crash
        os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

        print("   Importing tensorflow...")
        import tensorflow as tf

        # Disable GPU in TensorFlow
        tf.config.set_visible_devices([], 'GPU')
        print("   ✅ GPU disabled - using CPU only")

        print("   Importing Keras modules...")
        from tensorflow import keras
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import LSTM, GRU, Dense, Dropout, BatchNormalization
        from tensorflow.keras.optimizers import Adam
        from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
        from tensorflow.keras.utils import to_categorical

        TENSORFLOW_AVAILABLE = True
        print("✅ TensorFlow/Keras available - time series models will be available")
        print(f"   TensorFlow version: {tf.__version__}")
    except ImportError as e:
        # The import failed, so the flag must be False (Python's boolean is `False`, not `true`)
        TENSORFLOW_AVAILABLE = False
        print("⚠️ TensorFlow/Keras unavailable - time series models will not be available")
        print("   Install it with: pip install tensorflow")
        print(f"   Error: {e}")
    except Exception as e:
        TENSORFLOW_AVAILABLE = False
        print("⚠️ TensorFlow/Keras unavailable - an error occurred during import")
        print(f"   Error: {type(e).__name__}: {e}")
        print("   You can continue without TensorFlow, but time series models will not be available")
        print("   If TensorFlow crashes the kernel, set SKIP_TENSORFLOW = True at the top of this cell")
else:
    print("⚠️ TensorFlow skipped (SKIP_TENSORFLOW = True)")
    print("   To use TensorFlow, set SKIP_TENSORFLOW = False in this cell")
    print("   Time series models will not be available")

print("\n" + "="*60)
print("✅ Import step finished (see the messages above for any skipped optional libraries)")
print("="*60)



## STEP 2: LOADING THE DATA

We load the WESAD data from CSV and PKL files and check the class distribution.

**NOTE:** This step is identical to the one in `06_klasyfikacja_emocji_smote.ipynb` - we use the same data-loading functions.
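
The loader in the next cell treats each wrist CSV as having the session start timestamp on its first line and the sampling rate on its second, with the samples after that. The sketch below parses that layout from an in-memory string. `parse_e4_csv` and the toy values (including the timestamp) are made up for illustration.

```python
import numpy as np

def parse_e4_csv(text):
    """Parse a CSV laid out as: line 1 = start timestamp, line 2 = sampling rate, rest = samples."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    start_ts = float(lines[0].split(",")[0])
    fs = float(lines[1].split(",")[0])
    samples = np.array([[float(v) for v in ln.split(",")] for ln in lines[2:]])
    # Sample i was recorded i / fs seconds after the start timestamp
    timestamps = start_ts + np.arange(len(samples)) / fs
    return start_ts, fs, timestamps, samples

# Toy single-channel file: start time, 4 Hz sampling rate, four samples
toy_csv = "1495437325.0\n4.0\n0.10\n0.12\n0.11\n0.15\n"
start_ts, fs, timestamps, samples = parse_e4_csv(toy_csv)
print(fs, samples.shape, timestamps - start_ts)
```

The same index arithmetic is what `build_time_index` does with pandas timestamps in the cell below.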


In [1]:
# ============================================================================
# STEP 2: LOADING THE DATA
# ============================================================================

# Check that the imports from STEP 1 have been run
try:
    # Check that the core imports are available
    _ = Path
    _ = pd
    _ = np
    _ = pickle
except NameError as e:
    print(f"\n❌❌❌ ERROR: Missing imports!")
    print(f"   Error: {e}")
    print("   Run STEP 1 first (the cell with the library imports)!")
    raise NameError(f"Missing imports - run STEP 1 first: {e}")

# Paths
RAW_ROOT = Path("/Users/turfian/Downloads/archive (4)/WESAD")
PROJECT_ROOT = Path("/Users/turfian/Downloads/archive (4)/WESAD/wesad-prep")

# Parameters
TARGET_FS = 32.0
MAX_DURATION = pd.Timedelta(minutes=40)
DEFAULT_SUBJECTS = ["S2", "S3", "S4", "S5", "S6", "S7"]

# Mapping of phases to classes (following the actual phase names in WESAD)
PHASE_TO_CLASS = {
    "Base": "baseline",
    "Medi 1": "baseline",
    "Medi 2": "baseline",
    "TSST": "stress",
    "sRead": "stress",
    "fRead": "stress",
    "Fun": "amusement",
    # Alternative names (in case the files differ)
    "Stress": "stress",
    "Amusement": "amusement",
    "Meditation": "baseline",
}

# Parsing helper functions
def build_time_index(length: int, start_ts: float, fs: float) -> pd.Series:
    """Builds the time index for a signal"""
    start = pd.to_datetime(start_ts, unit="s", utc=True)
    offsets = pd.to_timedelta(np.arange(length) / fs, unit="s")
    return start + offsets

def load_sensor_for_subject(subject_path: Path, sensor_name: str) -> pd.DataFrame:
    """Loads the data of one sensor (CSV)"""
    file_path = subject_path / f"{sensor_name}.csv"
    if not file_path.exists():
        return pd.DataFrame()
    
    header = pd.read_csv(file_path, nrows=2, header=None)
    start_ts = float(header.iloc[0, 0])
    fs = float(header.iloc[1, 0])
    
    column_names = {
        "ACC": ["acc_x", "acc_y", "acc_z"],
        "EDA": ["eda"],
        "BVP": ["bvp"],
        "TEMP": ["temp"],
        "HR": ["hr"],
    }.get(sensor_name, [sensor_name.lower()])
    
    data = pd.read_csv(file_path, skiprows=2, header=None, names=column_names)
    data.insert(0, "timestamp", build_time_index(len(data), start_ts, fs))
    data.attrs.update({"start_ts": start_ts, "fs": fs})
    return data

def load_wesad_pickle(subject: str, raw_root: Path = RAW_ROOT) -> dict:
    """Loads the data from a PKL file"""
    pkl_path = raw_root / subject / f"{subject}.pkl"
    if not pkl_path.exists():
        raise FileNotFoundError(f"File {pkl_path} not found")
    with pkl_path.open("rb") as handle:
        return pickle.load(handle, encoding="latin1")

def load_tags_for_subject(subject_path: Path) -> pd.DataFrame:
    """Loads the tags (event markers) for a subject"""
    path = subject_path / "tags.csv"
    if not path.exists() or path.stat().st_size == 0:
        return pd.DataFrame(columns=["timestamp", "tag"])
    tags = pd.read_csv(path, header=None, names=["timestamp"])
    tags["timestamp"] = pd.to_datetime(tags["timestamp"], unit="s", utc=True)
    tags["tag"] = 1
    return tags

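def build_phase_protocol_for_subject(subject: str, session_start: pd.Timestamp, raw_root: Path = RAW_ROOT) -> pd.DataFrame:
    """Builds the phase protocol for a subject from the *_quest.csv file.

    NOTE: the START/END values are interpreted here as seconds from session
    start; check the dataset readme for the unit actually used before relying
    on the resulting labels.
    """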
    quest_path = raw_root / subject / f"{subject}_quest.csv"
    if not quest_path.exists():
        return pd.DataFrame(columns=["phase", "start", "end", "duration_s"])
    
    lines = [line.strip() for line in quest_path.read_text().splitlines() if line.strip()]
    
    def _extract_values(lines, prefix):
        for line in lines:
            if line.startswith(prefix):
                return [token for token in line.split(";")[1:] if token]
        return []
    
    names = _extract_values(lines, "# ORDER")
    starts = _extract_values(lines, "# START")
    ends = _extract_values(lines, "# END")
    
    phases = []
    limit = min(len(names), len(starts), len(ends))
    for idx in range(limit):
        try:
            start_sec = float(starts[idx])
            end_sec = float(ends[idx])
            phase_name = names[idx].strip()
            phases.append({
                "phase": phase_name,
                "start": session_start + pd.to_timedelta(start_sec, unit="s"),
                "end": session_start + pd.to_timedelta(end_sec, unit="s"),
                "duration_s": end_sec - start_sec
            })
        except (ValueError, IndexError):
            continue
    
    return pd.DataFrame(phases)

def assign_phase_labels(timestamps: pd.Series, phases: pd.DataFrame) -> pd.Series:
    """Assigns phase labels to timestamps"""
    if phases.empty:
        return pd.Series(["unknown"] * len(timestamps), index=timestamps.index)
    intervals = pd.IntervalIndex.from_arrays(phases["start"], phases["end"], closed="left")
    labels = phases["phase"].to_list()
    idx = intervals.get_indexer(timestamps)
    label_array = np.array(labels, dtype=object)
    mapped = np.where(idx >= 0, label_array[idx], "unknown")
    return pd.Series(mapped, index=timestamps.index)

def resample_signal(array, src_fs: float, target_fs: float, target_len: int) -> np.ndarray:
    """Resamples a signal to the target sampling rate"""
    # Read the flag set in STEP 1. Assigning to SCIPY_AVAILABLE inside this
    # function would make it a local name, so the lookup would always fail
    # and the scipy path would never run.
    scipy_available = globals().get("SCIPY_AVAILABLE", False)

    if array.ndim == 1:
        array = array[:, None]

    if not scipy_available:
        # Without scipy, fall back to linear interpolation with numpy.interp.
        # NOTE: unlike the scipy path below, this fallback does not trim to
        # MAX_DURATION; it interpolates the whole signal onto target_len points.
        original_indices = np.linspace(0, len(array) - 1, len(array))
        target_indices = np.linspace(0, len(array) - 1, target_len)
        resampled = np.zeros((target_len, array.shape[1]))
        for i in range(array.shape[1]):
            resampled[:, i] = np.interp(target_indices, original_indices, array[:, i])
        return resampled.flatten() if resampled.shape[1] == 1 else resampled

    expected_len = int(src_fs * MAX_DURATION.total_seconds())
    trimmed = array[:expected_len]
    if len(trimmed) == 0:
        return np.full((target_len, array.shape[1]), np.nan)
    return resample(trimmed, target_len, axis=0)

# Load the data for all subjects
print("=" * 80)
print("STEP 2: LOADING THE DATA")
print("=" * 80)

all_subjects_data = []

for subject in DEFAULT_SUBJECTS:
    print(f"\n📂 Loading data for {subject}...")
    subject_path = RAW_ROOT / subject / f"{subject}_E4_Data"

    if not subject_path.exists():
        print(f"  ⚠️ Folder {subject_path} not found - skipping")
        continue
    
    # Load the wrist signals (CSV)
    wrist_data = {}
    for sensor in ["ACC", "EDA", "BVP", "TEMP"]:
        sensor_df = load_sensor_for_subject(subject_path, sensor)
        if not sensor_df.empty:
            wrist_data[sensor.lower()] = sensor_df

    # Check that we have data
    if not wrist_data:
        print(f"  ⚠️ No wrist data - skipping")
        continue

    # Use the first timestamp of the first sensor as session_start
    first_sensor = list(wrist_data.values())[0]
    if len(first_sensor) == 0:
        print(f"  ⚠️ Empty sensor - skipping")
        continue

    session_start = first_sensor["timestamp"].iloc[0]

    # Load the phase protocol
    phases = build_phase_protocol_for_subject(subject, session_start)

    # Merge the wrist data
    if wrist_data:
        # Use BVP as the time reference
        if "bvp" in wrist_data:
            base_df = wrist_data["bvp"][["timestamp"]].copy()
            for sensor_name, sensor_df in wrist_data.items():
                if sensor_name != "bvp":
                    # Align to the BVP timestamps (nearest sample within 1 s)
                    merged = pd.merge_asof(
                        base_df.sort_values("timestamp"),
                        sensor_df[["timestamp"] + [col for col in sensor_df.columns if col != "timestamp"]].sort_values("timestamp"),
                        on="timestamp",
                        direction="nearest",
                        tolerance=pd.Timedelta(seconds=1)
                    )
                    for col in sensor_df.columns:
                        if col != "timestamp":
                            base_df[col] = merged[col].values
            
            # Add the remaining columns from BVP
            for col in wrist_data["bvp"].columns:
                if col != "timestamp" and col not in base_df.columns:
                    base_df[col] = wrist_data["bvp"][col].values

            # Resample to the target sampling rate
            target_len = int(MAX_DURATION.total_seconds() * TARGET_FS)
            timestamps = session_start + pd.to_timedelta(np.arange(target_len) / TARGET_FS, unit="s")

            # Resample each column
            resampled_data = {}
            for col in base_df.columns:
                if col != "timestamp":
                    original_values = base_df[col].values
                    if len(original_values) > 0:
                        resampled = resample_signal(original_values, wrist_data["bvp"].attrs["fs"], TARGET_FS, target_len)
                        resampled_data[col] = resampled.flatten() if resampled.ndim > 1 else resampled
                    else:
                        resampled_data[col] = np.full(target_len, np.nan)

            # Build the DataFrame
            subject_df = pd.DataFrame(resampled_data)
            subject_df.insert(0, "timestamp", timestamps)

            # Add the phase labels
            subject_df["phase"] = assign_phase_labels(subject_df["timestamp"], phases)
            subject_df["label"] = subject_df["phase"].map(PHASE_TO_CLASS).fillna("unknown")
            subject_df["subject"] = subject

            # Check the phase and class distribution for this subject
            phase_dist = subject_df["phase"].value_counts()
            label_dist = subject_df["label"].value_counts()
            print(f"    Phases: {dict(phase_dist)}")
            print(f"    Classes: {dict(label_dist)}")

            all_subjects_data.append(subject_df)
            print(f"  ✅ Loaded {len(subject_df)} samples")

# Concatenate all the data
if all_subjects_data:
    full_data = pd.concat(all_subjects_data, ignore_index=True)
    print(f"\n{'='*80}")
    print("DATA LOADING SUMMARY")
    print(f"{'='*80}")
    print(f"✅ Loaded data for {len(all_subjects_data)} subjects")
    print(f"   Total number of samples: {len(full_data)}")

    print(f"\n📊 DETAILED CLASS DISTRIBUTION PER SUBJECT:")
    print("-" * 80)
    for subject in full_data["subject"].unique():
        subject_data = full_data[full_data["subject"] == subject]
        print(f"\n  {subject}:")
        label_dist = subject_data["label"].value_counts()
        for label in label_dist.index:
            count = label_dist[label]
            pct = (count / len(subject_data) * 100) if len(subject_data) > 0 else 0
            print(f"    {label:12s}: {count:6d} samples ({pct:6.2f}%)")

    print(f"\n📊 PHASE DISTRIBUTION PER SUBJECT:")
    print("-" * 80)
    for subject in full_data["subject"].unique():
        subject_data = full_data[full_data["subject"] == subject]
        print(f"\n  {subject}:")
        phase_dist = subject_data["phase"].value_counts()
        for phase in phase_dist.index:
            count = phase_dist[phase]
            pct = (count / len(subject_data) * 100) if len(subject_data) > 0 else 0
            print(f"    {phase:15s}: {count:6d} samples ({pct:6.2f}%)")

    print(f"\n📊 GLOBAL CLASS DISTRIBUTION (all subjects):")
    print("-" * 80)
    class_dist = full_data["label"].value_counts()
    for label in class_dist.index:
        count = class_dist[label]
        pct = (count / len(full_data) * 100) if len(full_data) > 0 else 0
        print(f"   {label:12s}: {count:6d} samples ({pct:6.2f}%)")

    print(f"\n📊 GLOBAL PHASE DISTRIBUTION (all subjects):")
    print("-" * 80)
    phase_dist = full_data["phase"].value_counts()
    for phase in phase_dist.index:
        count = phase_dist[phase]
        pct = (count / len(full_data) * 100) if len(full_data) > 0 else 0
        print(f"   {phase:15s}: {count:6d} samples ({pct:6.2f}%)")

    print(f"\n📊 COLUMNS IN THE DATA:")
    print("-" * 80)
    print(f"   Signal columns: {[col for col in full_data.columns if col not in ['timestamp', 'phase', 'label', 'subject']]}")
    print(f"   Number of signal columns: {len([col for col in full_data.columns if col not in ['timestamp', 'phase', 'label', 'subject']])}")

    print(f"\n📊 FIRST FEW ROWS:")
    print("-" * 80)
    print(full_data.head(10))

    print(f"\n📊 TIME STATISTICS:")
    print("-" * 80)
    print(f"   Earliest timestamp: {full_data['timestamp'].min()}")
    print(f"   Latest timestamp: {full_data['timestamp'].max()}")
    print(f"   Duration: {(full_data['timestamp'].max() - full_data['timestamp'].min()).total_seconds() / 60:.2f} minutes")

    print(f"\n✅ DATA LOADING FINISHED SUCCESSFULLY!")
else:
    print("❌ No data was loaded!")
    raise ValueError("No data to analyze")



❌❌❌ ERROR: Missing imports!
   Error: name 'Path' is not defined
   Run STEP 1 first (the cell with the library imports)!


NameError: Missing imports - run STEP 1 first: name 'Path' is not defined

## STEP 3: DATA SEGMENTATION (SLIDING WINDOWS)

We create time windows and extract statistical features from each window.
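
As a quick check of the window arithmetic, the sketch below enumerates window boundaries for a 5 s window with a 2.5 s step at 32 Hz, the values used in the next cell. `sliding_window_bounds` and the one-minute toy signal are hypothetical and serve only to show how the window count follows from the sizes.

```python
def sliding_window_bounds(n_samples, window_size, step_size):
    """Return (start, end) sample indices of every full window."""
    return [(start, start + window_size)
            for start in range(0, n_samples - window_size + 1, step_size)]

target_fs = 32.0
window_size = int(5 * target_fs)    # 160 samples
step_size = int(2.5 * target_fs)    # 80 samples, i.e. 50% overlap
n_samples = int(60 * target_fs)     # one minute of signal = 1920 samples

bounds = sliding_window_bounds(n_samples, window_size, step_size)
# Closed form: floor((n_samples - window_size) / step_size) + 1
expected = (n_samples - window_size) // step_size + 1
print(len(bounds), expected, bounds[0], bounds[-1])
```

With 50% overlap each sample (apart from the edges) falls into two windows, which is one more reason to keep all windows of a subject on the same side of the train/test split.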


In [2]:
# ============================================================================
# STEP 3: DATA SEGMENTATION (SLIDING WINDOWS)
# ============================================================================

# Check that the required variables are available
required_vars = ['full_data', 'TARGET_FS']
missing_vars = [var for var in required_vars if var not in globals()]

if missing_vars:
    print(f"\n❌❌❌ ERROR: Missing variables: {', '.join(missing_vars)}")
    print("   Run STEP 2 first!")
    raise NameError(f"Missing variables: {', '.join(missing_vars)}")

# Segmentation parameters
WINDOW_SIZE_SECONDS = 5  # Window size in seconds
STEP_SIZE_SECONDS = 2.5  # Step (50% overlap)
WINDOW_SIZE = int(WINDOW_SIZE_SECONDS * TARGET_FS)  # Window size in samples
STEP_SIZE = int(STEP_SIZE_SECONDS * TARGET_FS)  # Step in samples

print("=" * 80)
print("STEP 3: DATA SEGMENTATION (SLIDING WINDOWS)")
print("=" * 80)
print(f"  Window size: {WINDOW_SIZE_SECONDS} seconds ({WINDOW_SIZE} samples)")
print(f"  Step: {STEP_SIZE_SECONDS} seconds ({STEP_SIZE} samples)")
print(f"  Overlap: {(1 - STEP_SIZE_SECONDS/WINDOW_SIZE_SECONDS)*100:.1f}%")

# Feature extraction functions
def compute_rms(signal):
    """Computes the RMS (root mean square)"""
    return np.sqrt(np.mean(signal**2))

def compute_kurtosis(signal):
    """Computes excess kurtosis"""
    if len(signal) < 4:
        return 0.0
    # Read the flag set in STEP 1. Assigning to SCIPY_AVAILABLE inside this
    # function would make it a local name and always disable the scipy path.
    scipy_available = globals().get("SCIPY_AVAILABLE", False)

    if scipy_available:
        try:
            return stats.kurtosis(signal, nan_policy='omit')
        except Exception:
            pass

    # Compute the kurtosis manually (without scipy)
    signal_clean = signal[~np.isnan(signal)]
    if len(signal_clean) < 4:
        return 0.0
    mean = np.mean(signal_clean)
    std = np.std(signal_clean)
    if std == 0:
        return 0.0
    kurt = np.mean(((signal_clean - mean) / std) ** 4) - 3
    return kurt

def compute_skewness(signal):
    """Computes skewness"""
    if len(signal) < 3:
        return 0.0
    # Read the flag set in STEP 1. Assigning to SCIPY_AVAILABLE inside this
    # function would make it a local name and always disable the scipy path.
    scipy_available = globals().get("SCIPY_AVAILABLE", False)

    if scipy_available:
        try:
            return stats.skew(signal, nan_policy='omit')
        except Exception:
            pass

    # Compute the skewness manually (without scipy)
    signal_clean = signal[~np.isnan(signal)]
    if len(signal_clean) < 3:
        return 0.0
    mean = np.mean(signal_clean)
    std = np.std(signal_clean)
    if std == 0:
        return 0.0
    skew = np.mean(((signal_clean - mean) / std) ** 3)
    return skew

def compute_rmssd(signal):
    """Computes RMSSD (root mean square of successive differences) - for HRV"""
    if len(signal) < 2:
        return 0.0
    diff = np.diff(signal)
    return np.sqrt(np.mean(diff**2))

def compute_slope(signal):
    """Computes the slope - linear trend"""
    if len(signal) < 2:
        return 0.0
    x = np.arange(len(signal))
    coeffs = np.polyfit(x, signal, 1)
    return coeffs[0]

def compute_respiration_rate(signal, fs=TARGET_FS):
    """Computes the respiration rate (for a respiration signal)"""
    if len(signal) < int(fs * 2):  # At least 2 seconds
        return 0.0

    # Read the flag set in STEP 1. Assigning to SCIPY_AVAILABLE inside this
    # function would make it a local name and always disable the scipy path.
    scipy_available = globals().get("SCIPY_AVAILABLE", False)
    min_distance = int(fs * 0.5)  # At least 0.5 s between peaks

    # Find the peaks
    peaks = None
    if scipy_available:
        try:
            from scipy.signal import find_peaks
            peaks, _ = find_peaks(signal, distance=min_distance)
        except Exception:
            peaks = None  # Fall back to the simple algorithm below

    if peaks is None:
        # Simple peak finder without scipy
        peaks = []
        threshold = np.mean(signal) + 0.5 * np.std(signal)
        for i in range(min_distance, len(signal) - min_distance):
            if signal[i] > threshold and signal[i] == np.max(signal[i-min_distance:i+min_distance+1]):
                peaks.append(i)
        peaks = np.array(peaks)

    if len(peaks) < 2:
        return 0.0
    # Mean time between peaks
    peak_intervals = np.diff(peaks) / fs
    avg_interval = np.mean(peak_intervals)
    if avg_interval > 0:
        return 60.0 / avg_interval  # Breaths per minute
    return 0.0

def extract_features_from_window(window_data):
    """Extracts statistical features from a window"""
    features = {}

    # Signal columns (skip timestamp, phase, label, subject)
    signal_cols = [col for col in window_data.columns 
                   if col not in ["timestamp", "phase", "label", "subject"]]

    for col in signal_cols:
        signal = window_data[col].values
        signal_clean = signal[~np.isnan(signal)]

        if len(signal_clean) == 0:
            # If every value is NaN, set all features to 0
            for stat in ("mean", "std", "min", "max", "range", "rms",
                         "kurtosis", "skewness", "rmssd", "slope"):
                features[f"{col}_{stat}"] = 0.0
            continue

        # Basic statistics
        features[f"{col}_mean"] = np.mean(signal_clean)
        features[f"{col}_std"] = np.std(signal_clean) if len(signal_clean) > 1 else 0.0
        features[f"{col}_min"] = np.min(signal_clean)
        features[f"{col}_max"] = np.max(signal_clean)
        features[f"{col}_range"] = features[f"{col}_max"] - features[f"{col}_min"]

        # Advanced features
        features[f"{col}_rms"] = compute_rms(signal_clean)
        features[f"{col}_kurtosis"] = compute_kurtosis(signal_clean)
        features[f"{col}_skewness"] = compute_skewness(signal_clean)
        features[f"{col}_rmssd"] = compute_rmssd(signal_clean)
        features[f"{col}_slope"] = compute_slope(signal_clean)

        # Respiration rate (only for respiration-related columns)
        if "resp" in col.lower() or "breath" in col.lower():
            features[f"{col}_respiration_rate"] = compute_respiration_rate(signal_clean)

    return features

# Sliding-window segmentation
print(f"\n🔧 Running sliding-window segmentation...")

segmented_data = []
groups = []  # For the subject-wise split

for subject in full_data["subject"].unique():
    subject_data = full_data[full_data["subject"] == subject].copy()
    subject_data = subject_data.sort_values("timestamp").reset_index(drop=True)

    # Segmentation
    n_samples = len(subject_data)
    for start_idx in range(0, n_samples - WINDOW_SIZE + 1, STEP_SIZE):
        end_idx = start_idx + WINDOW_SIZE
        window = subject_data.iloc[start_idx:end_idx].copy()

        # Extract the features
        features = extract_features_from_window(window)

        # Window label (the most frequent label in the window)
        label_counts = window["label"].value_counts()
        window_label = label_counts.index[0] if len(label_counts) > 0 else "unknown"

        # Add metadata
        features["label"] = window_label
        features["subject"] = subject
        features["window_start"] = start_idx
        features["window_end"] = end_idx

        segmented_data.append(features)
        groups.append(subject)

# Build a DataFrame from the segmented data
segmented_df = pd.DataFrame(segmented_data)

print(f"\n{'='*80}")
print("SEGMENTATION SUMMARY")
print(f"{'='*80}")
print(f"✅ Segmentation finished!")
print(f"   Number of windows: {len(segmented_df)}")
print(f"   Number of features: {len([col for col in segmented_df.columns if col not in ['label', 'subject', 'window_start', 'window_end']])}")

# Check the class distribution before removing "unknown"
print(f"\n📊 CLASS DISTRIBUTION BEFORE REMOVING 'unknown':")
print("-" * 80)
class_dist_before_clean = segmented_df["label"].value_counts()
for label in class_dist_before_clean.index:
    count = class_dist_before_clean[label]
    pct = (count / len(segmented_df) * 100) if len(segmented_df) > 0 else 0
    print(f"   {label:12s}: {count:4d} windows ({pct:5.1f}%)")

# Remove the windows labeled "unknown"
segmented_df = segmented_df[segmented_df["label"] != "unknown"].copy()
print(f"\n📊 Number of windows after removing 'unknown': {len(segmented_df)}")
print(f"   Removed: {len(segmented_data) - len(segmented_df)} windows labeled 'unknown'")

# Check the class distribution before aggregation
print(f"\n📊 Class distribution BEFORE aggregation:")
class_dist_before_agg = segmented_df["label"].value_counts()
print(class_dist_before_agg)

# ⚠️ IMPORTANT: Class aggregation (amusement + stress → emotion)
print(f"\n{'='*80}")
print("CLASS AGGREGATION: amusement + stress → emotion")
print(f"{'='*80}")
segmented_df["label"] = segmented_df["label"].replace({
    "amusement": "emotion",
    "stress": "emotion"
})
print(f"   ✅ Aggregation done: amusement + stress → emotion")

# Use the 'subject' column of segmented_df as groups (simplest and most reliable)
if 'subject' in segmented_df.columns:
    groups = segmented_df['subject'].tolist()
    print(f"   ✅ Built groups from the 'subject' column: {len(groups)} elements")
else:
    print(f"   ⚠️ WARNING: No 'subject' column in segmented_df!")
    raise ValueError("No 'subject' column in segmented_df - cannot build groups")

# Check the class distribution after aggregation
print(f"\n📊 CLASS DISTRIBUTION AFTER AGGREGATION:")
print("-" * 80)
class_dist_seg = segmented_df["label"].value_counts()
for label in class_dist_seg.index:
    count = class_dist_seg[label]
    pct = (count / len(segmented_df) * 100) if len(segmented_df) > 0 else 0
    print(f"   {label:12s}: {count:4d} windows ({pct:5.1f}%)")

# Check the per-subject class distribution after aggregation
print(f"\n📊 CLASS DISTRIBUTION PER SUBJECT AFTER AGGREGATION:")
print("-" * 80)
for subject in segmented_df["subject"].unique():
    subject_data = segmented_df[segmented_df["subject"] == subject]
    print(f"\n  {subject}:")
    label_dist = subject_data["label"].value_counts()
    for label in label_dist.index:
        count = label_dist[label]
        pct = (count / len(subject_data) * 100) if len(subject_data) > 0 else 0
        print(f"    {label:12s}: {count:4d} windows ({pct:5.1f}%)")

# Verification: do we have both classes (baseline and emotion)?
unique_labels = segmented_df["label"].unique()
print(f"\n{'='*80}")
print("CLASS VERIFICATION AFTER SEGMENTATION AND AGGREGATION")
print(f"{'='*80}")
print(f"   Unique classes: {unique_labels}")
print(f"   Number of classes: {len(unique_labels)}")

if len(unique_labels) < 2:
    print(f"\n❌❌❌ ERROR: Only {len(unique_labels)} class(es) after segmentation!")
    print(f"   At least 2 classes (baseline and emotion) are needed for SMOTE!")
    print(f"   Check the phase loading and the PHASE_TO_CLASS mapping.")
    print(f"\n📊 DIAGNOSTICS:")
    print(f"   - Check that phases are read correctly from the *_quest.csv files")
    print(f"   - Check that the PHASE_TO_CLASS mapping is correct")
    print(f"   - Check that assign_phase_labels works correctly")
else:
    print(f"   ✅ We have {len(unique_labels)} classes - OK for SMOTE")
    print(f"   ✅ Classes: {', '.join(unique_labels)}")

    # Balance ratio before SMOTE: minority count divided by majority count
    if len(unique_labels) == 2:
        counts = [class_dist_seg[label] for label in unique_labels]
        balance_ratio = min(counts) / max(counts) if max(counts) > 0 else 0
        print(f"   📊 Balance ratio (before SMOTE): {balance_ratio:.4f}")
        if balance_ratio < 0.5:
            print(f"   ⚠️ WARNING: Strong class imbalance (balance ratio < 0.5)")
            print(f"      SMOTE will have to generate many synthetic samples")



❌❌❌ ERROR: Missing variables: full_data, TARGET_FS
   Run STEP 2 first!


NameError: Missing variables: full_data, TARGET_FS

## STEP 4: ENCODING AND FEATURE SCALING

We encode the labels and scale the features.
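
The scaling cell that follows fits `StandardScaler` on every window before the subject-wise split, so the mean and variance it learns also include test subjects. A minimal sketch of the leak-free variant, on toy data and with hypothetical helper names (`fit_standardizer`, `apply_standardizer`); in the notebook the same effect would likely come from fitting the scaler on the training rows only and reusing it on the test rows:

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and std computed on the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)          # population std (ddof=0)
    std[std == 0] = 1.0                # constant features: avoid division by zero
    return mean, std

def apply_standardizer(X, mean, std):
    """Scale any rows with statistics learned elsewhere."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(10, 3))   # toy feature matrix
train_rows = np.arange(0, 8)                       # rows of the training subjects
test_rows = np.arange(8, 10)                       # rows of the held-out subjects

mean, std = fit_standardizer(X[train_rows])
X_train_scaled = apply_standardizer(X[train_rows], mean, std)
X_test_scaled = apply_standardizer(X[test_rows], mean, std)
```

The test rows are scaled with the training statistics, so they will generally not have zero mean or unit variance; that is expected.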


In [4]:
# ============================================================================
# STEP 4: ENCODING AND FEATURE SCALING
# ============================================================================

# Check that the required variables exist
required_vars = ['segmented_df']
missing_vars = [var for var in required_vars if var not in globals()]

if missing_vars:
    print(f"\n❌❌❌ ERROR: Missing variables: {', '.join(missing_vars)}")
    print("   Run STEP 3 first!")
    raise NameError(f"Missing variables: {', '.join(missing_vars)}")

print("=" * 80)
print("STEP 4: ENCODING AND FEATURE SCALING")
print("=" * 80)

# Prepare the data
feature_cols = [col for col in segmented_df.columns 
                if col not in ["label", "subject", "window_start", "window_end"]]
X = segmented_df[feature_cols].copy()
y = segmented_df["label"].copy()

# Drop columns that are entirely NaN
X = X.dropna(axis=1, how='all')

# Fill the remaining NaN values with zeros
X = X.fillna(0.0)

print(f"\n📊 Data shape:")
print(f"   X: {X.shape}")
print(f"   y: {len(y)} samples")

# LabelEncoder for the target
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

print(f"\n✅ LabelEncoder:")
print(f"   Classes: {label_encoder.classes_}")
print(f"   Encoding: {dict(zip(label_encoder.classes_, range(len(label_encoder.classes_))))}")

# StandardScaler for the features
# NOTE: the scaler is fit on all windows here, before the train/test split,
# so its mean and variance also include the test subjects.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled = pd.DataFrame(X_scaled, columns=X.columns, index=X.index)

print(f"\n✅ StandardScaler:")
print(f"   Features scaled: {X_scaled.shape[1]}")
print(f"   Sample values (first 5 features, first 3 samples):")
print(X_scaled.iloc[:3, :5])



❌❌❌ ERROR: Missing variables: segmented_df
   Run STEP 3 first!


NameError: Missing variables: segmented_df

## STEP 5: TRAIN/TEST SPLIT - SUBJECT-WISE SPLIT

⚠️ **IMPORTANT**: All data from one person goes either to train or to test. Windows from the same person are never split between train and test.
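
The rule above can be illustrated without scikit-learn. The sketch below is a simplified stand-in for what `GroupShuffleSplit` does in the next cell, not its actual implementation: it shuffles the unique subject IDs, reserves a fraction of them for test, and then assigns every window according to its subject. The function name `subject_wise_split` and the toy subject IDs are assumptions made for illustration.

```python
import numpy as np

def subject_wise_split(groups, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) such that no subject appears on both sides."""
    groups = np.asarray(groups)
    subjects = np.unique(groups)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(subjects)
    n_test = max(1, int(np.ceil(test_size * len(subjects))))
    held_out = shuffled[:n_test]
    test_mask = np.isin(groups, held_out)       # True for every window of a held-out subject
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy data: one subject ID per window
groups = ["S2"] * 4 + ["S3"] * 3 + ["S4"] * 5 + ["S5"] * 2 + ["S6"] * 4
train_idx, test_idx = subject_wise_split(groups)

groups_arr = np.asarray(groups)
train_subjects = set(groups_arr[train_idx].tolist())
test_subjects = set(groups_arr[test_idx].tolist())
```

Because the split is decided per subject, the number of test windows depends on which subjects are drawn, so the window-level test fraction only approximates `test_size`.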


In [5]:
# ============================================================================
# STEP 5: TRAIN/TEST SPLIT - SUBJECT-WISE SPLIT
# ============================================================================

# Check that the required variables exist
required_vars = ['X_scaled', 'y_encoded', 'groups']
missing_vars = [var for var in required_vars if var not in globals()]

if missing_vars:
    print(f"\n❌❌❌ ERROR: Missing variables: {', '.join(missing_vars)}")
    print("   Run STEP 3 and STEP 4 first!")
    raise NameError(f"Missing variables: {', '.join(missing_vars)}")

print("=" * 80)
print("STEP 5: TRAIN/TEST SPLIT - SUBJECT-WISE SPLIT")
print("=" * 80)
print("⚠️ IMPORTANT: All data from one person goes either to train or to test")
print("⚠️ IMPORTANT: Windows from the same person are never split between train and test")

# Check the lengths and convert groups to a NumPy array
print(f"\n📊 Data check before the split:")
print(f"   X_scaled shape: {X_scaled.shape}")
print(f"   y_encoded length: {len(y_encoded)}")
print(f"   groups length: {len(groups)}")

if len(groups) != len(X_scaled) or len(groups) != len(y_encoded):
    print(f"\n❌❌❌ ERROR: Lengths do not match!")
    print(f"   X_scaled: {len(X_scaled)}, y_encoded: {len(y_encoded)}, groups: {len(groups)}")
    raise ValueError("X_scaled, y_encoded and groups must have the same length!")

# Convert groups to a NumPy array (used for the boolean masks and index lookups below)
groups_array = np.array(groups)
print(f"   ✅ groups converted to a NumPy array: {groups_array.shape}")

# Check the class distribution per subject (before the split)
print(f"\n📊 CLASS DISTRIBUTION PER SUBJECT (before the split):")
print("-" * 80)
unique_subjects = np.unique(groups_array)
subject_class_dist = {}
for subject in unique_subjects:
    subject_mask = groups_array == subject
    subject_labels = y_encoded[subject_mask]
    subject_dist = pd.Series(label_encoder.inverse_transform(subject_labels)).value_counts()
    subject_class_dist[subject] = subject_dist
    print(f"\n  {subject}:")
    for label in subject_dist.index:
        print(f"    {label:12s}: {subject_dist[label]:4d} samples")

# STRATIFIED SUBJECT-WISE SPLIT: both classes are required in train and preferred in test
print(f"\n{'='*80}")
print("STRATIFIED SUBJECT-WISE SPLIT")
print(f"{'='*80}")
print("⚠️ IMPORTANT: Both classes are required in train and preferred in test!")
print("⚠️ IMPORTANT: Train must contain at least 2 classes (required for SMOTE)!")

# Try many GroupShuffleSplit splits and keep one with both classes in train (and, preferably, in test)
gss = GroupShuffleSplit(n_splits=100, test_size=0.2, random_state=42)

best_train_idx = None
best_test_idx = None
best_score = -np.inf
best_train_subjects = None
best_test_subjects = None

print(f"\n🔍 Searching for the best split (trying {gss.n_splits} different splits)...")

for train_idx, test_idx in gss.split(X_scaled, y_encoded, groups=groups_array):
    # Classes present in train and in test
    train_classes = np.unique(y_encoded[train_idx])
    test_classes = np.unique(y_encoded[test_idx])
    
    # All classes must be in train (required for SMOTE)
    # Test should contain at least one class (ideally both)
    if len(train_classes) >= 2:
        # Subjects in train and in test
        train_subjects_set = set(groups_array[train_idx])
        test_subjects_set = set(groups_array[test_idx])
        
        # Score: prefer splits with both classes in test and with more subjects.
        # On ties the first split found is kept (strict ">" below).
        score = len(train_classes) * 10 + len(test_classes) * 5 + len(train_subjects_set) + len(test_subjects_set)
        if score > best_score:
            best_score = score
            best_train_idx = train_idx
            best_test_idx = test_idx
            best_train_subjects = train_subjects_set
            best_test_subjects = test_subjects_set

# If no split has both classes in train, raise an error
if best_train_idx is None:
    print(f"\n❌❌❌ ERROR: No split with at least 2 classes in train was found!")
    print(f"   This means the data is too imbalanced or the subjects have only one class.")
    print(f"   Check the per-subject class distribution above.")
    raise ValueError("Cannot create a split with at least 2 classes in train - SMOTE will not work!")

train_idx = best_train_idx
test_idx = best_test_idx

print(f"\n✅ Best split found:")
print(f"   Score: {best_score}")
print(f"   Train subjects: {sorted(best_train_subjects)} ({len(best_train_subjects)} subjects)")
print(f"   Test subjects: {sorted(best_test_subjects)} ({len(best_test_subjects)} subjects)")

X_train = X_scaled.iloc[train_idx].copy()
X_test = X_scaled.iloc[test_idx].copy()
y_train = y_encoded[train_idx]
y_test = y_encoded[test_idx]
groups_train = [groups[i] for i in train_idx]
groups_test = [groups[i] for i in test_idx]

# Check which subjects ended up in train and in test
train_subjects = set(groups_train)
test_subjects = set(groups_test)

print(f"\n✅ Subject-wise split:")
print(f"   Train subjects: {sorted(train_subjects)} ({len(train_subjects)} subjects)")
print(f"   Test subjects: {sorted(test_subjects)} ({len(test_subjects)} subjects)")

# Verification: train and test subjects must be disjoint
if train_subjects & test_subjects:
    print(f"\n❌❌❌ ERROR: Train and test subjects overlap!")
    raise ValueError("The subject-wise split is not working correctly!")
else:
    print(f"   ✅ Train and test subjects are disjoint - OK")

# Check the class distribution in train and test
print(f"\n📊 Class distribution in TRAIN:")
train_dist = pd.Series(label_encoder.inverse_transform(y_train)).value_counts()
for label in train_dist.index:
    count = train_dist[label]
    pct = (count / len(y_train) * 100) if len(y_train) > 0 else 0
    print(f"   {label:12s}: {count:4d} samples ({pct:5.1f}%)")

print(f"\n📊 Class distribution in TEST:")
test_dist = pd.Series(label_encoder.inverse_transform(y_test)).value_counts()
for label in test_dist.index:
    count = test_dist[label]
    pct = (count / len(y_test) * 100) if len(y_test) > 0 else 0
    print(f"   {label:12s}: {count:4d} samples ({pct:5.1f}%)")

# Sample counts per class, shown together with the encoded label
print(f"\n📊 DETAILED SAMPLE COUNTS PER CLASS:")
print("-" * 80)
print("TRAIN:")
for label in np.unique(y_train):
    label_name = label_encoder.inverse_transform([label])[0]
    count = np.sum(y_train == label)
    print(f"   {label_name:12s} (code {label}): {count:4d} samples")

print("\nTEST:")
for label in np.unique(y_test):
    label_name = label_encoder.inverse_transform([label])[0]
    count = np.sum(y_test == label)
    print(f"   {label_name:12s} (code {label}): {count:4d} samples")

# Verification: train must contain at least 2 classes (required for SMOTE)
train_unique_classes = np.unique(y_train)
test_unique_classes = np.unique(y_test)

print(f"\n✅ CLASS VERIFICATION:")
print(f"   Train: {len(train_unique_classes)} classes - {label_encoder.inverse_transform(train_unique_classes)}")
print(f"   Test: {len(test_unique_classes)} classes - {label_encoder.inverse_transform(test_unique_classes)}")

if len(train_unique_classes) < 2:
    print(f"\n⚠️ WARNING: Only {len(train_unique_classes)} class(es) in train!")
    print(f"   SMOTE requires at least 2 classes. Balancing may not work.")
else:
    print(f"   ✅ Train has at least 2 classes - SMOTE can run")

print(f"\n✅ Split finished:")
print(f"   Train: {len(X_train)} samples")
print(f"   Test: {len(X_test)} samples")



❌❌❌ ERROR: Missing variables: X_scaled, y_encoded, groups
   Run STEP 3 and STEP 4 first!


NameError: Missing variables: X_scaled, y_encoded, groups

## STEP 6: BALANCING THE TRAINING DATA (SMOTE)

We apply SMOTE **ONLY to train**. Test stays unchanged.
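
To make the balancing step concrete, here is a simplified NumPy sketch of the idea behind SMOTE: each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours. This is an illustration under stated assumptions, not the imbalanced-learn implementation; the function name `smote_like_oversample`, the choice `k=3`, and the toy data are all hypothetical.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=42):
    """Create n_new synthetic minority samples by interpolating toward near neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                     # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]       # (n, k) nearest-neighbour indices
    base = rng.integers(0, n, size=n_new)              # which sample to start from
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))                       # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[chosen] - X_min[base])

rng = np.random.default_rng(0)
X_minority = rng.normal(size=(28, 4))                  # toy minority class, 28 windows
X_synthetic = smote_like_oversample(X_minority, n_new=54 - 28)
```

Synthetic rows produced this way belong to no subject and have no position in time, which matters for any later step that orders windows per subject.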


In [None]:
# ============================================================================
# STEP 6: BALANCING THE TRAINING DATA (SMOTE)
# ============================================================================

print("=" * 80)
print("STEP 6: BALANCING THE TRAINING DATA (SMOTE)")
print("=" * 80)
print("⚠️ IMPORTANT: SMOTE on train ONLY, test stays unchanged!")

# Check that the required variables exist
required_vars = ['X_train', 'y_train', 'y_test', 'label_encoder']
missing_vars = [var for var in required_vars if var not in globals()]

if missing_vars:
    print(f"\n❌❌❌ ERROR: Missing variables: {', '.join(missing_vars)}")
    print("   Run STEP 4 and STEP 5 first!")
    raise NameError(f"Missing variables: {', '.join(missing_vars)}")

# Check the class distribution before SMOTE
print(f"\n📊 Class distribution BEFORE SMOTE (train):")
train_dist_before = pd.Series(label_encoder.inverse_transform(y_train)).value_counts()
for label in train_dist_before.index:
    count = train_dist_before[label]
    pct = (count / len(y_train) * 100) if len(y_train) > 0 else 0
    print(f"   {label:12s}: {count:4d} samples ({pct:5.1f}%)")

# Check that there is more than one class (SMOTE requires at least 2)
unique_classes = np.unique(y_train)
n_classes = len(unique_classes)

if n_classes < 2:
    print(f"\n⚠️ WARNING: Only {n_classes} class(es) in train - SMOTE cannot run!")
    print(f"   Classes in train: {label_encoder.inverse_transform(unique_classes)}")
    print(f"   SMOTE requires at least 2 classes. Using the data without balancing.")
    X_train_bal = X_train.copy() if isinstance(X_train, pd.DataFrame) else X_train
    y_train_bal = y_train.copy()
elif not IMBLEARN_AVAILABLE:
    print("\n❌❌❌ ERROR: imbalanced-learn is not available!")
    print("   Install it: pip install imbalanced-learn")
    X_train_bal = X_train.copy() if isinstance(X_train, pd.DataFrame) else X_train
    y_train_bal = y_train.copy()
else:
    # Convert X_train to a NumPy array if it is a DataFrame
    if isinstance(X_train, pd.DataFrame):
        X_train_array = X_train.values
    else:
        X_train_array = X_train
    X_train_for_smote = X_train_array

    # At least 2 classes are guaranteed here by the first branch above
    print(f"\n🔧 Running SMOTE...")
    print(f"   Number of classes: {n_classes}")
    print(f"   Classes: {label_encoder.inverse_transform(unique_classes)}")

    try:
        # Plain SMOTE with default settings
        from imblearn.over_sampling import SMOTE
        smote = SMOTE(random_state=42)
        X_train_bal_array, y_train_bal = smote.fit_resample(X_train_for_smote, y_train)

        print(f"✅ SMOTE.fit_resample() finished successfully!")

        # NOTE: SMOTE adds synthetic rows, so X_train_bal can be longer than groups_train;
        # create_sequences in STEP 7 raises ValueError when those lengths differ.

        # Convert back to a DataFrame if X_train was a DataFrame
        if isinstance(X_train, pd.DataFrame):
            X_train_bal = pd.DataFrame(X_train_bal_array, columns=X_train.columns)
        else:
            X_train_bal = X_train_bal_array

        # SMOTE verification: sample counts before and after
        print(f"\n📊 SMOTE VERIFICATION:")
        print(f"   Train before SMOTE: {len(X_train)} samples")
        print(f"   Train after SMOTE: {len(X_train_bal)} samples")

        # Class distribution after SMOTE, shown together with the encoded label
        print(f"\n📊 CLASS DISTRIBUTION AFTER SMOTE (train) - DETAILED:")
        print("-" * 80)
        for label in np.unique(y_train_bal):
            label_name = label_encoder.inverse_transform([label])[0]
            count = np.sum(y_train_bal == label)
            pct = (count / len(y_train_bal) * 100) if len(y_train_bal) > 0 else 0
            print(f"   {label_name:12s} (code {label}): {count:4d} samples ({pct:5.1f}%)")

        # Balance verification
        unique_labels_after = np.unique(y_train_bal)
        if len(unique_labels_after) == 2:
            counts = [np.sum(y_train_bal == label) for label in unique_labels_after]
            balance_ratio = min(counts) / max(counts)
            print(f"\n📊 Balance ratio: {balance_ratio:.4f}")
            if balance_ratio >= 0.95:
                print(f"   ✅✅✅ PERFECT BALANCE! Classes are balanced (balance ratio >= 0.95)!")
            elif balance_ratio >= 0.8:
                print(f"   ✅ Good balance (balance ratio >= 0.8)")
            else:
                print(f"   ⚠️ Partial balance (balance ratio < 0.8)")

        print(f"\n✅ SMOTE finished successfully!")
    except Exception as e:
        print(f"\n❌ ERROR during SMOTE: {e}")
        print(f"   Using the data without balancing.")
        X_train_bal = X_train.copy() if isinstance(X_train, pd.DataFrame) else X_train
        y_train_bal = y_train.copy()
    
# Check the class distribution after SMOTE (or without SMOTE if it could not run)
print(f"\n📊 Class distribution AFTER SMOTE (train):")
train_dist_after = pd.Series(label_encoder.inverse_transform(y_train_bal)).value_counts()
for label in train_dist_after.index:
    count = train_dist_after[label]
    pct = (count / len(y_train_bal) * 100) if len(y_train_bal) > 0 else 0
    print(f"   {label:12s}: {count:4d} samples ({pct:5.1f}%)")

# Check the balance ratio (only with more than one class)
if len(train_dist_after) >= 2:
    balance_ratio = min(train_dist_after.values) / max(train_dist_after.values)
    print(f"\n📊 Balance ratio: {balance_ratio:.4f}")
    
    # Series.get returns the default (0) when the label is absent
    baseline_count = train_dist_after.get('baseline', 0)
    emotion_count = train_dist_after.get('emotion', 0)
    stress_count = train_dist_after.get('stress', 0)
    amusement_count = train_dist_after.get('amusement', 0)
    
    if baseline_count > 0 and (emotion_count > 0 or stress_count > 0 or amusement_count > 0):
        minority_count = emotion_count + stress_count + amusement_count
        if baseline_count == minority_count:
            print(f"   ✅✅✅ PERFECT BALANCE! baseline and emotion/stress/amusement have the same number of samples!")
        elif balance_ratio >= 0.95:
            print(f"   ✅✅✅ CLASSES ARE BALANCED (balance ratio >= 0.95)!")
        else:
            print(f"   ⚠️ Classes are only partially balanced")
elif len(train_dist_after) == 1:
    print(f"\n⚠️ WARNING: Only one class in train after SMOTE - balancing was not possible")

# Verification: test stays unchanged
print(f"\n📊 Class distribution in TEST (unchanged, no SMOTE):")
test_dist_unchanged = pd.Series(label_encoder.inverse_transform(y_test)).value_counts()
for label in test_dist_unchanged.index:
    count = test_dist_unchanged[label]
    pct = (count / len(y_test) * 100) if len(y_test) > 0 else 0
    print(f"   {label:12s}: {count:4d} samples ({pct:5.1f}%)")

print(f"\n✅ Test stays unbalanced - this is correct for a realistic evaluation!")

# Final summary
print(f"\n📊 SUMMARY:")
print(f"   Train before SMOTE: {len(X_train)} samples, {len(np.unique(y_train))} classes")
print(f"   Train after SMOTE: {len(X_train_bal)} samples, {len(np.unique(y_train_bal))} classes")
print(f"   Test: {len(y_test)} samples, {len(np.unique(y_test))} classes")

STEP 6: BALANCING THE TRAINING DATA (SMOTE)
⚠️ IMPORTANT: SMOTE on train ONLY, test stays unchanged!

📊 Class distribution BEFORE SMOTE (train):
   baseline    :   54 samples ( 65.9%)
   emotion     :   28 samples ( 34.1%)

❌❌❌ ERROR: imbalanced-learn is not available!
   Install it: pip install imbalanced-learn

📊 Class distribution AFTER SMOTE (train):
   baseline    :   54 samples ( 65.9%)
   emotion     :   28 samples ( 34.1%)

📊 Balance ratio: 0.5185
   ⚠️ Classes are only partially balanced

📊 Class distribution in TEST (unchanged, no SMOTE):
   baseline    :   26 samples ( 63.4%)
   emotion     :   15 samples ( 36.6%)

✅ Test stays unbalanced - this is correct for a realistic evaluation!

📊 SUMMARY:
   Train before SMOTE: 82 samples, 2 classes
   Train after SMOTE: 82 samples, 2 classes
   Test: 41 samples, 2 classes
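
In the run above SMOTE was skipped, so the training set kept its 54/28 imbalance. One common fallback, shown here only as a hedged sketch, is to weight the classes in the loss instead of resampling. The helper `balanced_class_weights` is hypothetical; it uses the usual "balanced" heuristic `n_samples / (n_classes * class_count)`, and the label counts are taken from the output above.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

labels = ["baseline"] * 54 + ["emotion"] * 28      # train distribution from the run above
counts = Counter(labels)
balance_ratio = min(counts.values()) / max(counts.values())
weights = balanced_class_weights(labels)
```

With these counts the minority class gets roughly twice the weight of the majority class. If the weights were passed to Keras through `Model.fit(class_weight=...)`, the dictionary keys would need to be the integer class codes rather than the label names.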


## STEP 7: PREPARING THE DATA FOR TIME SERIES

We turn the windowed data into time sequences for the LSTM/GRU model.

Each sequence consists of N consecutive windows (timesteps), where each window is a feature vector.
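
A condensed sketch of this transformation on toy data, assuming windows are already in temporal order within each subject. The name `make_sequences` is hypothetical and the code is an illustration of the idea, not the notebook's own function: sequences never cross a subject boundary, and each sequence takes the label of its last window.

```python
import numpy as np

def make_sequences(X, y, groups, seq_len):
    """Stack seq_len consecutive windows of one subject into a (seq_len, n_features) sequence."""
    X, y, groups = np.asarray(X), np.asarray(y), np.asarray(groups)
    X_seq, y_seq = [], []
    for subject in dict.fromkeys(groups.tolist()):     # first-seen order, reproducible
        idx = np.flatnonzero(groups == subject)        # this subject's rows, ascending
        for start in range(len(idx) - seq_len + 1):
            rows = idx[start:start + seq_len]
            X_seq.append(X[rows])
            y_seq.append(y[rows[-1]])                  # label of the last window
    return np.array(X_seq), np.array(y_seq)

X = np.arange(16, dtype=float).reshape(8, 2)           # 8 windows, 2 features each
y = np.array([0, 0, 0, 1, 1, 0, 0, 1])
groups = ["S2"] * 5 + ["S3"] * 3
X_seq, y_seq = make_sequences(X, y, groups, seq_len=3)
```

Subject `S2` has 5 windows and yields 3 overlapping sequences; `S3` has 3 windows and yields 1. The resulting array has the `(n_sequences, timesteps, n_features)` layout that recurrent layers expect.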


In [None]:
# ============================================================================
# STEP 7: PREPARING THE DATA FOR TIME SERIES
# ============================================================================

print("=" * 80)
print("STEP 7: PREPARING THE DATA FOR TIME SERIES")
print("=" * 80)

# Check that the required variables exist
required_vars = ['X_train_bal', 'y_train_bal', 'X_test', 'y_test', 'groups_train', 'groups_test', 'label_encoder']
missing_vars = [var for var in required_vars if var not in globals()]

if missing_vars:
    print(f"\n❌❌❌ ERROR: Missing variables: {', '.join(missing_vars)}")
    print("   Run STEPS 2-6 in order first!")
    raise NameError(f"Missing variables: {', '.join(missing_vars)}")

# Check that TensorFlow is available
print(f"\n🔍 Checking TensorFlow availability...")
try:
    tf_check = TENSORFLOW_AVAILABLE
    print(f"   TENSORFLOW_AVAILABLE = {tf_check}")
except NameError:
    TENSORFLOW_AVAILABLE = False
    print(f"   ⚠️ TENSORFLOW_AVAILABLE is not defined - setting it to False")
    print(f"   💡 Re-run STEP 1 after setting SKIP_TENSORFLOW = False!")

if not TENSORFLOW_AVAILABLE:
    print("\n" + "="*80)
    print("❌❌❌ ERROR: TensorFlow/Keras is not available!")
    print("="*80)
    print("\n📋 INSTRUCTIONS:")
    print("   1. Go back to the STEP 1 cell (Cell 1)")
    print("   2. Find the line: SKIP_TENSORFLOW = True")
    print("   3. Change it to: SKIP_TENSORFLOW = False")
    print("   4. Re-run the STEP 1 cell")
    print("\n   If TensorFlow crashes the kernel:")
    print("   - Install TensorFlow: pip install tensorflow")
    print("   - Or use CPU-only TensorFlow: pip install tensorflow-cpu")
    print("   - Check the Jupyter logs for error details")
    print("\n" + "="*80)
    raise ImportError("TensorFlow/Keras is not available - set SKIP_TENSORFLOW = False in STEP 1")

# Sequence parameters
SEQUENCE_LENGTH = 5  # Number of consecutive windows per sequence (timesteps)

print(f"\n📊 SEQUENCE PARAMETERS:")
print(f"   Sequence length (timesteps): {SEQUENCE_LENGTH}")

# Check the feature dimension
if hasattr(X_train_bal, 'shape'):
    n_features = X_train_bal.shape[1]
    print(f"   Each window = feature vector of size: {n_features}")
else:
    print(f"   ⚠️ Cannot determine the feature dimension")

def create_sequences(X_data, y_data, groups_data, sequence_length=5):
    """
    Build time sequences from consecutive windows.
    
    Args:
        X_data: DataFrame or array of features (n_samples, n_features)
        y_data: array of labels (n_samples,)
        groups_data: subject identifiers, one per sample (n_samples,)
        sequence_length: sequence length (number of windows)
    
    Returns:
        X_sequences: array (n_sequences, sequence_length, n_features)
        y_sequences: array (n_sequences,), label of the last window in each sequence
        groups_sequences: list (n_sequences,), subject of each sequence
    """
    # Convert the inputs to NumPy arrays / a plain list
    X_array = X_data.values if isinstance(X_data, pd.DataFrame) else np.array(X_data)
    y_array = y_data.values if isinstance(y_data, (pd.Series, pd.DataFrame)) else np.asarray(y_data)
    if isinstance(groups_data, (np.ndarray, pd.Series)):
        groups_list = groups_data.tolist()
    else:
        groups_list = list(groups_data)
    
    # Check the lengths
    if len(X_array) != len(y_array) or len(X_array) != len(groups_list):
        raise ValueError(f"Length mismatch: X_data={len(X_array)}, y_data={len(y_array)}, groups_data={len(groups_list)}")
    
    X_sequences = []
    y_sequences = []
    groups_sequences = []
    
    # Group the data per subject so that a sequence only contains windows of one subject.
    # dict.fromkeys keeps first-seen order; the order of set() over strings can vary between runs.
    unique_groups = list(dict.fromkeys(groups_list))
    
    for group in unique_groups:
        # Indices of this subject, ascending, i.e. in the original (temporal) row order
        group_indices = [i for i, g in enumerate(groups_list) if g == group]
        
        if len(group_indices) < sequence_length:
            # Too few windows for this subject - skip it
            continue
        
        # Build sequences from consecutive windows
        for i in range(len(group_indices) - sequence_length + 1):
            seq_indices_array = np.array(group_indices[i:i + sequence_length])
            
            # A sequence may span windows with different labels;
            # for simplicity it is labeled by its last window
            seq_label = y_array[seq_indices_array][-1]
            
            X_seq = X_array[seq_indices_array]  # (sequence_length, n_features)
            X_sequences.append(X_seq)
            y_sequences.append(seq_label)
            groups_sequences.append(group)
    
    if len(X_sequences) == 0:
        raise ValueError("No sequences were created! Check the input data and sequence_length.")
    
    return np.array(X_sequences), np.array(y_sequences), groups_sequences

# Build the sequences for train
print(f"\n🔧 Building sequences for TRAIN...")
print(f"   Checking the input data:")
print(f"   - X_train_bal type: {type(X_train_bal)}, shape: {X_train_bal.shape if hasattr(X_train_bal, 'shape') else 'N/A'}")
print(f"   - y_train_bal type: {type(y_train_bal)}, length: {len(y_train_bal) if hasattr(y_train_bal, '__len__') else 'N/A'}")
print(f"   - groups_train type: {type(groups_train)}, length: {len(groups_train) if hasattr(groups_train, '__len__') else 'N/A'}")

try:
    X_train_seq, y_train_seq, groups_train_seq = create_sequences(
        X_train_bal, y_train_bal, groups_train, SEQUENCE_LENGTH
    )
except Exception as e:
    print(f"\n❌❌❌ ERROR while building the TRAIN sequences:")
    print(f"   Error: {type(e).__name__}: {e}")
    import traceback
    traceback.print_exc()
    raise

print(f"✅ TRAIN sequences built:")
print(f"   X_train_seq shape: {X_train_seq.shape} (n_sequences, timesteps, n_features)")
print(f"   y_train_seq shape: {y_train_seq.shape}")

# Build the sequences for test
print(f"\n🔧 Building sequences for TEST...")
print(f"   Checking the input data:")
print(f"   - X_test type: {type(X_test)}, shape: {X_test.shape if hasattr(X_test, 'shape') else 'N/A'}")
print(f"   - y_test type: {type(y_test)}, length: {len(y_test) if hasattr(y_test, '__len__') else 'N/A'}")
print(f"   - groups_test type: {type(groups_test)}, length: {len(groups_test) if hasattr(groups_test, '__len__') else 'N/A'}")

try:
    X_test_seq, y_test_seq, groups_test_seq = create_sequences(
        X_test, y_test, groups_test, SEQUENCE_LENGTH
    )
except Exception as e:
    print(f"\n❌❌❌ ERROR while building the TEST sequences:")
    print(f"   Error: {type(e).__name__}: {e}")
    import traceback
    traceback.print_exc()
    raise

print(f"✅ TEST sequences built:")
print(f"   X_test_seq shape: {X_test_seq.shape} (n_sequences, timesteps, n_features)")
print(f"   y_test_seq shape: {y_test_seq.shape}")

# Check the class distribution in the sequences
print(f"\n📊 CLASS DISTRIBUTION IN THE SEQUENCES:")
print("-" * 80)
print("TRAIN:")
train_seq_dist = pd.Series(label_encoder.inverse_transform(y_train_seq)).value_counts()
for label in train_seq_dist.index:
    count = train_seq_dist[label]
    pct = (count / len(y_train_seq) * 100) if len(y_train_seq) > 0 else 0
    print(f"   {label:12s}: {count:4d} sequences ({pct:5.1f}%)")

print("\nTEST:")
test_seq_dist = pd.Series(label_encoder.inverse_transform(y_test_seq)).value_counts()
for label in test_seq_dist.index:
    count = test_seq_dist[label]
    pct = (count / len(y_test_seq) * 100) if len(y_test_seq) > 0 else 0
    print(f"   {label:12s}: {count:4d} sequences ({pct:5.1f}%)")

# Convert the labels to categorical form (one-hot encoding for Keras)
print(f"\n🔧 Converting labels to categorical form (one-hot encoding)...")
n_classes = len(label_encoder.classes_)
print(f"   Number of classes: {n_classes}")
print(f"   Classes: {label_encoder.classes_}")

try:
    y_train_seq_categorical = to_categorical(y_train_seq, num_classes=n_classes)
    y_test_seq_categorical = to_categorical(y_test_seq, num_classes=n_classes)
except Exception as e:
    print(f"\n❌❌❌ ERROR while converting the labels:")
    print(f"   Error: {type(e).__name__}: {e}")
    print(f"   y_train_seq unique values: {np.unique(y_train_seq) if len(y_train_seq) > 0 else 'empty'}")
    print(f"   y_test_seq unique values: {np.unique(y_test_seq) if len(y_test_seq) > 0 else 'empty'}")
    import traceback
    traceback.print_exc()
    raise

print(f"\n✅ Labels converted to categorical form:")
print(f"   y_train_seq_categorical shape: {y_train_seq_categorical.shape}")
print(f"   y_test_seq_categorical shape: {y_test_seq_categorical.shape}")
print(f"   Number of classes: {n_classes}")

print(f"\n✅ DATA PREPARATION FOR TIME SERIES FINISHED!")


STEP 7: PREPARING THE DATA FOR TIME SERIES

🔍 Checking TensorFlow availability...
   TENSORFLOW_AVAILABLE = False

❌❌❌ ERROR: TensorFlow/Keras is not available!

📋 INSTRUCTIONS:
   1. Go back to the STEP 1 cell (Cell 1)
   2. Find the line: SKIP_TENSORFLOW = True
   3. Change it to: SKIP_TENSORFLOW = False
   4. Re-run the STEP 1 cell

   If TensorFlow crashes the kernel:
   - Install TensorFlow: pip install tensorflow
   - Or use CPU-only TensorFlow: pip install tensorflow-cpu
   - Check the Jupyter logs for error details



ImportError: TensorFlow/Keras is not available - set SKIP_TENSORFLOW = False in STEP 1

## STEP 8: TRAINING THE TIME SERIES MODEL (LSTM/GRU)

We train an LSTM/GRU model to classify time sequences.
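
Training with early stopping needs validation data. In keeping with the subject-wise split used in this notebook, one option is to hold out whole subjects from the training data. The sketch below is only an illustration under assumptions: `seq_subjects` (a subject ID for each training sequence) and the helper `subject_holdout` are hypothetical names that do not appear in the notebook.

```python
import numpy as np

def subject_holdout(subject_ids, val_fraction=0.2, seed=42):
    """Return boolean masks (train, val) that keep every subject on one side.

    Hypothetical helper: whole subjects are drawn at random for validation,
    so no subject contributes sequences to both sides.
    """
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    # Hold out at least one subject
    n_val = max(1, int(round(len(subjects) * val_fraction)))
    val_subjects = rng.choice(subjects, size=n_val, replace=False)
    val_mask = np.isin(subject_ids, val_subjects)
    return ~val_mask, val_mask

# Hypothetical subject ID for each training sequence (toy data)
seq_subjects = np.array(["S2", "S2", "S3", "S3", "S4",
                         "S4", "S5", "S5", "S6", "S6"])
train_mask, val_mask = subject_holdout(seq_subjects)

print("Validation subjects:", np.unique(seq_subjects[val_mask]))
print("Train sequences:", train_mask.sum(), "| validation sequences:", val_mask.sum())
```

If such per-sequence subject IDs were available, the masks could index `X_train_seq` and `y_train_seq_categorical` to build the `validation_data` argument of `model.fit`.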

The model learns temporal dependencies between consecutive windows.
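
To make the input layout concrete, here is a minimal sketch of how per-window feature vectors could be stacked into the 3D array `(n_sequences, timesteps, n_features)` that an LSTM/GRU expects. The helper `make_sequences`, its `stride` parameter, and the choice to label each sequence with its last window are assumptions for illustration; the notebook's own STEP 7 code may differ.

```python
import numpy as np

def make_sequences(features, labels, seq_len, stride=1):
    """Stack consecutive feature windows into overlapping sequences.

    features: array (n_windows, n_features), windows in time order for ONE subject
    labels:   array (n_windows,), integer class per window
    Returns X with shape (n_sequences, seq_len, n_features) and y with shape
    (n_sequences,). Each sequence takes the label of its last window
    (an assumption made for this sketch).
    """
    X, y = [], []
    for start in range(0, len(features) - seq_len + 1, stride):
        end = start + seq_len
        X.append(features[start:end])
        y.append(labels[end - 1])
    if not X:  # fewer windows than seq_len
        return (np.empty((0, seq_len, features.shape[1])),
                np.empty((0,), dtype=labels.dtype))
    return np.stack(X), np.array(y)

# Toy data: 6 windows with 2 features each
features = np.arange(12, dtype=float).reshape(6, 2)
labels = np.array([0, 0, 0, 1, 1, 1])

X_seq, y_seq = make_sequences(features, labels, seq_len=3)
print(X_seq.shape)  # (4, 3, 2)
print(y_seq)        # [0 1 1 1]
```

Building sequences per subject, rather than over the concatenated data, keeps a sequence from spanning two different people.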


In [None]:
# ============================================================================
# STEP 8: TRAINING THE TIME SERIES MODEL (LSTM/GRU)
# ============================================================================

print("=" * 80)
print("STEP 8: TRAINING THE TIME SERIES MODEL (LSTM/GRU)")
print("=" * 80)

# Check that the required variables exist
required_vars = ['X_train_seq', 'y_train_seq_categorical', 'X_test_seq', 'y_test_seq_categorical']
missing_vars = [var for var in required_vars if var not in globals()]

if missing_vars:
    print(f"\n❌❌❌ ERROR: Missing variables: {', '.join(missing_vars)}")
    print("   Run STEP 7 first!")
    raise NameError(f"Missing variables: {', '.join(missing_vars)}")

# Check that TensorFlow is available (an undefined flag counts as unavailable)
if not globals().get("TENSORFLOW_AVAILABLE", False):
    print("\n" + "="*80)
    print("❌❌❌ ERROR: TensorFlow/Keras is not available!")
    print("="*80)
    print("\n📋 INSTRUCTIONS:")
    print("   1. Go back to the STEP 1 cell (Cell 1)")
    print("   2. Find the line: SKIP_TENSORFLOW = True")
    print("   3. Change it to: SKIP_TENSORFLOW = False")
    print("   4. Re-run the STEP 1 cell")
    print("\n   If TensorFlow crashes the kernel:")
    print("   - Install TensorFlow: pip install tensorflow")
    print("   - Or use TensorFlow CPU: pip install tensorflow-cpu")
    print("   - Check the Jupyter logs for error details")
    print("\n" + "="*80)
    raise ImportError("TensorFlow/Keras is not available - set SKIP_TENSORFLOW = False in STEP 1")

# Model parameters
SEQUENCE_LENGTH = X_train_seq.shape[1]
N_FEATURES = X_train_seq.shape[2]
N_CLASSES = y_train_seq_categorical.shape[1]

print("\n📊 MODEL PARAMETERS:")
print(f"   Sequence length (timesteps): {SEQUENCE_LENGTH}")
print(f"   Features per timestep: {N_FEATURES}")
print(f"   Number of classes: {N_CLASSES}")

# Model type selection (LSTM or GRU)
MODEL_TYPE = "LSTM"  # Can be changed to "GRU"

print(f"\n🔧 Building {MODEL_TYPE} model...")

# Build the model
model = Sequential()

# Two stacked recurrent layers (LSTM or GRU)
if MODEL_TYPE == "LSTM":
    model.add(LSTM(64, return_sequences=True, input_shape=(SEQUENCE_LENGTH, N_FEATURES)))
    model.add(Dropout(0.3))
    model.add(LSTM(32, return_sequences=False))
elif MODEL_TYPE == "GRU":
    model.add(GRU(64, return_sequences=True, input_shape=(SEQUENCE_LENGTH, N_FEATURES)))
    model.add(Dropout(0.3))
    model.add(GRU(32, return_sequences=False))
else:
    raise ValueError(f"Unknown model type: {MODEL_TYPE}")

model.add(Dropout(0.3))
model.add(BatchNormalization())

# Classification head: hidden dense layer followed by the softmax output layer
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(N_CLASSES, activation='softmax'))

# Compile the model
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("\n✅ Model built:")
model.summary()

# Callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=15,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-6,
    verbose=1
)

# Train the model
print("\n🔧 Starting model training...")
print(f"   Train sequences: {len(X_train_seq)}")
print(f"   Test sequences: {len(X_test_seq)}")

BATCH_SIZE = 16
EPOCHS = 100

# The test set doubles as validation data here, so early stopping and the
# learning-rate schedule are tuned on the same sequences that STEP 9 evaluates.
history = model.fit(
    X_train_seq,
    y_train_seq_categorical,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_test_seq, y_test_seq_categorical),
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)

print("\n✅ Training finished!")

# Plot the training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy plot
axes[0].plot(history.history['accuracy'], label='Train Accuracy', linewidth=2)
axes[0].plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
axes[0].set_title('Model Accuracy', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Loss plot
axes[1].plot(history.history['loss'], label='Train Loss', linewidth=2)
axes[1].plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
axes[1].set_title('Model Loss', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Loss', fontsize=12)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n✅ {MODEL_TYPE} model trained successfully!")


STEP 8: TRAINING THE TIME SERIES MODEL (LSTM/GRU)

❌❌❌ ERROR: Missing variables: X_train_seq, y_train_seq_categorical, X_test_seq, y_test_seq_categorical
   Run STEP 7 first!


NameError: Missing variables: X_train_seq, y_train_seq_categorical, X_test_seq, y_test_seq_categorical

## STEP 9: EVALUATING THE TIME SERIES MODEL

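We evaluate the time series model using the same metrics as in the original notebook (`06_klasyfikacja_emocji_smote.ipynb`): confusion matrix, accuracy, balanced accuracy, macro F1, and per-class metrics.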
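
As a rough illustration of how these metrics relate to one another, the sketch below derives accuracy, balanced accuracy, and macro F1 directly from a confusion matrix using NumPy only. The helper `metrics_from_confusion` and the toy matrix are made up for this example; the notebook itself relies on the scikit-learn functions, which may treat classes absent from the data slightly differently.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy, balanced accuracy and macro F1 from a confusion matrix.

    cm[i, j] counts samples of true class i that were predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)    # true samples per class
    predicted = cm.sum(axis=0)  # predictions per class
    # Guarded divisions: a class with no samples/predictions scores 0
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        # Mean recall over classes that actually occur in the true labels
        "balanced_accuracy": recall[support > 0].mean(),
        # Unweighted mean of per-class F1 scores
        "macro_f1": f1.mean(),
    }

# Toy 3-class confusion matrix with an over-represented first class
cm = np.array([[8, 2, 0],
               [1, 3, 1],
               [0, 1, 4]])
m = metrics_from_confusion(cm)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because the first class dominates the toy matrix, plain accuracy (0.75) comes out slightly higher than balanced accuracy (about 0.73), which weights every class equally.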


In [None]:
# ============================================================================
# STEP 9: EVALUATING THE TIME SERIES MODEL
# ============================================================================

print("=" * 80)
print("STEP 9: EVALUATING THE TIME SERIES MODEL")
print("=" * 80)

# Check that the required variables exist
required_vars = ['model', 'X_test_seq', 'y_test_seq', 'label_encoder', 'MODEL_TYPE']
missing_vars = [var for var in required_vars if var not in globals()]

if missing_vars:
    print(f"\n❌❌❌ ERROR: Missing variables: {', '.join(missing_vars)}")
    print("   Run STEP 8 first!")
    raise NameError(f"Missing variables: {', '.join(missing_vars)}")

# Check that TensorFlow is available (an undefined flag counts as unavailable)
if not globals().get("TENSORFLOW_AVAILABLE", False):
    print("\n" + "="*80)
    print("❌❌❌ ERROR: TensorFlow/Keras is not available!")
    print("="*80)
    print("\n📋 INSTRUCTIONS:")
    print("   1. Go back to the STEP 1 cell (Cell 1)")
    print("   2. Find the line: SKIP_TENSORFLOW = True")
    print("   3. Change it to: SKIP_TENSORFLOW = False")
    print("   4. Re-run the STEP 1 cell")
    print("\n   If TensorFlow crashes the kernel:")
    print("   - Install TensorFlow: pip install tensorflow")
    print("   - Or use TensorFlow CPU: pip install tensorflow-cpu")
    print("   - Check the Jupyter logs for error details")
    print("\n" + "="*80)
    raise ImportError("TensorFlow/Keras is not available - set SKIP_TENSORFLOW = False in STEP 1")

# Predictions
print("\n🔧 Running predictions...")
y_pred_proba = model.predict(X_test_seq, verbose=0)
y_pred = np.argmax(y_pred_proba, axis=1)

print("✅ Predictions done")

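# Compute metrics
accuracy = accuracy_score(y_test_seq, y_pred)
balanced_acc = balanced_accuracy_score(y_test_seq, y_pred)
macro_f1 = f1_score(y_test_seq, y_pred, average='macro')

# Pass the full label set explicitly so the matrix and the report stay aligned
# with label_encoder.classes_ even if a class is absent from the test split
all_labels = np.arange(len(label_encoder.classes_))

# Confusion matrix
cm = confusion_matrix(y_test_seq, y_pred, labels=all_labels)

# Classification report
report = classification_report(
    y_test_seq,
    y_pred,
    labels=all_labels,
    target_names=label_encoder.classes_,
    output_dict=True,
    zero_division=0
)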

# Results
results = {
    'model_name': f'{MODEL_TYPE}_TimeSeries',
    'accuracy': accuracy,
    'balanced_accuracy': balanced_acc,
    'macro_f1': macro_f1,
    'confusion_matrix': cm,
    'classification_report': report,
    'y_pred': y_pred,
    'y_true': y_test_seq
}

print(f"\n{'='*80}")
print("EVALUATION RESULTS")
print(f"{'='*80}")

print("\n📊 GLOBAL METRICS:")
print("-" * 80)
print(f"   Accuracy: {accuracy:.4f}")
print(f"   Balanced Accuracy: {balanced_acc:.4f}")
print(f"   Macro F1: {macro_f1:.4f}")

print("\n📊 CONFUSION MATRIX:")
print("-" * 80)
print(cm)
print(f"\n   Classes: {label_encoder.classes_}")

# Plot the confusion matrix
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax,
            xticklabels=label_encoder.classes_,
            yticklabels=label_encoder.classes_)
ax.set_title(f'Confusion Matrix - {MODEL_TYPE} Time Series', fontsize=14, fontweight='bold')
ax.set_xlabel('Predicted', fontsize=12)
ax.set_ylabel('True', fontsize=12)
plt.tight_layout()
plt.show()

print("\n📊 CLASSIFICATION REPORT:")
print("-" * 80)
print(classification_report(
    y_test_seq,
    y_pred,
    labels=np.arange(len(label_encoder.classes_)),
    target_names=label_encoder.classes_,
    zero_division=0
))

# Per-class metrics
print("\n📊 PER-CLASS METRICS:")
print("-" * 80)
for label in label_encoder.classes_:
    precision = report[label]['precision']
    recall = report[label]['recall']
    f1 = report[label]['f1-score']
    support = report[label]['support']
    print(f"\n   {label}:")
    print(f"      Precision: {precision:.4f}")
    print(f"      Recall: {recall:.4f}")
    print(f"      F1-Score: {f1:.4f}")
    print(f"      Support: {support}")

# Comparison with baseline (DummyClassifier)
print(f"\n{'='*80}")
print("COMPARISON WITH BASELINE (DummyClassifier)")
print(f"{'='*80}")

# DummyClassifier ignores feature values and draws predictions from the training
# class distribution; the flattened sequences only satisfy the fit/predict API
dummy = DummyClassifier(strategy='stratified', random_state=42)
dummy.fit(X_train_seq.reshape(len(X_train_seq), -1), y_train_seq)
y_dummy_pred = dummy.predict(X_test_seq.reshape(len(X_test_seq), -1))

dummy_accuracy = accuracy_score(y_test_seq, y_dummy_pred)
dummy_balanced_acc = balanced_accuracy_score(y_test_seq, y_dummy_pred)
dummy_macro_f1 = f1_score(y_test_seq, y_dummy_pred, average='macro')

print("\n📊 BASELINE (DummyClassifier):")
print(f"   Accuracy: {dummy_accuracy:.4f}")
print(f"   Balanced Accuracy: {dummy_balanced_acc:.4f}")
print(f"   Macro F1: {dummy_macro_f1:.4f}")

# Differences relative to the baseline are printed with an explicit sign
print(f"\n📊 {MODEL_TYPE} TIME SERIES:")
print(f"   Accuracy: {accuracy:.4f} ({accuracy - dummy_accuracy:+.4f})")
print(f"   Balanced Accuracy: {balanced_acc:.4f} ({balanced_acc - dummy_balanced_acc:+.4f})")
print(f"   Macro F1: {macro_f1:.4f} ({macro_f1 - dummy_macro_f1:+.4f})")

print("\n✅ EVALUATION COMPLETE!")


STEP 9: EVALUATING THE TIME SERIES MODEL

❌❌❌ ERROR: Missing variables: model, X_test_seq, y_test_seq, label_encoder, MODEL_TYPE
   Run STEP 8 first!


NameError: Missing variables: model, X_test_seq, y_test_seq, label_encoder, MODEL_TYPE

## Summary

This notebook classifies emotions from WESAD data using a **time series model (LSTM/GRU)** instead of traditional ML models.

### Key differences from `06_klasyfikacja_emocji_smote.ipynb`:

1. **One model** instead of five (Logistic Regression, Random Forest, SVM, XGBoost, Ensemble)
2. **Time sequences** - the data is transformed into sequences of consecutive windows
3. **LSTM/GRU model** - learns temporal dependencies between windows
4. **Same data structure** - uses the same data loading and preprocessing functions

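### Results:

A time series model may capture temporal dependencies between consecutive windows better than a per-window classifier, which may be particularly useful for physiological data, where emotional state changes gradually over time. This is an expectation rather than a measured result: in the recorded run, TensorFlow was unavailable, so the cells for STEP 8 and STEP 9 stopped with errors and produced no metrics.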
