# 📄 Data Summary and Merge Overview

This dataset contains daily electrodermal activity (EDA) recordings and narrative stress logs from 8 participants. Stress levels were manually labeled on a 0–2 scale (0 = low, 2 = high) based on the self-reported notes. **Participant 5 was excluded** due to missing stress data.
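
The 0–2 scale can be checked mechanically before any merging. The following is a minimal sketch, assuming each label file has `participant_id`, `date`, and `stress_level` columns (as the merged output suggests). `validate_labels` is a hypothetical helper that is not part of the original script, and the sample rows are made up for illustration.

```python
import pandas as pd

VALID_LEVELS = {0, 1, 2}  # 0 = low, 2 = high, per the labeling scheme


def validate_labels(labels: pd.DataFrame) -> pd.DataFrame:
    """Return rows that break the labeling assumptions (hypothetical helper)."""
    # Labels outside the 0-2 scale (missing values also count as invalid)
    out_of_range = ~labels["stress_level"].isin(VALID_LEVELS)
    # More than one label for the same participant and day
    duplicated = labels.duplicated(subset=["participant_id", "date"], keep=False)
    return labels[out_of_range | duplicated]


# Made-up rows for illustration only
sample = pd.DataFrame({
    "participant_id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-01"]),
    "stress_level": [0, 3, 1, 2],
})
problems = validate_labels(sample)
print(problems)
```

Duplicate labels for one participant and day are worth catching early, since they would multiply rows in a later join on `participant_id` and `date`.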

The script concatenates all available per-participant stress label files into a single `stress_labels_all_participants.csv`, then inner-joins it with the cleaned 4-day EDA windows (`eda_4day_stress_windows.csv`) on `participant_id` and `date`, where `date` is each window's `start_date`. The resulting file, `eda_4day_merged_with_labels.csv`, keeps only windows that have a matching stress label and is used for downstream analysis.
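
To make the join behavior concrete, here is a small sketch on made-up rows (not the study data). It assumes the same key columns as the script and compares `how="inner"` with `how="left"`. The `validate="many_to_one"` argument is an optional addition, not something the original script uses; it asks pandas to raise an error if the label table has duplicate keys.

```python
import pandas as pd

# Made-up windows and labels for illustration only
windows = pd.DataFrame({
    "participant_id": [1, 1, 2],
    "start_date": ["2024-01-01", "2024-01-02", "2024-01-01"],
    "end_date": ["2024-01-04", "2024-01-05", "2024-01-04"],
})
labels = pd.DataFrame({
    "participant_id": [1, 2, 2],
    "date": ["2024-01-01", "2024-01-01", "2024-01-03"],
    "stress_level": [2, 1, 0],
})

# Anchor each window on its start date and parse both sides to datetime
# so the join keys share a dtype
windows["date"] = pd.to_datetime(windows["start_date"])
labels["date"] = pd.to_datetime(labels["date"])

keys = ["participant_id", "date"]
inner = windows.merge(labels, on=keys, how="inner", validate="many_to_one")
left = windows.merge(labels, on=keys, how="left", validate="many_to_one")

# The inner join keeps only labeled windows; the left join keeps every
# window and leaves stress_level empty where no label exists
print(len(inner), len(left))
print(left[["participant_id", "date", "stress_level"]])
```

With an inner join, a window whose start date has no label disappears without warning, which is why the row and participant counts are worth checking after the merge.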


In [2]:
# === 1. Setup ===
import pandas as pd
import glob
import os

# Set path to your stress label folder and EDA windows file
stress_dir = "C:/Users/lpnhu/Downloads/Stress_Testing_Analysis/data/stress_level"  # <- update this!
eda_path = "eda_qc_reports/eda_4day_stress_windows.csv"
output_path = "eda_qc_reports/eda_4day_merged_with_labels.csv"

# === 2. Merge all participant-level stress labels ===
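label_files = sorted(glob.glob(os.path.join(stress_dir, "stress_label_participant*.csv")))
if not label_files:
    # Fail early: pd.concat raises a less helpful ValueError on an empty list
    raise FileNotFoundError(f"No stress label files found in {stress_dir}")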

label_dfs = []
for file in label_files:
    df = pd.read_csv(file)
    df["date"] = pd.to_datetime(df["date"])
    label_dfs.append(df)

stress_labels_all = pd.concat(label_dfs, ignore_index=True)

# Save master label file
stress_labels_all.to_csv("stress_labels_all_participants.csv", index=False)
print("✅ Saved master stress label file.")

# === 3. Load EDA windows ===
eda_df = pd.read_csv(eda_path)
eda_df["date"] = pd.to_datetime(eda_df["start_date"])  # Use start date as label anchor

# Ensure participant_id matches types (int vs str)
eda_df["participant_id"] = eda_df["participant_id"].astype(int)
stress_labels_all["participant_id"] = stress_labels_all["participant_id"].astype(int)

# === 4. Merge on participant_id and date ===
merged_df = eda_df.merge(
    stress_labels_all,
    on=["participant_id", "date"],
    how="inner"  # Use 'left' if you want to keep all EDA rows
)

# === 5. Save merged data for analysis ===
os.makedirs(os.path.dirname(output_path), exist_ok=True)
merged_df.to_csv(output_path, index=False)
print(f"✅ Merged data saved to: {output_path}")

# === 6. Quick sanity check ===
display(merged_df.head())
print(f"Total rows: {len(merged_df)} | Unique participants: {merged_df['participant_id'].nunique()}")


✅ Saved master stress label file.
✅ Merged data saved to: eda_qc_reports/eda_4day_merged_with_labels.csv


|  | participant_id | start_date | mid_date1 | mid_date2 | end_date | date | stress_level | notes |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2023-12-25 | 2023-12-26 | 2023-12-27 | 2023-12-28 | 2023-12-25 | 2 | Continued night shift, fatigue, heavy snacking... |
| 1 | 3 | 2024-02-13 | 2024-02-14 | 2024-02-15 | 2024-02-16 | 2024-02-13 | 1 | Board review stress in AM, errands in PM, mild... |
| 2 | 3 | 2024-02-14 | 2024-02-15 | 2024-02-16 | 2024-02-17 | 2024-02-14 | 2 | Woke early, board review, rushed to clinic, st... |
| 3 | 3 | 2024-02-15 | 2024-02-16 | 2024-02-17 | 2024-02-18 | 2024-02-15 | 2 | Leg cramps, complicated patient cases, studyin... |
| 4 | 3 | 2024-02-16 | 2024-02-17 | 2024-02-18 | 2024-02-19 | 2024-02-16 | 1 | Board review, errands, shopping, mild stress w... |


Total rows: 22 | Unique participants: 6
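
Because the inner join drops unlabeled windows silently, it can help to count what was lost per participant. The sketch below is one possible audit, not part of the original script. It uses made-up stand-ins for `eda_df` and `stress_labels_all` so that it runs on its own, and relies on the standard `indicator=True` option of `DataFrame.merge`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Made-up stand-ins for eda_df and stress_labels_all (illustration only)
eda_df = pd.DataFrame({
    "participant_id": [1, 1, 3, 4],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-02-13", "2024-03-01"]),
})
stress_labels_all = pd.DataFrame({
    "participant_id": [1, 3],
    "date": pd.to_datetime(["2024-01-01", "2024-02-13"]),
    "stress_level": [2, 1],
})

# Left join keeps every EDA window; _merge marks the ones without a label
audit = eda_df.merge(
    stress_labels_all,
    on=["participant_id", "date"],
    how="left",
    indicator=True,
)

unmatched = audit[audit["_merge"] == "left_only"]
unmatched_per_participant = unmatched.groupby("participant_id").size()
print(unmatched_per_participant)
```

Run against the real frames, a summary like this would show which participants lose windows, or drop out entirely, in the inner join.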
