# Experiment 4: Hybrid τ (Tau) Schedules for Self-Training

## 🎯 Goal (OPTIMIZED)

Compare **2 τ schedules** in self-training, one constant and one adaptive:
- **Fixed 0.90**: Constant threshold (baseline)
- **Aggressive**: Fast decay from 0.95 → 0.80 (extreme adaptive)

**⚡ Time Optimization**: Reduced from 4 → 2 schedules (saves ~25-30 minutes)
- Dropped Conservative (sits between Fixed and Aggressive, so it adds little contrast)
- Dropped Linear Decay (similar to Aggressive, only smoother)

## Adaptive τ Strategy

### Why adaptive τ is needed:
- **Early iterations**: high τ (0.95) → select only very confident samples → reduce the risk of confirmation bias
- **Later iterations**: low τ (0.85-0.80) → make use of more unlabeled data → scale up quickly
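
To make the trade-off above concrete, the sketch below counts how many samples would pass each threshold. The confidence values are made up for illustration, and `count_selected` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical max-class probabilities for nine unlabeled samples (illustrative only)
max_probs = [0.99, 0.96, 0.93, 0.91, 0.88, 0.86, 0.82, 0.79, 0.70]


def count_selected(confidences, tau):
    """Count samples whose confidence reaches the threshold tau."""
    return sum(1 for p in confidences if p >= tau)


# Lowering tau admits more (but less certain) pseudo-labels
selected_by_tau = {tau: count_selected(max_probs, tau) for tau in (0.95, 0.90, 0.85, 0.80)}
print(selected_by_tau)  # {0.95: 2, 0.9: 4, 0.85: 6, 0.8: 7}
```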

### ✅ Fixed 0.90 (Baseline)
- Constant τ = 0.90 for all 10 iterations
- Standard approach in self-training
- Stable, but may miss pseudo-labels of usable quality that fall below the threshold

### ✅ Aggressive (Extreme Adaptive)
- Iteration 1: τ = 0.95 (very strict)
- Iterations 2-3: τ = 0.90 (moderate)
- Iterations 4-5: τ = 0.85 (relaxed)
- Iterations 6-10: τ = 0.80 (most relaxed)
- **Fast decay** → maximize unlabeled data usage → does test accuracy/F1 scale up faster?
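
One way to keep the schedule list in sync with the iteration ranges listed above is to generate it from `(τ, number of iterations)` pairs. This is a minimal sketch; `step_schedule` is a hypothetical helper introduced here for illustration.

```python
def step_schedule(segments):
    """Expand (tau, n_iterations) pairs into a flat per-iteration list of thresholds."""
    schedule = []
    for tau, n_iter in segments:
        if not 0.0 < tau <= 1.0:
            raise ValueError(f"tau must be in (0, 1], got {tau}")
        if n_iter < 1:
            raise ValueError(f"n_iterations must be >= 1, got {n_iter}")
        schedule.extend([tau] * n_iter)
    return schedule


# Iteration 1 -> 0.95, iterations 2-3 -> 0.90, 4-5 -> 0.85, 6-10 -> 0.80
aggressive = step_schedule([(0.95, 1), (0.90, 2), (0.85, 2), (0.80, 5)])
print(len(aggressive), aggressive)
```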

## Evaluation metrics:
- Test Accuracy, Test F1-macro (final performance)
- Validation Accuracy/F1 curves (learning trajectory)
- Pseudo-labeling activity per iteration
- Total pseudo-labels added
- Correlation between τ and performance
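
F1-macro is reported next to accuracy because it weights every class equally, so a class the model never predicts pulls the score down. The sketch below computes it by hand; the class names and predictions are hypothetical, and the function is intended to mirror the usual macro-averaged definition rather than replace `sklearn.metrics.f1_score`.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_values = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class has no hits at all
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


# Hypothetical labels: the rare class "high" is never predicted correctly
y_true = ["low", "low", "mid", "mid", "high"]
y_pred = ["low", "mid", "mid", "mid", "low"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f}, macro-F1={macro_f1(y_true, y_pred):.3f}")
```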

In [None]:
# PARAMETERS
SEMI_DATASET_PATH = "data/processed/dataset_for_semi.parquet"
CUTOFF = "2017-01-01"

# τ schedules to compare (OPTIMIZED: 2 extremes for clearest contrast)
TAU_SCHEDULES = {
    "Fixed_0.90": [0.90] * 10,  # Baseline: constant
    # Fast decay: 0.95 (iter 1), 0.90 (iters 2-3), 0.85 (iters 4-5), 0.80 (iters 6-10)
    "Aggressive": [0.95] + [0.90] * 2 + [0.85] * 2 + [0.80] * 5,
}

# Dropped to save time (~25-30 minutes):
# "Conservative": between Fixed and Aggressive, little contrast
# "Linear_Decay": similar to Aggressive, differs only in decay speed

# Fixed parameters
MAX_ITER = 10
MIN_NEW_PER_ITER = 20
VAL_FRAC = 0.20
RANDOM_STATE = 42

# Output directory
RESULTS_DIR = "data/processed/hybrid_tau_experiments"
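
The training loop further down iterates `min(max_iter, len(tau_schedule))` times, so a schedule shorter than `MAX_ITER` would silently shorten the run. A small sanity check such as the sketch below can catch that before any model is trained. `validate_schedules` and the example dictionary are hypothetical and not part of the notebook.

```python
def validate_schedules(schedules, max_iter):
    """Return a list of human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for name, taus in schedules.items():
        if len(taus) < max_iter:
            problems.append(f"{name}: {len(taus)} values for {max_iter} iterations")
        out_of_range = [t for t in taus if not 0.0 < t <= 1.0]
        if out_of_range:
            problems.append(f"{name}: thresholds outside (0, 1]: {out_of_range}")
    return problems


# Deliberately broken example config to show what gets flagged
example_schedules = {
    "Fixed_0.90": [0.90] * 10,
    "Too_Short": [0.95, 0.90],
    "Bad_Value": [1.2] * 10,
}
issues = validate_schedules(example_schedules, max_iter=10)
for issue in issues:
    print("WARNING:", issue)
```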

In [None]:
from pathlib import Path
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display, Markdown
import warnings
warnings.filterwarnings('ignore')

from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, f1_score, classification_report

from src.semi_supervised_library import (
    SemiDataConfig, AQI_CLASSES,
    time_split, build_feature_columns, _normalize_missing, _align_proba_to_labels
)

# Setup paths
PROJECT_ROOT = Path(".").resolve()
if not (PROJECT_ROOT / "data").exists() and (PROJECT_ROOT.parent / "data").exists():
    PROJECT_ROOT = PROJECT_ROOT.parent.resolve()

results_dir = (PROJECT_ROOT / RESULTS_DIR).resolve()
results_dir.mkdir(parents=True, exist_ok=True)

print(f"Project root: {PROJECT_ROOT}")
print(f"Results directory: {results_dir}")

## Load Dataset

In [None]:
df = pd.read_parquet((PROJECT_ROOT / SEMI_DATASET_PATH).resolve())

print("Dataset shape:", df.shape)
print("Labeled fraction:", df['is_labeled'].mean())

train_df, test_df = time_split(df, cutoff=CUTOFF)
print(f"\nTrain: {len(train_df):,} samples")
print(f"  - Labeled: {train_df['is_labeled'].sum():,}")
print(f"  - Unlabeled: {(~train_df['is_labeled']).sum():,}")
print(f"Test: {len(test_df):,} samples")
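
The split above is time-based so that the test set only contains observations after the cutoff. The real `time_split` lives in `src.semi_supervised_library` and may behave differently; the sketch below only illustrates the idea, assuming a hypothetical `datetime` field and that rows strictly before the cutoff go to training.

```python
from datetime import datetime


def split_by_cutoff(rows, cutoff, time_key="datetime"):
    """Split records into (train, test): before the cutoff vs. on or after it (assumed convention)."""
    cutoff_dt = datetime.fromisoformat(cutoff)
    train = [r for r in rows if r[time_key] < cutoff_dt]
    test = [r for r in rows if r[time_key] >= cutoff_dt]
    return train, test


# Three made-up records around the cutoff date
rows = [
    {"datetime": datetime(2016, 12, 31, 23), "value": 1},
    {"datetime": datetime(2017, 1, 1, 0), "value": 2},
    {"datetime": datetime(2017, 6, 1, 12), "value": 3},
]
train_rows, test_rows = split_by_cutoff(rows, "2017-01-01")
print(len(train_rows), len(test_rows))  # 1 2
```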

## Visualize τ Schedules

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

colors = ['steelblue', 'forestgreen', 'coral', 'mediumpurple']
schedule_colors = dict(zip(TAU_SCHEDULES.keys(), colors))

for schedule_name, tau_values in TAU_SCHEDULES.items():
    iterations = list(range(1, len(tau_values) + 1))
    ax.plot(iterations, tau_values, marker='o', linewidth=2.5,
            label=schedule_name.replace('_', ' '), 
            color=schedule_colors[schedule_name], alpha=0.8)

ax.set_xlabel("Iteration", fontsize=12, fontweight='bold')
ax.set_ylabel("τ (Confidence Threshold)", fontsize=12, fontweight='bold')
ax.set_title("τ Schedules Over Iterations", fontsize=14, fontweight='bold')
ax.legend(fontsize=11, loc='best')
ax.grid(alpha=0.3)
ax.set_xticks(range(1, 11))
ax.set_ylim([0.75, 1.0])

plt.tight_layout()
plot_file = results_dir / "tau_schedules.png"
plt.savefig(plot_file, dpi=300, bbox_inches='tight')
print(f"✅ Saved plot: {plot_file}")
plt.show()

## Custom Self-Training with τ Schedule

In [None]:
def run_self_training_with_schedule(df, schedule_name, tau_schedule, max_iter=10):
    """Self-training with an adaptive τ per iteration."""
    
    print(f"\n{'='*80}")
    print(f"SCHEDULE: {schedule_name}")
    print(f"{'='*80}")
    print(f"τ values: {tau_schedule}")
    
    data_cfg = SemiDataConfig(cutoff=CUTOFF, random_state=RANDOM_STATE)
    
    train_df, test_df = time_split(df.copy(), cutoff=CUTOFF)
    feat_cols = build_feature_columns(train_df, data_cfg)
    
    X_all = _normalize_missing(train_df[feat_cols].copy())
    y_all = train_df[data_cfg.target_col].astype("object")
    
    # Split labeled into fit and validation
    labeled_idx = train_df.index[pd.notna(y_all)].to_numpy()
    unlabeled_idx = train_df.index[pd.isna(y_all)].to_numpy()
    
    rng = np.random.default_rng(RANDOM_STATE)
    rng.shuffle(labeled_idx)
    n_val = int(np.floor(VAL_FRAC * labeled_idx.size))
    val_idx = labeled_idx[:n_val]
    fit_idx = labeled_idx[n_val:]
    
    # Build pipeline
    cat_cols = [c for c in feat_cols if train_df[c].dtype == "object"]
    
    if cat_cols:
        encoder = ColumnTransformer([
            ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), cat_cols)
        ], remainder="passthrough")
    else:
        encoder = "passthrough"
    
    pipe = Pipeline([
        ("encoder", encoder),
        ("model", HistGradientBoostingClassifier(
            max_iter=100, max_depth=10, learning_rate=0.1, random_state=RANDOM_STATE
        ))
    ])
    
    # Self-training loop with adaptive τ
    y_work = y_all.copy()
    history = []
    
    for it in range(1, min(max_iter, len(tau_schedule)) + 1):
        tau_current = tau_schedule[it - 1]
        
        # Fit
        pipe.fit(X_all.loc[fit_idx], y_work.loc[fit_idx])
        
        # Validate
        y_val_pred = pipe.predict(X_all.loc[val_idx])
        val_acc = float(accuracy_score(y_all.loc[val_idx], y_val_pred))
        val_f1 = float(f1_score(y_all.loc[val_idx], y_val_pred, average="macro"))
        
        # Pseudo-label with current τ
        if unlabeled_idx.size > 0:
            proba_raw = pipe.predict_proba(X_all.loc[unlabeled_idx])
            proba = _align_proba_to_labels(proba_raw, pipe.named_steps["model"].classes_, AQI_CLASSES)
            max_prob = proba.max(axis=1)
            y_hat = np.array(AQI_CLASSES, dtype=object)[proba.argmax(axis=1)]
            
            pick_mask = max_prob >= tau_current
            picked = unlabeled_idx[pick_mask]
            picked_labels = y_hat[pick_mask]
        else:
            picked = np.array([], dtype=int)
            picked_labels = np.array([], dtype=object)
        
        n_new = int(picked.size)
        history.append({
            "iter": it,
            "tau": float(tau_current),
            "val_accuracy": val_acc,
            "val_f1_macro": val_f1,
            "unlabeled_pool": int(unlabeled_idx.size),
            "new_pseudo": n_new
        })
        
        print(f"  Iter {it:2d} (τ={tau_current:.3f}): Val F1={val_f1:.4f}, New pseudo={n_new:,}, Pool={unlabeled_idx.size:,}")
        
        if n_new < MIN_NEW_PER_ITER:
            print(f"  ⚠️ Stopped early: only {n_new} new pseudo-labels")
            break
        
        # Add pseudo-labels
        y_work.loc[picked] = picked_labels
        fit_idx = np.unique(np.concatenate([fit_idx, picked]))
        
        # Remove the picked rows from the unlabeled pool (vectorized, order preserved)
        unlabeled_idx = unlabeled_idx[~np.isin(unlabeled_idx, picked)]
    
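    # Refit once more so the final model also sees pseudo-labels added in the last
    # completed iteration; after an early stop fit_idx is unchanged, so this
    # reproduces the last fitted model.
    pipe.fit(X_all.loc[fit_idx], y_work.loc[fit_idx])
    
    # Test evaluation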
    X_test = _normalize_missing(test_df[feat_cols].copy())
    y_test = test_df[data_cfg.target_col].astype("object")
    mask = pd.notna(y_test)
    
    y_pred = pipe.predict(X_test.loc[mask])
    
    test_acc = float(accuracy_score(y_test.loc[mask], y_pred))
    test_f1 = float(f1_score(y_test.loc[mask], y_pred, average="macro"))
    report = classification_report(y_test.loc[mask], y_pred, output_dict=True)
    
    print(f"\n  ✅ Test Results:")
    print(f"     Accuracy: {test_acc:.4f}")
    print(f"     F1-macro: {test_f1:.4f}")
    
    return {
        "schedule_name": schedule_name,
        "tau_schedule": tau_schedule,
        "history": history,
        "test_accuracy": test_acc,
        "test_f1_macro": test_f1,
        "per_class_report": report,
        "total_pseudo_labels": sum([h["new_pseudo"] for h in history]),
        "iterations_completed": len(history)
    }
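
The mechanics of the loop above (fit, score the pool, keep predictions with confidence ≥ τ, promote them to labels, stop when too few qualify) can be seen in isolation on a toy problem. The sketch below swaps the gradient-boosting pipeline for a deliberately trivial one-dimensional centroid model; all names and data are hypothetical and chosen only to make the loop easy to trace.

```python
import math


def fit_centroids(points):
    """Toy 'model': the mean feature value of each class (0 and 1)."""
    means = {}
    for cls in (0, 1):
        values = [x for x, label in points if label == cls]
        means[cls] = sum(values) / len(values)
    return means


def predict_proba_class1(means, x, sharpness=1.5):
    """Logistic score around the midpoint of the centroids (assumes means[1] > means[0])."""
    midpoint = (means[0] + means[1]) / 2
    return 1.0 / (1.0 + math.exp(-sharpness * (x - midpoint)))


def toy_self_training(labeled, unlabeled, tau_schedule, min_new=1):
    labeled, pool, history = list(labeled), list(unlabeled), []
    for it, tau in enumerate(tau_schedule, start=1):
        means = fit_centroids(labeled)            # 1. fit on the current labels
        picked, kept = [], []
        for x in pool:                            # 2. score the unlabeled pool
            p1 = predict_proba_class1(means, x)
            confidence, label = max(p1, 1 - p1), int(p1 >= 0.5)
            if confidence >= tau:
                picked.append((x, label))
            else:
                kept.append(x)
        history.append({"iter": it, "tau": tau, "new_pseudo": len(picked)})
        if len(picked) < min_new:                 # 3. stop when too few are confident
            break
        labeled.extend(picked)                    # 4. promote pseudo-labels
        pool = kept
    return history, pool


labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 2.0, 4.0, 6.0, 8.0, 9.5]
history, remaining = toy_self_training(labeled, unlabeled, tau_schedule=[0.95, 0.80])
print([h["new_pseudo"] for h in history], remaining)  # [4, 2] []
```

With τ = 0.95 only the four points close to a centroid are accepted; the two points near the decision boundary are picked up once τ drops to 0.80.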

## Run Experiments for All Schedules

In [None]:
results = {}

print("⚡ Running 2/4 τ schedules (optimized for speed)\n")

for schedule_name, tau_schedule in TAU_SCHEDULES.items():
    result = run_self_training_with_schedule(
        df=df,
        schedule_name=schedule_name,
        tau_schedule=tau_schedule,
        max_iter=MAX_ITER
    )
    
    results[schedule_name] = result

print(f"\n{'='*80}")
print("SCHEDULE EXPERIMENTS COMPLETED (2/4 schedules)")
print(f"{'='*80}")

## Save Results

In [None]:
# OPTIMIZED: run only the 2 schedules with the highest contrast (cuts runtime by ~50%)
TAU_SCHEDULES = {
    "Fixed_0.90": [0.90] * 10,  # Baseline (constant)
    "Aggressive": [0.95] + [0.90] * 2 + [0.85] * 2 + [0.80] * 5  # Fast decay, lowest floor
}

# Dropped to save time (~13-15 minutes per schedule):
# ❌ "Conservative": between Fixed and Aggressive → little contrast
# ❌ "Linear_Decay": similar to Aggressive but smoother → redundant
#
# Why these 2 schedules:
# ✅ Fixed 0.90: baseline (constant τ) for comparison
# ✅ Aggressive: extreme adaptive (0.95→0.80) → makes the effect of τ decay easy to observe

print("⚡ OPTIMIZATION: Running 2/4 schedules (saves ~25-30 minutes)")
print("   ✓ Fixed 0.90 (baseline, constant τ)")
print("   ✓ Aggressive (extreme decay: 0.95→0.80)")
print("   ✗ Conservative (skipped)")
print("   ✗ Linear Decay (skipped)\n")
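
The cell above does not itself write the per-schedule `results` dictionary to disk. A minimal sketch of how that could be done is shown below; `save_results` and the file name `hybrid_tau_results.json` are assumptions made for illustration, and the demo uses a temporary directory with made-up values rather than real experiment output.

```python
import json
import tempfile
from pathlib import Path


def save_results(results, out_dir, filename="hybrid_tau_results.json"):
    """Write the per-schedule results dict to JSON and return the file path."""
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        # default=float is a fallback for values json cannot encode natively,
        # such as NumPy integer scalars
        json.dump(results, f, indent=2, ensure_ascii=False, default=float)
    return out_path


# Made-up miniature result, shaped like the dict returned by the training function
demo_results = {
    "Fixed_0.90": {
        "tau_schedule": [0.90] * 3,
        "history": [{"iter": 1, "tau": 0.90, "new_pseudo": 120}],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_results(demo_results, tmp)
    reloaded = json.loads(saved_path.read_text(encoding="utf-8"))

print(reloaded == demo_results)  # True
```

In the notebook this would be called as `save_results(results, results_dir)`.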

## Create Summary Table

In [None]:
summary_data = []
for schedule_name, res in results.items():
    summary_data.append({
        "Schedule": schedule_name.replace('_', ' '),
        "Test Accuracy": res["test_accuracy"],
        "Test F1-macro": res["test_f1_macro"],
        "Pseudo-labels": res["total_pseudo_labels"],
        "Iterations": res["iterations_completed"],
        "Val F1 Peak": max([h["val_f1_macro"] for h in res["history"]]),
        "Avg œÑ": np.mean(res["tau_schedule"][:res["iterations_completed"]])
    })

summary_df = pd.DataFrame(summary_data)
summary_df = summary_df.sort_values("Test F1-macro", ascending=False)

print("\n📊 SUMMARY TABLE:")
print("="*100)
display(summary_df)

summary_csv = results_dir / "hybrid_tau_summary.csv"
summary_df.to_csv(summary_csv, index=False)
print(f"\n✅ Saved summary to: {summary_csv}")

## Visualization 1: Test Performance Comparison

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

schedules = summary_df["Schedule"].tolist()
accuracies = summary_df["Test Accuracy"].tolist()
f1_scores = summary_df["Test F1-macro"].tolist()

# Accuracy
ax1 = axes[0]
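# Color bars by schedule name so they match the line plots (summary_df is sorted by F1)
bar_colors = [schedule_colors[s.replace(' ', '_')] for s in schedules]
bars1 = ax1.bar(schedules, accuracies, color=bar_colors, alpha=0.8, edgecolor='black')
ax1.set_ylabel("Test Accuracy", fontsize=12, fontweight='bold')
ax1.set_title("Test Accuracy by τ Schedule", fontsize=14, fontweight='bold')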
ax1.tick_params(axis='x', rotation=45)
ax1.grid(axis='y', alpha=0.3)
ax1.set_ylim([0.57, 0.61])

for bar in bars1:
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.4f}',
             ha='center', va='bottom', fontsize=10, fontweight='bold')

# F1-macro
ax2 = axes[1]
bars2 = ax2.bar(schedules, f1_scores,
                color=[schedule_colors[s.replace(' ', '_')] for s in schedules],
                alpha=0.8, edgecolor='black')
ax2.set_ylabel("Test F1-macro", fontsize=12, fontweight='bold')
ax2.set_title("Test F1-macro by τ Schedule", fontsize=14, fontweight='bold')
ax2.tick_params(axis='x', rotation=45)
ax2.grid(axis='y', alpha=0.3)
ax2.set_ylim([0.50, 0.56])

for bar in bars2:
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.4f}',
             ha='center', va='bottom', fontsize=10, fontweight='bold')

plt.tight_layout()
plot_file = results_dir / "test_performance_by_schedule.png"
plt.savefig(plot_file, dpi=300, bbox_inches='tight')
print(f"✅ Saved plot: {plot_file}")
plt.show()

## Visualization 2: Validation Learning Curves

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

for schedule_name, res in results.items():
    history = res["history"]
    iterations = [h["iter"] for h in history]
    val_f1 = [h["val_f1_macro"] for h in history]
    color = schedule_colors[schedule_name]
    
    ax.plot(iterations, val_f1, marker='o', linewidth=2.5,
            label=schedule_name.replace('_', ' '), color=color, alpha=0.8)

ax.set_xlabel("Iteration", fontsize=12, fontweight='bold')
ax.set_ylabel("Validation F1-macro", fontsize=12, fontweight='bold')
ax.set_title("Validation Learning Curves by τ Schedule", fontsize=14, fontweight='bold')
ax.legend(fontsize=11, loc='best')
ax.grid(alpha=0.3)
ax.set_xticks(range(1, 11))

plt.tight_layout()
plot_file = results_dir / "validation_curves_by_schedule.png"
plt.savefig(plot_file, dpi=300, bbox_inches='tight')
print(f"✅ Saved plot: {plot_file}")
plt.show()

## Visualization 3: Pseudo-labeling Activity Over Iterations

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

for schedule_name, res in results.items():
    history = res["history"]
    iterations = [h["iter"] for h in history]
    new_pseudo = [h["new_pseudo"] for h in history]
    color = schedule_colors[schedule_name]
    
    ax.plot(iterations, new_pseudo, marker='s', linewidth=2,
            label=schedule_name.replace('_', ' '), color=color, alpha=0.8)

ax.set_xlabel("Iteration", fontsize=12, fontweight='bold')
ax.set_ylabel("New Pseudo-labels Added", fontsize=12, fontweight='bold')
ax.set_title("Pseudo-labeling Activity by τ Schedule", fontsize=14, fontweight='bold')
ax.legend(fontsize=11, loc='best')
ax.grid(alpha=0.3)
ax.set_xticks(range(1, 11))
ax.set_yscale('log')

plt.tight_layout()
plot_file = results_dir / "pseudo_labeling_activity.png"
plt.savefig(plot_file, dpi=300, bbox_inches='tight')
print(f"✅ Saved plot: {plot_file}")
plt.show()

## Visualization 4: Total Pseudo-labels Comparison

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))

schedules_list = summary_df["Schedule"].tolist()
pseudo_totals = summary_df["Pseudo-labels"].tolist()

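# Color bars by schedule name so they match the line plots (summary_df is sorted by F1)
bars = ax.bar(schedules_list, pseudo_totals,
              color=[schedule_colors[s.replace(' ', '_')] for s in schedules_list],
              alpha=0.8, edgecolor='black')
ax.set_ylabel("Total Pseudo-labels Added", fontsize=12, fontweight='bold')
ax.set_title("Total Pseudo-labeling by τ Schedule", fontsize=14, fontweight='bold')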
ax.tick_params(axis='x', rotation=45)
ax.grid(axis='y', alpha=0.3)

for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{int(height):,}',
            ha='center', va='bottom', fontsize=10, fontweight='bold')

plt.tight_layout()
plot_file = results_dir / "total_pseudo_labels.png"
plt.savefig(plot_file, dpi=300, bbox_inches='tight')
print(f"✅ Saved plot: {plot_file}")
plt.show()

## Visualization 5: τ vs Performance Correlation

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Scatter plot: Avg τ vs F1
ax1 = axes[0]
# Column key is kept exactly as it was defined when summary_df was built
avg_taus = summary_df["Avg œÑ"].tolist()
f1_list = summary_df["Test F1-macro"].tolist()

for i, schedule in enumerate(schedules_list):
    # Look up the color by schedule name so it matches the line plots
    ax1.scatter(avg_taus[i], f1_list[i], s=200,
                c=schedule_colors[schedule.replace(' ', '_')],
                alpha=0.8, edgecolor='black', linewidth=2)
    ax1.text(avg_taus[i], f1_list[i] + 0.002, schedule,
             ha='center', fontsize=9, fontweight='bold')

ax1.set_xlabel("Average τ", fontsize=12, fontweight='bold')
ax1.set_ylabel("Test F1-macro", fontsize=12, fontweight='bold')
ax1.set_title("Average τ vs Test Performance", fontsize=14, fontweight='bold')
ax1.grid(alpha=0.3)

# Bar plot: Pseudo-labels vs F1
ax2 = axes[1]
x = np.arange(len(schedules_list))
width = 0.35

bars1 = ax2.bar(x - width/2, [p/1000 for p in pseudo_totals], width, 
                label='Pseudo-labels (K)', color='skyblue', alpha=0.8)
ax2_twin = ax2.twinx()
bars2 = ax2_twin.bar(x + width/2, f1_list, width,
                     label='F1-macro', color='coral', alpha=0.8)

ax2.set_xlabel("Schedule", fontsize=12, fontweight='bold')
ax2.set_ylabel("Pseudo-labels (thousands)", fontsize=11, fontweight='bold', color='skyblue')
ax2_twin.set_ylabel("Test F1-macro", fontsize=11, fontweight='bold', color='coral')
ax2.set_title("Pseudo-labels vs Performance Trade-off", fontsize=14, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels(schedules_list, rotation=45, ha='right')
ax2.grid(axis='y', alpha=0.3)

plt.tight_layout()
plot_file = results_dir / "tau_performance_correlation.png"
plt.savefig(plot_file, dpi=300, bbox_inches='tight')
print(f"✅ Saved plot: {plot_file}")
plt.show()

## Analysis & Insights

In [None]:
print("\n" + "="*100)
print("📊 KEY FINDINGS")
print("="*100)

best_idx = summary_df["Test F1-macro"].idxmax()
best_schedule = summary_df.loc[best_idx]

print(f"\n🏆 Best Schedule: {best_schedule['Schedule']}")
print(f"   Test F1-macro: {best_schedule['Test F1-macro']:.4f}")
print(f"   Test Accuracy: {best_schedule['Test Accuracy']:.4f}")
print(f"   Average τ: {best_schedule['Avg œÑ']:.3f}")
print(f"   Pseudo-labels: {best_schedule['Pseudo-labels']:,}")
print(f"   Val F1 Peak: {best_schedule['Val F1 Peak']:.4f}")

print(f"\n📈 Ranking by F1-macro:")
# summary_df keeps its pre-sort index labels, so number the rows by position instead
for rank, (_, row) in enumerate(summary_df.iterrows(), start=1):
    print(f"   {rank}. {row['Schedule']}: {row['Test F1-macro']:.4f} (avg τ={row['Avg œÑ']:.3f})")

print(f"\n💡 Schedule Behaviors:")
baseline_f1 = summary_df[summary_df["Schedule"] == "Fixed 0.90"]["Test F1-macro"].values[0]
for _, row in summary_df.iterrows():
    # Absolute F1 difference, expressed in percentage points (pp)
    improvement = (row["Test F1-macro"] - baseline_f1) * 100
    n_pseudo = row["Pseudo-labels"]
    # Guard against division by zero when a schedule added no pseudo-labels
    efficiency = row["Test F1-macro"] / (n_pseudo / 1000) if n_pseudo > 0 else float("nan")
    print(f"   {row['Schedule']}:")
    print(f"      - F1 improvement vs Fixed: {improvement:+.2f} pp")
    print(f"      - Pseudo-labels: {n_pseudo:,}")
    print(f"      - Efficiency: {efficiency:.5f} F1 per 1K labels")

print(f"\n🎯 Conclusion:")
if best_schedule["Test F1-macro"] > baseline_f1:
    print(f"   ✅ Hybrid τ strategy IMPROVES over fixed τ!")
    print(f"      Best: {best_schedule['Schedule']}")
    print(f"      Improvement: +{(best_schedule['Test F1-macro'] - baseline_f1)*100:.2f} pp")
else:
    print(f"   ⚠️ Fixed τ=0.90 remains competitive")
    print(f"      The hybrid schedule did not beat it on test F1-macro")

print(f"\n✅ All visualizations saved to: {results_dir}")
print("="*100)
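
The analysis cell reports the F1 difference multiplied by 100, while the dashboard cell stores the difference divided by the baseline. These are different quantities (percentage points versus relative percent), which the sketch below illustrates. The helper names and the two F1 values are hypothetical, not results from this experiment.

```python
def absolute_gain_pp(baseline, candidate):
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return (candidate - baseline) * 100


def relative_gain_pct(baseline, candidate):
    """Difference relative to the baseline score, expressed in percent."""
    return (candidate - baseline) / baseline * 100


demo_baseline, demo_candidate = 0.50, 0.52  # illustrative values only

pp = absolute_gain_pp(demo_baseline, demo_candidate)
rel = relative_gain_pct(demo_baseline, demo_candidate)
print(f"absolute: {pp:+.2f} pp, relative: {rel:+.2f}%")  # absolute: +2.00 pp, relative: +4.00%
```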

## Dashboard Summary

In [None]:
dashboard_data = {
    "experiment_type": "hybrid_tau_schedule",
    "parameters": {
        "max_iter": MAX_ITER,
        "schedules": {k: v for k, v in TAU_SCHEDULES.items()}
    },
    "summary": summary_df.to_dict(orient='records'),
    "best_schedule": {
        "name": best_schedule["Schedule"],
        "f1_macro": float(best_schedule["Test F1-macro"]),
        "accuracy": float(best_schedule["Test Accuracy"]),
        "avg_tau": float(best_schedule["Avg œÑ"])
    },
    "baseline_comparison": {
        "fixed_tau_f1": float(baseline_f1),
        "best_hybrid_f1": float(best_schedule["Test F1-macro"]),
        "improvement": float((best_schedule["Test F1-macro"] - baseline_f1) / baseline_f1 * 100)
    },
    "visualizations": [
        "tau_schedules.png",
        "test_performance_by_schedule.png",
        "validation_curves_by_schedule.png",
        "pseudo_labeling_activity.png",
        "total_pseudo_labels.png",
        "tau_performance_correlation.png"
    ]
}

dashboard_file = results_dir / "dashboard_summary.json"
with open(dashboard_file, "w") as f:
    json.dump(dashboard_data, f, indent=2)

print(f"✅ Dashboard summary saved to: {dashboard_file}")
display(Markdown(f"## Experiment Complete! ✅\n\nResults: `{results_dir}`"))