# Ensemble - The "Free Lunch" (3-Model Version)

We combine three distinct models so that each one's strengths can offset the others' weaknesses (individual score in parentheses):
1.  **RoBERTa Base (0.85):** The Anchor. Stable and general.
2.  **RoBERTa Large (0.83):** The Genius. Smart but overfitted.
3.  **LinearSVC (0.80):** The Skeptic. Simple and robust.

**Why this works:**
LinearSVC makes *different* mistakes than the RoBERTa models. When models err on different rows, the combined vote can overrule an individual model's mistake, which stabilizes the predictions.
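
To make the "different mistakes" argument concrete, here is a minimal sketch on toy data. The labels and the four `model_*` lists are invented for illustration, and it uses a plain majority vote rather than the weighted blend in this notebook. It compares a trio whose errors fall on different rows with a trio whose errors overlap.

```python
# Toy labels and predictions, invented for illustration only.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 0]  # wrong on row 3
model_b = [1, 0, 1, 0, 0, 1, 1, 0]  # wrong on rows 3 and 5 (repeats model_a's mistake)
model_c = [1, 1, 1, 1, 0, 0, 1, 0]  # wrong on row 1
model_d = [1, 0, 1, 1, 0, 1, 1, 0]  # wrong on row 5


def accuracy(pred, truth):
    """Fraction of rows where the prediction matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(*models):
    """Label a row 1 when more than half of the models vote 1."""
    return [int(2 * sum(votes) > len(votes)) for votes in zip(*models)]


diverse = majority_vote(model_a, model_c, model_d)  # errors on different rows
similar = majority_vote(model_a, model_b, model_d)  # errors overlap

print("single model:", accuracy(model_a, y_true))
print("diverse trio:", accuracy(diverse, y_true))
print("similar trio:", accuracy(similar, y_true))
```

In the diverse trio every mistake is outvoted by the other two models. In the similar trio the shared mistakes survive the vote.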

In [None]:
import pandas as pd

# Load submissions
sub_base = pd.read_csv("submission_v1.csv")   # RoBERTa Base (0.85)
sub_pro = pd.read_csv("submission_v2.csv")    # RoBERTa Large (0.83)
sub_svc = pd.read_csv("submission_linearsvc_oof.csv") # LinearSVC (0.80)

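# The blend combines rows by position, so all three files must list the same IDs
# in the same order (assumes each file has an 'ID' column, as sub_base does).
assert sub_base['ID'].equals(sub_pro['ID']) and sub_base['ID'].equals(sub_svc['ID']), "ID order differs between submissions"

print("Loaded submissions.")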

In [None]:
# Check Correlations
label_map = {'NON_EXTREMIST': 0, 'EXTREMIST': 1}
p_base = sub_base['Extremism_Label'].map(label_map)
p_pro = sub_pro['Extremism_Label'].map(label_map)
p_svc = sub_svc['Extremism_Label'].map(label_map)

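# Pearson correlation of two 0/1 series (the phi coefficient)
print(f"Corr Base vs Pro: {p_base.corr(p_pro):.4f}")
print(f"Corr Base vs SVC: {p_base.corr(p_svc):.4f}") # Lower than Base vs Pro is good: SVC disagrees on more rows
print(f"Corr Pro vs SVC: {p_pro.corr(p_svc):.4f}")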
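
Correlation can be hard to read for binary labels, so a complementary check is the share of rows on which two models disagree. The sketch below is one way to compute it. `disagreement_rate` is a hypothetical helper, not part of pandas, and the `toy_*` lists are made up for illustration.

```python
def disagreement_rate(a, b):
    """Fraction of rows where two prediction sequences differ."""
    pairs = list(zip(a, b))
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy 0/1 predictions, invented for illustration only.
toy_base = [1, 0, 1, 1, 0, 0]
toy_pro  = [1, 0, 1, 1, 0, 1]
toy_svc  = [1, 1, 0, 1, 0, 1]

print(f"Base vs Pro: {disagreement_rate(toy_base, toy_pro):.3f}")
print(f"Base vs SVC: {disagreement_rate(toy_base, toy_svc):.3f}")
```

The helper only iterates over its inputs, so it should also accept the mapped pandas Series, for example `disagreement_rate(p_base, p_svc)`.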

In [None]:
# Weighted Blending
# The best single model gets the largest weight; the other two share the rest.

w_base = 0.50
w_pro = 0.30
w_svc = 0.20

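# The inputs are hard 0/1 labels, so this is a weighted vote score, not a calibrated probability.
final_prob = (p_base * w_base) + (p_pro * w_pro) + (p_svc * w_svc)

# Threshold at 0.5; a score of exactly 0.5 counts as EXTREMIST because of >=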
final_preds = (final_prob >= 0.5).astype(int)

inv_label_map = {0: 'NON_EXTREMIST', 1: 'EXTREMIST'}
final_labels = final_preds.map(inv_label_map)

submission = pd.DataFrame({
    'ID': sub_base['ID'],
    'Extremism_Label': final_labels
})

submission.to_csv("submission_ensemble_3model.csv", index=False)
print("Saved submission_ensemble_3model.csv")
print(submission.head())
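
Because the blend works on hard 0/1 labels, there are only eight possible vote patterns, so the effect of the weights can be listed exhaustively. The sketch below copies the weights and the threshold from the blending cell and prints the decision for each pattern. The `decisions` dictionary is introduced here only for illustration.

```python
from itertools import product

# Weights and threshold copied from the blending cell.
w_base, w_pro, w_svc = 0.50, 0.30, 0.20

decisions = {}
print("base pro svc score label")
for base, pro, svc in product([0, 1], repeat=3):
    score = (base * w_base) + (pro * w_pro) + (svc * w_svc)
    decisions[(base, pro, svc)] = int(score >= 0.5)
    print(f"{base:>4} {pro:>3} {svc:>3} {score:>5.2f} {decisions[(base, pro, svc)]:>5}")
```

With these weights the rule reduces to: EXTREMIST when Base votes EXTREMIST, or when Pro and SVC both do. Two of the patterns land exactly on 0.5, so their outcome depends on the `>=` comparison. A small change to any weight would flip them.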