
# Weighting Model: **Advanced Version**
**Generated:** 2025-08-19 02:14  
**What it includes:**  
- **Cluster** weighting by volume × log(1 + diversity).  
- Per-cluster **metric** weighting (relative frequency, TF-IDF/entropy, correlation penalty).  
- **High-performance engagement** (a per-metric boost factor derived from the top cohort).  
- **Optional PCA** to adjust metric weights by their contribution to overall variance.  
- Comparative scores: baseline vs. weighted vs. (weighted + engagement) vs. (weighted + PCA) vs. (weighted + engagement + PCA).


## 1) Data loading and validation

In [None]:

import pandas as pd
import numpy as np

# Candidate paths (adjust if your file lives elsewhere)
CANDIDATE_PATHS = [
    "data/desafio_ponderadores_metrica (4).csv",
    "data/desafio_ponderadores_metrica.csv",
    "desafio_ponderadores_metrica.csv",
]

df = None
for p in CANDIDATE_PATHS:
    try:
        df = pd.read_csv(p)
        print(f"✅ Data loaded from: {p}")
        break
    except FileNotFoundError:
        # Try the next candidate; other errors (e.g. a parse failure) should surface
        continue

if df is None:
    raise FileNotFoundError("Data file not found in any known path. Adjust CANDIDATE_PATHS.")

expected = {"user_id","role","seniority","cluster","metric_name","event_count"}
missing = expected - set(df.columns)
assert not missing, f"Missing required columns: {missing}"
assert df["event_count"].ge(0).all(), "event_count contains negative values"

df.head()


✅ Data loaded from: data/desafio_ponderadores_metrica.csv


index,user_id,role,seniority,cluster,metric_name,event_count
0,user_1,Frontend,Semi-Senior,Software Development,commits,14
1,user_1,Frontend,Semi-Senior,Software Development,pull_requests,13
2,user_1,Frontend,Semi-Senior,Software Development,builds,16
3,user_1,Frontend,Semi-Senior,Software Development,deploys,26
4,user_1,Frontend,Semi-Senior,Data Analysis,queries_executed,36



## 2) Equations (plain text)

**Clusters**
```
w_c = ( Events_c * log(1 + Diversity_c) ) / Σ_{c'}( Events_c' * log(1 + Diversity_c') )
```
- `Events_c` = total number of events in cluster `c`
- `Diversity_c` = number of distinct metrics in `c`
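
As a quick check of the cluster formula, the sketch below recomputes `w_c` from the event totals and metric counts reported in the cluster table of section 3. The helper name `cluster_weights` is illustrative and does not appear in the notebook.

```python
import math

# (events, distinct metrics) per cluster, copied from the section 3 output
clusters = {
    "Software Development": (6760, 4),
    "Data Analysis": (5169, 3),
    "Documentation": (3490, 2),
    "Collaboration": (3316, 2),
    "Automation": (3267, 2),
}

def cluster_weights(stats):
    """Return w_c = events * log(1 + diversity), normalized to sum to 1."""
    raw = {c: ev * math.log1p(div) for c, (ev, div) in stats.items()}
    total = sum(raw.values())
    return {c: r / total for c, r in raw.items()}

weights = cluster_weights(clusters)
for name, w in weights.items():
    print(f"{name}: {w:.6f}")
```

The logarithm dampens the diversity term: doubling the number of metrics in a cluster raises its raw weight by far less than doubling its event volume.
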

**Metrics within a cluster**
```
f_m|c   = Events_m,c / Σ_{m'∈c} Events_m',c
idf_m   = log( 1 + ( N_users / (1 + Users_who_used_m) ) )
pen_m   = 1 / ( 1 + mean_abs_correlation(m, other metrics in the cluster) )

w~_m|c  = f_m|c * (1 + idf_m) * pen_m
w_m|c   = w~_m|c / Σ_{m'∈c} w~_m'|c
w_m_tot = w_c * w_m|c
```
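
The sketch below applies these formulas to one hypothetical three-metric cluster. The metric names, correlations, and `idf` value are made up for illustration; the point is that a metric strongly correlated with its neighbors loses weight even when its frequency share is the same.

```python
def correlation_penalty(mean_abs_corr):
    """pen_m = 1 / (1 + mean absolute correlation with the other metrics)."""
    return 1.0 / (1.0 + mean_abs_corr)

def metric_weights(freq, idf, pen):
    """w_m|c = f * (1 + idf) * pen, normalized within one cluster."""
    raw = {m: freq[m] * (1.0 + idf[m]) * pen[m] for m in freq}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}

# Hypothetical pairwise |correlations|: a-b = 0.1, a-c = 0.1, b-c = 0.7
mean_abs = {"a": (0.1 + 0.1) / 2, "b": (0.1 + 0.7) / 2, "c": (0.1 + 0.7) / 2}

freq = {m: 1 / 3 for m in mean_abs}   # equal frequency shares
idf = {m: 0.7 for m in mean_abs}      # equal rarity
pen = {m: correlation_penalty(v) for m, v in mean_abs.items()}

w = metric_weights(freq, idf, pen)
print(w)  # "a" ends up with the largest weight; "b" and "c" tie
```
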

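**Scores**
```
Score_plain_u  = Σ_m event_count_u,m
Score_u        = Σ_m ( w_m_tot * event_count_u,m )

Score_norm_u   = 100 * (Score_u - min) / (max - min)
Delta_u        = Score_norm_u - Score_plain_norm_u
```
Here `min` and `max` are taken over all users, and `Score_plain_norm_u` applies the same min-max scaling to `Score_plain_u`.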
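
A minimal sketch of the normalization and delta on made-up scores follows. The 50.0 fallback for identical values mirrors the `minmax_norm` helper in section 5; the input numbers are hypothetical.

```python
def minmax_norm(values):
    """Scale values to [0, 100]; return 50.0 for every entry if all are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [50.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]

plain = [10, 20, 40]          # hypothetical unweighted scores
weighted = [1.0, 3.0, 5.0]    # hypothetical weighted scores

plain_norm = minmax_norm(plain)
weighted_norm = minmax_norm(weighted)
delta = [w - p for w, p in zip(weighted_norm, plain_norm)]
print(delta)
```

Because the two scores are rescaled independently, `Delta_u` measures a change in relative position between the two rankings, not a change in raw magnitude.
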

**Extensions**  
*Engagement (top cohort)*
```
lift_raw[m]  = share_high[m] / (share_all[m] + ε)
lift[m]      = (lift_raw[m] * n_high[m] + 1 * λ) / (n_high[m] + λ)
f_eng[m]     = clip( (lift[m])^β , 0.67, 1.50 )
w_m|c^(eng)  = normalize_by_cluster( w_m|c * f_eng[m] )
```
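
The smoothing step pulls the raw lift toward 1 when few cohort users support a metric. The sketch below shows this on hypothetical shares, with defaults taken from the parameters in section 6 (`LAMBDA = 5`, `BETA = 0.5`, bounds 0.67 and 1.50); the function name is illustrative.

```python
def engagement_factor(share_high, share_all, n_high, lam=5.0, beta=0.5,
                      lo=0.67, hi=1.50, eps=1e-12):
    """Smoothed, clipped lift of a metric in the top cohort."""
    lift_raw = share_high / (share_all + eps)
    # Shrink toward 1: with small n_high the prior (weight lam) dominates
    lift = (lift_raw * n_high + 1.0 * lam) / (n_high + lam)
    return min(max(lift ** beta, lo), hi)

# Hypothetical metric: 12% of cohort events vs. 10% of all events
f_many = engagement_factor(0.12, 0.10, n_high=20)
f_few = engagement_factor(0.12, 0.10, n_high=1)
print(f_many, f_few)
```

With the same raw lift of 1.2, the factor is closer to 1 when only one cohort user recorded the metric than when twenty did.
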
*PCA (metric contribution)*
```
contrib_PCA[m]  = Σ_{k≤K} ( loading_{m,k}^2 * var_exp_k )
f_pca[m]        = normalize( contrib_PCA[m] )
w_m|c^(pca)     = normalize_by_cluster( w_m|c * blend(f_pca[m], α) )
```
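Here `blend(f_pca, α)` can be, for example, `α*1 + (1-α)*f_pca` (a moderate blend). The code in section 7 uses a different smoothing, `f_pca[m] = clip(1 + γ * z[m], 0.67, 1.50)`, where `z[m]` is the z-score of `contrib_PCA[m]` across metrics and `γ` is the `gamma` parameter.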
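
The contribution formula can be sketched with NumPy alone. This version uses an SVD of the centered matrix in place of scikit-learn's `PCA` (the principal directions match up to sign), and the Poisson counts are synthetic; both choices are assumptions made for a self-contained example.

```python
import numpy as np

def pca_contributions(X, var_threshold=0.80):
    """Return (K, contrib) with contrib[m] = sum_{k<=K} loading[k, m]^2 * var_exp[k]."""
    Xc = X - X.mean(axis=0)                       # PCA centers each column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_exp = s**2 / np.sum(s**2)                 # explained variance ratio
    cum = np.cumsum(var_exp)
    K = int(np.searchsorted(cum, var_threshold) + 1)
    K = max(1, min(K, len(var_exp)))
    contrib = (Vt[:K] ** 2 * var_exp[:K, None]).sum(axis=0)
    return K, contrib

rng = np.random.default_rng(0)
# Synthetic user x metric counts; the third metric has the largest variance
X = rng.poisson(lam=[5, 20, 50], size=(200, 3)).astype(float)
K, contrib = pca_contributions(X)
print(K, contrib)
```

Each component has unit norm, so the contributions sum to the cumulative explained variance of the first `K` components. Without column scaling, metrics with larger variance dominate the result.
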


## 3) Cluster weights (volume × log(1 + diversity))

In [3]:

# Events per metric within each cluster
events_mc = df.groupby(["cluster","metric_name"])["event_count"].sum().reset_index()

# Per-cluster aggregates
cluster_stats = events_mc.groupby("cluster").agg(
    events_cluster=("event_count","sum"),
    diversity_cluster=("metric_name","nunique")
).reset_index()

cluster_stats["raw_weight"] = cluster_stats["events_cluster"] * np.log1p(cluster_stats["diversity_cluster"])
cluster_stats["w_cluster"] = cluster_stats["raw_weight"] / cluster_stats["raw_weight"].sum()

cluster_stats.sort_values("w_cluster", ascending=False).head(10)


index,cluster,events_cluster,diversity_cluster,raw_weight,w_cluster
4,Software Development,6760,4,10879.800288,0.373724
2,Data Analysis,5169,3,7165.755553,0.246145
3,Documentation,3490,2,3834.156887,0.131704
1,Collaboration,3316,2,3642.998349,0.125138
0,Automation,3267,2,3589.166347,0.123289


## 4) Per-cluster metric weights (frequency + TF-IDF + correlation penalty)

In [4]:

# Relative frequency within the cluster
events_mc["sum_events_by_cluster"] = events_mc.groupby("cluster")["event_count"].transform("sum")
events_mc["freq_in_cluster"] = events_mc["event_count"] / events_mc["sum_events_by_cluster"]

# Approximate IDF per metric
users_by_metric = df.groupby("metric_name")["user_id"].nunique()
N_users = df["user_id"].nunique()
idf = np.log(1 + N_users / (1 + users_by_metric)).rename("idf")
events_mc = events_mc.merge(idf, on="metric_name", how="left")

# Correlation penalty, computed per cluster
penalty_rows = []
for c, sub in df.groupby("cluster"):
    mat = sub.pivot_table(index="user_id", columns="metric_name", values="event_count", aggfunc="sum").fillna(0)
    if mat.shape[1] < 2:
        mean_abs_corr = pd.Series(0.0, index=mat.columns)
    else:
        corr = mat.corr().abs()
        mean_abs_corr = (corr.sum() - 1) / (corr.shape[0] - 1)
    pen = 1 / (1 + mean_abs_corr)
    penalty_rows.append(pen.rename(c))

penalty_df = pd.DataFrame(penalty_rows)
penalty_df.index.name = "cluster"
penalty_df = penalty_df.reset_index().melt(id_vars="cluster", var_name="metric_name", value_name="pen_corr")
events_mc = events_mc.merge(penalty_df, on=["cluster","metric_name"], how="left")
events_mc["pen_corr"] = events_mc["pen_corr"].fillna(1.0)

# Metric weight within the cluster (base)
events_mc["raw_w_metric_in_cluster"] = events_mc["freq_in_cluster"] * (1 + events_mc["idf"]) * events_mc["pen_corr"]
events_mc["sum_raw_by_cluster"] = events_mc.groupby("cluster")["raw_w_metric_in_cluster"].transform("sum")
events_mc["w_metric_in_cluster"] = events_mc["raw_w_metric_in_cluster"] / events_mc["sum_raw_by_cluster"]

# Base total weight
metric_total_w = events_mc.merge(cluster_stats[["cluster","w_cluster"]], on="cluster", how="left")
metric_total_w["w_metric_total"] = metric_total_w["w_cluster"] * metric_total_w["w_metric_in_cluster"]
metric_total_w.sort_values("w_metric_total", ascending=False).head(10)


index,cluster,metric_name,event_count,sum_events_by_cluster,freq_in_cluster,idf,pen_corr,raw_w_metric_in_cluster,sum_raw_by_cluster,w_metric_in_cluster,w_cluster,w_metric_total
9,Software Development,builds,1757,6760,0.259911,0.688184,0.959545,0.421027,1.557152,0.270383,0.373724,0.101048
12,Software Development,pull_requests,1657,6760,0.245118,0.688184,0.920528,0.380919,1.557152,0.244625,0.373724,0.091422
10,Software Development,commits,1626,6760,0.240533,0.688184,0.934167,0.379331,1.557152,0.243606,0.373724,0.091041
11,Software Development,deploys,1720,6760,0.254438,0.688184,0.875068,0.375875,1.557152,0.241386,0.373724,0.090212
4,Data Analysis,dashboards_created,1833,5169,0.354614,0.688184,0.891331,0.533599,1.553254,0.343536,0.246145,0.08456
5,Data Analysis,notebooks_modified,1637,5169,0.316696,0.688184,0.954821,0.510486,1.553254,0.328656,0.246145,0.080897
6,Data Analysis,queries_executed,1699,5169,0.32869,0.688184,0.917603,0.509169,1.553254,0.327808,0.246145,0.080688
8,Documentation,docs_edited,1795,3490,0.514327,0.688184,0.804673,0.69868,1.358436,0.514327,0.131704,0.067739
0,Automation,jobs_scheduled,1702,3267,0.520967,0.688184,0.950269,0.835751,1.60423,0.520967,0.123289,0.064229
7,Documentation,docs_created,1695,3490,0.485673,0.688184,0.804673,0.659756,1.358436,0.485673,0.131704,0.063965
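
Every row in the output above shows the same `idf` value, 0.688184. A minimal check, assuming 100 users who all recorded every metric (an assumption, since the notebook does not print the user count), reproduces that number and shows why a constant `idf` drops out after per-cluster normalization.

```python
import math

def idf(n_users, users_with_metric):
    """idf_m = log(1 + N_users / (1 + users who used m))."""
    return math.log(1 + n_users / (1 + users_with_metric))

value = idf(100, 100)  # assumed: 100 users, all of whom used the metric
print(round(value, 6))

# Documentation cluster, values copied from the table above
freq = {"docs_edited": 0.514327, "docs_created": 0.485673}
pen = 0.804673  # identical for both metrics in a two-metric cluster

raw = {m: f * (1 + value) * pen for m, f in freq.items()}
total = sum(raw.values())
w = {m: r / total for m, r in raw.items()}
print(w)  # equals freq: (1 + idf) * pen is the same factor for both metrics
```

In a cluster with only two metrics, each metric's mean absolute correlation is the single pairwise value, so `pen_corr` is also identical and the weights reduce to the frequency shares, as the `w_metric_in_cluster` column shows for Documentation.
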


## 5) User scores (baseline vs. base weighted)

In [5]:

# Join weights onto the user-metric rows
df_scores = df.merge(metric_total_w[["cluster","metric_name","w_metric_total"]],
                     on=["cluster","metric_name"], how="left")

df_scores["contrib"] = df_scores["event_count"] * df_scores["w_metric_total"]

# Scores
user_weighted = df_scores.groupby("user_id")["contrib"].sum().rename("score_weighted")
user_baseline = df_scores.groupby("user_id")["event_count"].sum().rename("score_baseline")
scores = pd.concat([user_weighted, user_baseline], axis=1).fillna(0)

def minmax_norm(s):
    if s.max() == s.min():
        return s*0 + 50.0
    return 100 * (s - s.min()) / (s.max() - s.min())

scores["score_weighted_norm"] = minmax_norm(scores["score_weighted"])
scores["score_baseline_norm"] = minmax_norm(scores["score_baseline"])
scores["delta"] = scores["score_weighted_norm"] - scores["score_baseline_norm"]

scores.sort_values("score_weighted_norm", ascending=False).head(10)


user_id,score_weighted,score_baseline,score_weighted_norm,score_baseline_norm,delta
user_32,23.929673,312,100.0,100.0,0.0
user_58,23.101224,294,93.138788,88.387097,4.751691
user_48,22.685046,295,89.691995,89.032258,0.659737
user_39,21.984881,291,83.893231,86.451613,-2.558382
user_61,21.926652,278,83.410981,78.064516,5.346465
user_43,21.469676,284,79.626307,81.935484,-2.309177
user_15,20.987142,265,75.629962,69.677419,5.952543
user_34,20.722306,264,73.436591,69.032258,4.404333
user_94,20.655695,274,72.884918,75.483871,-2.598953
user_49,20.513318,256,71.705748,63.870968,7.834781
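
One way to summarize how much the weighting reorders users is a rank correlation between the two scores. The sketch below uses only the ten rows printed above, so it describes the top of the ranking rather than the full population; computing Spearman as the Pearson correlation of ranks is a choice made here to stay within pandas.

```python
import pandas as pd

# Top-10 rows copied from the table above: user_id -> (score_weighted, score_baseline)
rows = {
    "user_32": (23.929673, 312),
    "user_58": (23.101224, 294),
    "user_48": (22.685046, 295),
    "user_39": (21.984881, 291),
    "user_61": (21.926652, 278),
    "user_43": (21.469676, 284),
    "user_15": (20.987142, 265),
    "user_34": (20.722306, 264),
    "user_94": (20.655695, 274),
    "user_49": (20.513318, 256),
}
sample = pd.DataFrame.from_dict(
    rows, orient="index", columns=["score_weighted", "score_baseline"]
)

# Spearman correlation = Pearson correlation of the ranks
rank_corr = sample["score_weighted"].rank().corr(sample["score_baseline"].rank())
print(f"Spearman rank correlation (top-10 sample): {rank_corr:.3f}")
```
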


## 6) Extension: **high-engagement** factor (top cohort)

In [6]:

# Engagement parameters
TOP_PCT = 0.20   # top 20% by total activity
EPS = 1e-12
LAMBDA = 5       # lift regularization (shrinkage toward 1)
BETA = 0.5       # strength of the effect on the factor

# Cohort selection: top users by total event_count (baseline)
user_total = df.groupby("user_id")["event_count"].sum().sort_values(ascending=False)
k = max(1, int(len(user_total) * TOP_PCT))
U_high = set(user_total.head(k).index)

# Metric shares in the cohort vs. the whole population
metric_tot_high = df[df["user_id"].isin(U_high)].groupby("metric_name")["event_count"].sum()
metric_tot_all  = df.groupby("metric_name")["event_count"].sum()
sum_high = metric_tot_high.sum()
sum_all  = metric_tot_all.sum()

share_high = (metric_tot_high / (sum_high + EPS)).reindex(metric_tot_all.index, fill_value=0)
share_all  = (metric_tot_all  / (sum_all  + EPS))

lift_raw = share_high / (share_all + EPS)

# Support within the cohort, and smoothing
n_high = (df[df["user_id"].isin(U_high)]
          .groupby("metric_name")["user_id"]
          .nunique()
          .reindex(metric_tot_all.index, fill_value=0))

lift = (lift_raw * n_high + 1.0 * LAMBDA) / (n_high + LAMBDA)

# Clipped factor
f_eng = lift.pow(BETA).clip(lower=0.67, upper=1.50).rename("f_eng")

# Apply to w_metric_in_cluster and renormalize within each cluster
events_mc_eng = events_mc.merge(f_eng, on="metric_name", how="left")
events_mc_eng["f_eng"] = events_mc_eng["f_eng"].fillna(1.0)

events_mc_eng["w_metric_in_cluster_eng_raw"] = events_mc_eng["w_metric_in_cluster"] * events_mc_eng["f_eng"]
events_mc_eng["sum_raw_by_cluster_eng"] = events_mc_eng.groupby("cluster")["w_metric_in_cluster_eng_raw"].transform("sum")
events_mc_eng["w_metric_in_cluster_eng"] = events_mc_eng["w_metric_in_cluster_eng_raw"] / events_mc_eng["sum_raw_by_cluster_eng"]

metric_total_w_eng = events_mc_eng.merge(cluster_stats[["cluster","w_cluster"]], on="cluster", how="left")
metric_total_w_eng["w_metric_total_eng"] = metric_total_w_eng["w_cluster"] * metric_total_w_eng["w_metric_in_cluster_eng"]

metric_total_w_eng.sort_values("w_metric_total_eng", ascending=False).head(10)


index,cluster,metric_name,event_count,sum_events_by_cluster,freq_in_cluster,idf,pen_corr,raw_w_metric_in_cluster,sum_raw_by_cluster,w_metric_in_cluster,f_eng,w_metric_in_cluster_eng_raw,sum_raw_by_cluster_eng,w_metric_in_cluster_eng,w_cluster,w_metric_total_eng
9,Software Development,builds,1757,6760,0.259911,0.688184,0.959545,0.421027,1.557152,0.270383,0.989825,0.267632,1.01946,0.262523,0.373724,0.098111
11,Software Development,deploys,1720,6760,0.254438,0.688184,0.875068,0.375875,1.557152,0.241386,1.056412,0.255003,1.01946,0.250136,0.373724,0.093482
10,Software Development,commits,1626,6760,0.240533,0.688184,0.934167,0.379331,1.557152,0.243606,1.037632,0.252773,1.01946,0.247948,0.373724,0.092664
12,Software Development,pull_requests,1657,6760,0.245118,0.688184,0.920528,0.380919,1.557152,0.244625,0.997655,0.244052,1.01946,0.239393,0.373724,0.089467
4,Data Analysis,dashboards_created,1833,5169,0.354614,0.688184,0.891331,0.533599,1.553254,0.343536,1.040515,0.357455,1.006353,0.355198,0.246145,0.08743
6,Data Analysis,queries_executed,1699,5169,0.32869,0.688184,0.917603,0.509169,1.553254,0.327808,0.994532,0.326015,1.006353,0.323957,0.246145,0.079741
5,Data Analysis,notebooks_modified,1637,5169,0.316696,0.688184,0.954821,0.510486,1.553254,0.328656,0.982435,0.322883,1.006353,0.320845,0.246145,0.078974
8,Documentation,docs_edited,1795,3490,0.514327,0.688184,0.804673,0.69868,1.358436,0.514327,0.969341,0.498558,0.963361,0.51752,0.131704,0.06816
2,Collaboration,comments_left,1663,3316,0.501508,0.688184,0.942487,0.797945,1.591092,0.501508,1.018607,0.510839,0.99819,0.511766,0.125138,0.064041
7,Documentation,docs_created,1695,3490,0.485673,0.688184,0.804673,0.659756,1.358436,0.485673,0.957028,0.464803,0.963361,0.48248,0.131704,0.063545
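
The renormalization step can be traced by hand for one cluster. The sketch below redoes the arithmetic for the four Software Development rows of the table above, using the printed (rounded) values, so results agree with the table only to about five decimals.

```python
# Software Development rows from the table above
base = {"builds": 0.270383, "deploys": 0.241386,
        "commits": 0.243606, "pull_requests": 0.244625}   # w_metric_in_cluster
f_eng = {"builds": 0.989825, "deploys": 1.056412,
         "commits": 1.037632, "pull_requests": 0.997655}
w_cluster = 0.373724

raw = {m: base[m] * f_eng[m] for m in base}
total = sum(raw.values())                        # sum_raw_by_cluster_eng
w_eng = {m: r / total for m, r in raw.items()}   # sums to 1 again
w_total_eng = {m: w_cluster * w for m, w in w_eng.items()}

for m in base:
    print(f"{m}: {w_eng[m]:.6f} -> {w_total_eng[m]:.6f}")
```

Since the factor only redistributes weight inside a cluster, the cluster's share of the total (`w_cluster`) is unchanged; `builds` loses weight here only because its factor is below the cluster average.
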


## 7) Extension: **PCA** for metric contribution (optional)

In [7]:

from sklearn.decomposition import PCA

# User x metric matrix (whole population)
mat_all = df.pivot_table(index="user_id", columns="metric_name", values="event_count", aggfunc="sum").fillna(0)

# No scaling is applied: PCA runs on raw counts (scikit-learn centers each column internally),
# so metrics with larger variance weigh more in the components.
X = mat_all.values

# Fit the maximum number of components; K (~80% of the variance) is chosen below
pca = PCA(n_components=min(mat_all.shape)-1)
X_pca = pca.fit_transform(X)
expl_var = pca.explained_variance_ratio_

# Determine K from the cumulative-variance threshold
cum = expl_var.cumsum()
K = int(np.searchsorted(cum, 0.80) + 1) if len(cum)>0 else 0
K = max(1, min(K, len(expl_var)))

# Contribution of each metric: sum over k<=K of loading^2 * var_exp_k
# loadings = components (rows = components, columns = metrics)
loadings = pca.components_[:K, :]  # K x M
var_k = expl_var[:K].reshape(-1, 1)  # K x 1
contrib = (loadings**2 * var_k).sum(axis=0)  # length M

contrib_pca = pd.Series(contrib, index=mat_all.columns, name="contrib_pca")

# Normalize into a smooth multiplicative factor
# f_pca = 1 + gamma * z-score (both signs), clipped for stability
gamma = 0.5
c_mean = contrib_pca.mean()
c_std = contrib_pca.std() if contrib_pca.std() > 0 else 1.0  # guard against zero spread
z = (contrib_pca - c_mean) / c_std
f_pca = (1 + gamma * z).clip(lower=0.67, upper=1.50).rename("f_pca")

# Apply to w_metric_in_cluster and renormalize within each cluster
events_mc_pca = events_mc.merge(f_pca, on="metric_name", how="left")
events_mc_pca["f_pca"] = events_mc_pca["f_pca"].fillna(1.0)

events_mc_pca["w_metric_in_cluster_pca_raw"] = events_mc_pca["w_metric_in_cluster"] * events_mc_pca["f_pca"]
events_mc_pca["sum_raw_by_cluster_pca"] = events_mc_pca.groupby("cluster")["w_metric_in_cluster_pca_raw"].transform("sum")
events_mc_pca["w_metric_in_cluster_pca"] = events_mc_pca["w_metric_in_cluster_pca_raw"] / events_mc_pca["sum_raw_by_cluster_pca"]

metric_total_w_pca = events_mc_pca.merge(cluster_stats[["cluster","w_cluster"]], on="cluster", how="left")
metric_total_w_pca["w_metric_total_pca"] = metric_total_w_pca["w_cluster"] * metric_total_w_pca["w_metric_in_cluster_pca"]

K, contrib_pca.sort_values(ascending=False).head(10)


(9,
 metric_name
 queries_executed      0.094198
 pull_requests         0.082632
 files_shared          0.081477
 deploys               0.070204
 docs_created          0.068783
 commits               0.067663
 scripts_run           0.060570
 docs_edited           0.058781
 dashboards_created    0.057792
 comments_left         0.050293
 Name: contrib_pca, dtype: float64)

## 8) Combining **Engagement + PCA** (optional)

In [8]:

# Combine both factors on top of the base w_metric_in_cluster
# We multiply them and renormalize within each cluster (both factors were clipped earlier)
events_mc_mix = events_mc.merge(f_eng, on="metric_name", how="left").merge(f_pca, on="metric_name", how="left")
events_mc_mix["f_eng"] = events_mc_mix["f_eng"].fillna(1.0)
events_mc_mix["f_pca"] = events_mc_mix["f_pca"].fillna(1.0)

events_mc_mix["w_metric_in_cluster_mix_raw"] = events_mc_mix["w_metric_in_cluster"] * events_mc_mix["f_eng"] * events_mc_mix["f_pca"]
events_mc_mix["sum_raw_by_cluster_mix"] = events_mc_mix.groupby("cluster")["w_metric_in_cluster_mix_raw"].transform("sum")
events_mc_mix["w_metric_in_cluster_mix"] = events_mc_mix["w_metric_in_cluster_mix_raw"] / events_mc_mix["sum_raw_by_cluster_mix"]

metric_total_w_mix = events_mc_mix.merge(cluster_stats[["cluster","w_cluster"]], on="cluster", how="left")
metric_total_w_mix["w_metric_total_mix"] = metric_total_w_mix["w_cluster"] * metric_total_w_mix["w_metric_in_cluster_mix"]

metric_total_w_mix.sort_values("w_metric_total_mix", ascending=False).head(10)


index,cluster,metric_name,event_count,sum_events_by_cluster,freq_in_cluster,idf,pen_corr,raw_w_metric_in_cluster,sum_raw_by_cluster,w_metric_in_cluster,f_eng,f_pca,w_metric_in_cluster_mix_raw,sum_raw_by_cluster_mix,w_metric_in_cluster_mix,w_cluster,w_metric_total_mix
12,Software Development,pull_requests,1657,6760,0.245118,0.688184,0.920528,0.380919,1.557152,0.244625,0.997655,1.5,0.366077,1.140676,0.32093,0.373724,0.119939
6,Data Analysis,queries_executed,1699,5169,0.32869,0.688184,0.917603,0.509169,1.553254,0.327808,0.994532,1.5,0.489023,1.007005,0.485621,0.246145,0.119533
11,Software Development,deploys,1720,6760,0.254438,0.688184,0.875068,0.375875,1.557152,0.241386,1.056412,1.209616,0.308456,1.140676,0.270415,0.373724,0.101061
10,Software Development,commits,1626,6760,0.240533,0.688184,0.934167,0.379331,1.557152,0.243606,1.037632,1.13473,0.286829,1.140676,0.251455,0.373724,0.093975
3,Collaboration,files_shared,1653,3316,0.498492,0.688184,0.942487,0.793147,1.591092,0.498492,0.97765,1.5,0.731026,1.073288,0.681109,0.125138,0.085232
4,Data Analysis,dashboards_created,1833,5169,0.354614,0.688184,0.891331,0.533599,1.553254,0.343536,1.040515,0.843884,0.30165,1.007005,0.299552,0.246145,0.073733
7,Documentation,docs_created,1695,3490,0.485673,0.688184,0.804673,0.659756,1.358436,0.485673,0.957028,1.167744,0.542771,0.978027,0.554965,0.131704,0.073091
1,Automation,scripts_run,1565,3267,0.479033,0.688184,0.950269,0.768479,1.60423,0.479033,1.000191,0.925752,0.44355,0.782616,0.566753,0.123289,0.069874
9,Software Development,builds,1757,6760,0.259911,0.688184,0.959545,0.421027,1.557152,0.270383,0.989825,0.67,0.179313,1.140676,0.157199,0.373724,0.058749
8,Documentation,docs_edited,1795,3490,0.514327,0.688184,0.804673,0.69868,1.358436,0.514327,0.969341,0.87303,0.435256,0.978027,0.445035,0.131704,0.058613
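
In the table above, `f_pca` sits exactly at 1.5 for `pull_requests`, `queries_executed`, and `files_shared`, and at 0.67 for `builds`, so the clip is active for several metrics. The sketch below, using the `gamma` and bounds from section 7, computes the z-score thresholds beyond which the factor saturates; the function names are illustrative.

```python
def saturation_thresholds(gamma=0.5, lo=0.67, hi=1.50):
    """z-scores at which f = 1 + gamma * z reaches the clip bounds."""
    return (lo - 1.0) / gamma, (hi - 1.0) / gamma

def f_pca_from_z(z, gamma=0.5, lo=0.67, hi=1.50):
    """Clipped multiplicative factor for a given z-score."""
    return min(max(1.0 + gamma * z, lo), hi)

z_lo, z_hi = saturation_thresholds()
print(z_lo, z_hi)

# Two metrics with very different z-scores receive the same factor once clipped
print(f_pca_from_z(1.2), f_pca_from_z(2.5))
```

With `gamma = 0.5`, any metric more than one standard deviation above the mean contribution gets the same 1.5 factor. A smaller `gamma` or wider bounds would preserve more of the ordering among those metrics.
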


## 9) Scores by variant and comparison

In [9]:

def build_scores(df_weights, weight_col, suffix):
    tmp = df.merge(df_weights[["cluster","metric_name", weight_col]], on=["cluster","metric_name"], how="left")
    tmp[f"contrib_{suffix}"] = tmp["event_count"] * tmp[weight_col]
    user = tmp.groupby("user_id")[f"contrib_{suffix}"].sum().rename(f"score_{suffix}")
    return user

scores_all = scores.copy()

# Variants
scores_all = scores_all.join(build_scores(metric_total_w_eng, "w_metric_total_eng", "weighted_eng"), how="left")
scores_all = scores_all.join(build_scores(metric_total_w_pca, "w_metric_total_pca", "weighted_pca"), how="left")
scores_all = scores_all.join(build_scores(metric_total_w_mix, "w_metric_total_mix", "weighted_mix"), how="left")

# Normalizations
for col in ["score_weighted_eng", "score_weighted_pca", "score_weighted_mix"]:
    scores_all[f"{col}_norm"] = minmax_norm(scores_all[col].fillna(0))

# Deltas against the baseline and against the base weighted score
for col in ["score_weighted_eng_norm", "score_weighted_pca_norm", "score_weighted_mix_norm"]:
    scores_all[f"delta_vs_plain__{col}"] = scores_all[col] - scores_all["score_baseline_norm"]
    scores_all[f"delta_vs_weighted__{col}"] = scores_all[col] - scores_all["score_weighted_norm"]

scores_all.sort_values("score_weighted_mix_norm", ascending=False).head(10)


user_id,score_weighted,score_baseline,score_weighted_norm,score_baseline_norm,delta,score_weighted_eng,score_weighted_pca,score_weighted_mix,score_weighted_eng_norm,score_weighted_pca_norm,score_weighted_mix_norm,delta_vs_plain__score_weighted_eng_norm,delta_vs_weighted__score_weighted_eng_norm,delta_vs_plain__score_weighted_pca_norm,delta_vs_weighted__score_weighted_pca_norm,delta_vs_plain__score_weighted_mix_norm,delta_vs_weighted__score_weighted_mix_norm
user_32,23.929673,312,100.0,100.0,0.0,24.026642,23.932754,24.000233,100.0,100.0,100.0,-1.421085e-14,-1.421085e-14,0.0,0.0,0.0,0.0
user_58,23.101224,294,93.138788,88.387097,4.751691,23.173193,22.79928,22.860035,92.991672,90.774853,90.757714,4.604575,-0.1471157,2.387756,-2.363935,2.370618,-2.381073
user_39,21.984881,291,83.893231,86.451613,-2.558382,21.984717,22.414517,22.411263,83.232171,87.643337,87.120028,-3.219442,-0.6610603,1.191724,3.750106,0.668415,3.226796
user_43,21.469676,284,79.626307,81.935484,-2.309177,21.503772,21.92005,21.947683,79.282763,83.618957,83.362316,-2.652721,-0.343544,1.683473,3.99265,1.426832,3.736009
user_48,22.685046,295,89.691995,89.032258,0.659737,22.626472,21.929487,21.911238,88.502125,83.695765,83.066896,-0.5301333,-1.189871,-5.336493,-5.99623,-5.965362,-6.625099
user_36,20.290323,259,69.858912,65.806452,4.05246,20.308477,21.593799,21.589128,69.467269,80.963664,80.455915,3.660817,-0.3916425,15.157213,11.104753,14.649464,10.597004
user_61,21.926652,278,83.410981,78.064516,5.346465,21.922476,21.565268,21.565279,82.721069,80.731453,80.262599,4.656552,-0.6899127,2.666937,-2.679528,2.198082,-3.148383
user_34,20.722306,264,73.436591,69.032258,4.404333,20.786568,21.283094,21.353713,73.393242,78.434887,78.547679,4.360984,-0.04334921,9.402629,4.998296,9.515421,5.111088
user_15,20.987142,265,75.629962,69.677419,5.952543,20.949781,21.160759,21.118666,74.73351,77.439227,76.642416,5.056091,-0.8964521,7.761808,1.809265,6.964996,1.012453
user_47,20.445487,264,71.143977,69.032258,2.111719,20.569115,20.766418,20.869927,71.607568,74.229756,74.626172,2.57531,0.4635906,5.197497,3.085778,5.593914,3.482195
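
Beyond per-user deltas, a compact way to compare two variants is the overlap between their top-k user sets. The helper below is a hypothetical addition, not part of the notebook, and the scores are made up.

```python
def top_k_overlap(scores_a, scores_b, k):
    """Jaccard overlap between the top-k users of two {user: score} mappings."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)

# Hypothetical normalized scores under two variants
variant_a = {"u1": 90.0, "u2": 80.0, "u3": 70.0, "u4": 60.0}
variant_b = {"u1": 85.0, "u2": 60.0, "u3": 75.0, "u4": 80.0}

print(top_k_overlap(variant_a, variant_b, k=2))  # top-2 sets share only u1
```

In the notebook, the same idea could be applied to columns of `scores_all` by converting each one with `.to_dict()`.
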


## 10) Export outputs

In [11]:

# Base exports
cluster_stats.to_csv("data/_v2_cluster_weights.csv", index=False)
metric_total_w.to_csv("data/_v2_metric_weights_base.csv", index=False)
scores.reset_index().to_csv("data/_v2_user_scores_base.csv", index=False)

# Engagement, PCA, and mix exports
metric_total_w_eng.to_csv("data/_v2_metric_weights_eng.csv", index=False)
metric_total_w_pca.to_csv("data/_v2_metric_weights_pca.csv", index=False)
metric_total_w_mix.to_csv("data/_v2_metric_weights_mix.csv", index=False)
scores_all.reset_index().to_csv("data/_v2_user_scores_all_variants.csv", index=False)

print("Exported files:")
print(" - data/_v2_cluster_weights.csv")
print(" - data/_v2_metric_weights_base.csv")
print(" - data/_v2_metric_weights_eng.csv")
print(" - data/_v2_metric_weights_pca.csv")
print(" - data/_v2_metric_weights_mix.csv")
print(" - data/_v2_user_scores_base.csv")
print(" - data/_v2_user_scores_all_variants.csv")


Exported files:
 - data/_v2_cluster_weights.csv
 - data/_v2_metric_weights_base.csv
 - data/_v2_metric_weights_eng.csv
 - data/_v2_metric_weights_pca.csv
 - data/_v2_metric_weights_mix.csv
 - data/_v2_user_scores_base.csv
 - data/_v2_user_scores_all_variants.csv



### Notes
- The factors `f_eng` and `f_pca` are **clipped** (to the range 0.67 to 1.50) for numerical stability and explainability.
- Every adjustment **renormalizes within each cluster** to preserve comparability.
- Tunable parameters: `TOP_PCT`, `LAMBDA`, `BETA`, `gamma`, and the PCA variance threshold (80%).  
  Tune them based on validation with business stakeholders or on cohort-separation metrics.
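
To make tuning easier to track, the parameters listed above could be grouped in one validated object. This is a sketch of one possible layout; the class name, field names, and validation rules are assumptions, and the defaults copy the values used in sections 6 and 7.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WeightingParams:
    top_pct: float = 0.20         # TOP_PCT: share of users in the top cohort
    lam: float = 5.0              # LAMBDA: lift regularization
    beta: float = 0.5             # BETA: exponent applied to the lift
    gamma: float = 0.5            # gamma: slope of the PCA z-score factor
    var_threshold: float = 0.80   # cumulative variance used to choose K
    clip_lo: float = 0.67         # lower bound for f_eng and f_pca
    clip_hi: float = 1.50         # upper bound for f_eng and f_pca

    def __post_init__(self):
        if not 0 < self.top_pct <= 1:
            raise ValueError("top_pct must be in (0, 1]")
        if not 0 < self.var_threshold <= 1:
            raise ValueError("var_threshold must be in (0, 1]")
        if self.lam < 0:
            raise ValueError("lam must be non-negative")
        if not self.clip_lo <= 1.0 <= self.clip_hi:
            raise ValueError("clip bounds must bracket 1.0")

params = WeightingParams()
print(params)
```
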
