# MLP - Individual benchmark
MLP (Multi-Layer Perceptron):  
An artificial neural network made of several fully connected layers, able to model non-linear relationships between variables.
Each neuron is connected to every neuron in the next layer, and the weights are learned by gradient backpropagation.
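The forward and backward passes described above can be sketched in NumPy for a single hidden layer. This is a minimal illustration under assumptions chosen for the example (ReLU hidden units, one sigmoid output unit, mean binary cross-entropy loss, random toy data); it is not how `MLPClassifier` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative only): 8 samples, 3 features, binary labels
X = rng.normal(size=(8, 3))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

# Fully connected layers: every input feeds every hidden unit,
# and every hidden unit feeds the single output unit.
W1, b1 = rng.normal(scale=0.5, size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    z1 = X @ W1 + b1          # hidden pre-activations
    h = np.maximum(z1, 0.0)   # ReLU activation
    p = sigmoid(h @ W2 + b2)  # predicted probability of class 1
    return z1, h, p


def bce(p, y):
    """Mean binary cross-entropy; eps avoids log(0)."""
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def backward(X, y, z1, h, p, W2):
    """Backpropagation: apply the chain rule from the loss back to each weight."""
    n = X.shape[0]
    dz2 = (p - y) / n        # gradient w.r.t. the output pre-activation (sigmoid + BCE)
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dh = dz2 @ W2.T          # error sent back to the hidden layer
    dz1 = dh * (z1 > 0)      # ReLU derivative
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return dW1, db1, dW2, db2


lr = 0.1
z1, h, p = forward(X, W1, b1, W2, b2)
loss_before = bce(p, y)
dW1, db1, dW2, db2 = backward(X, y, z1, h, p, W2)

# One gradient-descent step: move every parameter against its gradient
W1_new, b1_new = W1 - lr * dW1, b1 - lr * db1
W2_new, b2_new = W2 - lr * dW2, b2 - lr * db2
loss_after = bce(forward(X, W1_new, b1_new, W2_new, b2_new)[2], y)
print(f"loss before: {loss_before:.4f} | after one step: {loss_after:.4f}")
```

Training repeats this step over many mini-batches and epochs; libraries such as scikit-learn add an optimizer (e.g. Adam) and regularization on top of the same chain-rule computation.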

It is particularly well suited to:

- capturing non-linear relationships
- learning complex interactions between variables
- working on tabular data, images, or time series
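To illustrate the first two points, here is a sketch on a synthetic XOR-like problem where the label depends only on the interaction `x0 * x1`: no straight line separates the classes, so a logistic regression stays close to chance, while a small MLP can learn the boundary. The data and hyperparameters are arbitrary choices for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(1000, 2))
# XOR-like target: class 1 when x0 and x1 have the same sign
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Linear decision boundary: cannot represent the interaction
linear = LogisticRegression().fit(X_train, y_train)
# One hidden layer of 32 ReLU units: can combine several linear pieces
mlp = MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=0.01,
                    max_iter=2000, random_state=0).fit(X_train, y_train)

acc_linear = linear.score(X_test, y_test)
acc_mlp = mlp.score(X_test, y_test)
print(f"Logistic regression: {acc_linear:.3f} | MLP: {acc_mlp:.3f}")
```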

However, it also has limitations:

- requires substantial preprocessing (scaling, encoding)
- can be slow to train
- usually performs worse than boosting models on structured tabular data (such as Home Credit)
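A sketch of the preprocessing an MLP typically needs, on hypothetical columns (`income`, `age` and `contract_type` are made up for the example): impute missing values, scale numeric features, and one-hot encode categorical ones, all inside a `Pipeline` so the transformations are fitted on the training data only. It also shows `early_stopping=True`, one way to limit training time: scikit-learn then holds out `validation_fraction` (10% by default) of the training data and stops once the validation score has not improved by at least `tol` for `n_iter_no_change` consecutive epochs. The project's own `make_preprocessor` may be built differently.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 500
# Hypothetical tabular data: very different numeric scales, missing values, one categorical column
df = pd.DataFrame({
    "income": rng.lognormal(mean=11, sigma=0.5, size=n),
    "age": rng.integers(20, 70, size=n).astype(float),
    "contract_type": rng.choice(["cash", "revolving"], size=n),
})
df.loc[rng.choice(n, size=25, replace=False), "income"] = np.nan
y = (df["age"] < 35).astype(int)  # arbitrary synthetic target

preprocess = ColumnTransformer([
    # Impute (plus a missing-value indicator), then scale: gradient steps are sensitive to feature scale
    ("num", make_pipeline(SimpleImputer(strategy="median", add_indicator=True), StandardScaler()),
     ["income", "age"]),
    # An MLP only accepts numbers, so categories are one-hot encoded
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract_type"]),
])

model = Pipeline([
    ("preprocessor", preprocess),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), early_stopping=True,
                          n_iter_no_change=10, max_iter=300, random_state=0)),
])
model.fit(df, y)
proba = model.predict_proba(df)[:, 1]
print("epochs run:", model.named_steps["mlp"].n_iter_)
```

The missing-value indicator is only added for `income`, the one column with missing values at fit time, so the network receives 5 input features here.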


In [1]:
import os
import sys
from pathlib import Path


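# The notebook runs from notebooks/02_benchmark, so the project root is two levels up
CWD = Path.cwd()
PROJECT_ROOT = CWD.parent.parent
DB_PATH = (PROJECT_ROOT / "mlflow.db").resolve()
ARTIFACT_ROOT = (PROJECT_ROOT / "artifacts").resolve()
ARTIFACT_ROOT.mkdir(parents=True, exist_ok=True)

# MLflow tracking store (SQLite file) and artifact location, exposed as environment variables
os.environ["MLFLOW_TRACKING_URI"] = f"sqlite:///{DB_PATH.as_posix()}"
os.environ["MLFLOW_ARTIFACT_URI"] = ARTIFACT_ROOT.as_uri()

# Make the project's `src` package importable from the notebook
sys.path.append(str(PROJECT_ROOT))

import mlflow

mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])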

print("CWD =", CWD)
print("Tracking URI =", mlflow.get_tracking_uri())
print("Artifacts root (env) =", os.environ["MLFLOW_ARTIFACT_URI"])

CWD = c:\Users\yoann\Documents\open classrooms\projet 8\livrables\pret a dépenser\notebooks\02_benchmark
Tracking URI = sqlite:///C:/Users/yoann/Documents/open classrooms/projet 8/livrables/pret a dépenser/mlflow.db
Artifacts root (env) = file:///C:/Users/yoann/Documents/open%20classrooms/projet%208/livrables/pret%20a%20d%C3%A9penser/artifacts
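The two URIs printed above are built differently: the SQLite tracking URI keeps the raw path (spaces and accented characters included), while `Path.as_uri()` produces a percent-encoded `file://` URI. A small sketch with a hypothetical folder name reproduces the difference. On POSIX systems the tracking URI ends up with four slashes (`sqlite:////...`), which is SQLAlchemy's form for an absolute path.

```python
from pathlib import Path

# Hypothetical project folder whose name contains a space and an accented character
project_root = Path.cwd() / "my project é"
db_path = project_root / "mlflow.db"
artifact_root = project_root / "artifacts"

# "sqlite:///" followed by the raw path: spaces are kept as-is
tracking_uri = f"sqlite:///{db_path.as_posix()}"
# file:// URI: Path.as_uri() percent-encodes spaces and non-ASCII characters
artifact_uri = artifact_root.as_uri()

print(tracking_uri)
print(artifact_uri)
```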


In [2]:
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

from src.modeling.train import train_with_cv
from src.modeling.prepare_for_model import prepare_application_for_model
from src.modeling.prepare_for_model import make_preprocessor
from src.tracking import mlflow_tracking

EXPERIMENT_NAME = "home_credit_benchmarking"
exp_id = mlflow_tracking.get_or_create_experiment(EXPERIMENT_NAME, ARTIFACT_ROOT)
mlflow.set_experiment(EXPERIMENT_NAME)


<Experiment: artifact_location='file:///C:/Users/yoann/Documents/open%20classrooms/projet%208/livrables/pret%20a%20d%C3%A9penser/artifacts', creation_time=1771138249350, experiment_id='1', last_update_time=1771138249350, lifecycle_stage='active', name='home_credit_benchmarking', tags={}>

In [3]:
df = pd.read_csv(PROJECT_ROOT / "data" / "processed" / "train_split.csv")
X_skl, y = prepare_application_for_model(df, model_type="sklearn")

print("X_skl:", X_skl.shape, "| y:", y.shape)

preprocessor, cols = make_preprocessor(X_skl)


X_skl: (215257, 1656) | y: (215257,)


In [4]:

params_mlp = {
    "hidden_layer_sizes": (64, 32),
    "max_iter": 300,
    "early_stopping": True,
    "n_iter_no_change": 10,
    "random_state": 42,
}

model_mlp = Pipeline(
    steps=[
        ("preprocessor", preprocessor),   # preprocessor returned by make_preprocessor
        ("mlp", MLPClassifier(**params_mlp)),
    ]
)

results_mlp = train_with_cv(
    model=model_mlp,
    model_name="MLP",
    X=X_skl,
    y=y,
    model_type="sklearn",
    threshold=0.5,
    n_splits=5,
    random_state=42,
    log_fold_metrics=True,
    cost_fn=10,
    cost_fp=1,
    fbeta_beta=3,
)

results_mlp


===== Entraînement (benchmark CV) : MLP =====


--- Fold 1/5 ---
   → AUC=0.7707 | Recall@0.50=0.0250 | F1@0.50=0.0476 | F3@0.50=0.0277 | Cost=33981
   → TN=39485 FP=91 FN=3389 TP=87 | fit=767.92s | pred=5.28s

--- Fold 2/5 ---
   → AUC=0.7593 | Recall@0.50=0.0250 | F1@0.50=0.0475 | F3@0.50=0.0276 | Cost=33989
   → TN=39477 FP=99 FN=3389 TP=87 | fit=491.17s | pred=4.76s

--- Fold 3/5 ---
   → AUC=0.7637 | Recall@0.50=0.0184 | F1@0.50=0.0357 | F3@0.50=0.0204 | Cost=34160
   → TN=39526 FP=50 FN=3411 TP=64 | fit=634.23s | pred=4.47s

--- Fold 4/5 ---
   → AUC=0.7404 | Recall@0.50=0.0020 | F1@0.50=0.0040 | F3@0.50=0.0022 | Cost=34688
   → TN=39568 FP=8 FN=3468 TP=7 | fit=254.91s | pred=4.77s

--- Fold 5/5 ---
   → AUC=0.7525 | Recall@0.50=0.0216 | F1@0.50=0.0414 | F3@0.50=0.0239 | Cost=34073
   → TN=39503 FP=73 FN=3400 TP=75 | fit=437.03s | pred=5.22s

===== Résultats finaux (CV) =====
AUC                         : 0.7573 ± 0.0103
Recall@0.50              : 0.0184 ± 0.0086
Precision@0.50           : 0.4983 ± 0.0349
F1@0.50             

{'model': 'MLP',
 'auc_mean': 0.7573260121510511,
 'auc_std': 0.010308126580111227,
 'recall_mean_fixed_threshold': 0.01841438517770364,
 'recall_std_fixed_threshold': 0.008561524064186758,
 'precision_mean_fixed_threshold': 0.4982665825246089,
 'precision_std_fixed_threshold': 0.03486821330127722,
 'f1_mean_fixed_threshold': 0.035242442294835905,
 'f1_std_fixed_threshold': 0.01623139468612679,
 'fbeta_3_mean_fixed_threshold': 0.02035841898672431,
 'fbeta_3_std_fixed_threshold': 0.009455157971511854,
 'business_cost_mean_fixed_threshold': 34178.2,
 'business_cost_std_fixed_threshold': 263.0508696051013,
 'threshold': 0.5,
 'time_sec': 2635.2825033664703}
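The per-fold figures can be recomputed from the confusion matrices in the log. They are consistent with a business cost of `cost_fn * FN + cost_fp * FP` (fold 1: 10 × 3389 + 91 = 33981) and with the usual F-beta definition. The helper below, `fold_metrics`, is a hypothetical sketch for checking the numbers, not the project's `train_with_cv` code.

```python
def fold_metrics(tn, fp, fn, tp, cost_fn=10, cost_fp=1, beta=3):
    """Hypothetical helper: recompute per-fold metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    def fbeta(b):
        # F-beta from counts: (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP)
        return (1 + b**2) * tp / ((1 + b**2) * tp + b**2 * fn + fp)

    # Business cost: each false negative weighs cost_fn, each false positive cost_fp
    cost = cost_fn * fn + cost_fp * fp
    return {"recall": recall, "precision": precision,
            "f1": fbeta(1), "f3": fbeta(beta), "cost": cost}


# (TN, FP, FN, TP) per fold, copied from the CV log above (threshold = 0.5)
folds = [
    (39485, 91, 3389, 87),
    (39477, 99, 3389, 87),
    (39526, 50, 3411, 64),
    (39568, 8, 3468, 7),
    (39503, 73, 3400, 75),
]

m = fold_metrics(*folds[0])
print({k: round(v, 4) for k, v in m.items()})
# → {'recall': 0.025, 'precision': 0.4888, 'f1': 0.0476, 'f3': 0.0277, 'cost': 33981}

costs = [fold_metrics(*counts)["cost"] for counts in folds]
mean_cost = sum(costs) / len(costs)
print(mean_cost)  # → 34178.2
```

Positives make up about 8% of each validation fold (3476 of 43052 in fold 1), and at a threshold of 0.5 the MLP flags only 178 applicants in that fold, which explains a recall near 0.02 despite an AUC around 0.76. If the predicted probabilities were well calibrated, a 10:1 cost ratio would point to a decision threshold near `cost_fp / (cost_fp + cost_fn)` = 1/11 ≈ 0.09 rather than 0.5.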