# LightGBM - Individual benchmark
LightGBM is a gradient boosting algorithm built on decision trees, optimized for training speed, memory efficiency, and predictive performance on tabular data.


LightGBM (Light Gradient Boosting Machine) is a gradient boosting framework developed by Microsoft. It is designed to train decision-tree ensembles quickly and with a low memory footprint, including on large datasets.
It grows trees leaf-wise (at each step it splits the leaf that yields the largest loss reduction) and finds splits on feature histograms rather than on raw sorted values. Histogram binning sharply reduces the cost of split finding, and leaf-wise growth usually reaches a lower loss than level-wise growth for the same number of leaves, although it can overfit small datasets if `num_leaves` is left unconstrained.
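
To make the histogram idea concrete, here is a minimal pure-Python sketch of split finding on one binned feature. It is an illustration under simplifying assumptions, not LightGBM's actual implementation: the function name `best_histogram_split`, the equal-width binning, and the simplified gain formula (second-order gain without the constant factor or the complexity penalty) are all choices made for this example.

```python
from bisect import bisect_right


def best_histogram_split(values, grads, hess, n_bins=4, reg_lambda=1.0):
    """Return (gain, threshold) of the best split on one binned feature.

    Hypothetical helper for illustration only. Samples with value < threshold
    go to the left child, the others to the right child.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Interior bin boundaries (equal-width bins, a simplification).
    edges = [lo + width * (i + 1) for i in range(n_bins - 1)]

    # Single pass over the data: accumulate gradient and Hessian sums per bin.
    g_hist = [0.0] * n_bins
    h_hist = [0.0] * n_bins
    for v, g, h in zip(values, grads, hess):
        b = bisect_right(edges, v)
        g_hist[b] += g
        h_hist[b] += h

    g_total, h_total = sum(g_hist), sum(h_hist)

    def score(g, h):
        # Simplified leaf score used by second-order boosting.
        return g * g / (h + reg_lambda)

    best_gain, best_threshold = 0.0, None
    g_left = h_left = 0.0
    # Only n_bins - 1 candidate thresholds are scanned, not every distinct value.
    for b in range(n_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        gain = (
            score(g_left, h_left)
            + score(g_total - g_left, h_total - h_left)
            - score(g_total, h_total)
        )
        if gain > best_gain:
            best_gain, best_threshold = gain, edges[b]
    return best_gain, best_threshold


# Toy data: negative gradients for small values, positive for large ones.
toy_values = [1, 2, 3, 4, 5, 6, 7, 8]
toy_grads = [-1, -1, -1, -1, 1, 1, 1, 1]
toy_hess = [1] * 8
toy_gain, toy_threshold = best_histogram_split(toy_values, toy_grads, toy_hess)
print(toy_gain, toy_threshold)
```

On this toy data the scan picks the boundary at 4.5, which separates the two gradient groups. The cost of the scan depends on the number of bins rather than on the number of distinct values, which is where the speed-up comes from.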

LightGBM handles categorical features natively, supports parallel training, and performs well on complex tabular problems such as Home Credit.

## Imports

In [1]:
import os
import sys
from pathlib import Path

# --- Paths ---
CWD = Path.cwd()
PROJECT_ROOT = CWD.parent.parent
DB_PATH = (PROJECT_ROOT / "mlflow.db").resolve()
ARTIFACT_ROOT = (PROJECT_ROOT / "artifacts").resolve()
ARTIFACT_ROOT.mkdir(parents=True, exist_ok=True)


os.environ["MLFLOW_TRACKING_URI"] = f"sqlite:///{DB_PATH.as_posix()}"
os.environ["MLFLOW_ARTIFACT_URI"] = ARTIFACT_ROOT.as_uri()


sys.path.append(str(PROJECT_ROOT))

import mlflow  # imported only after the environment variables above are set


mlflow.set_tracking_uri(os.environ["MLFLOW_TRACKING_URI"])

print("CWD =", CWD)
print("Tracking URI =", mlflow.get_tracking_uri())
print("Artifacts root (env) =", os.environ["MLFLOW_ARTIFACT_URI"])


CWD = c:\Users\yoann\Documents\open classrooms\projet 8\livrables\pret a dépenser\notebooks\02_benchmark
Tracking URI = sqlite:///C:/Users/yoann/Documents/open classrooms/projet 8/livrables/pret a dépenser/mlflow.db
Artifacts root (env) = file:///C:/Users/yoann/Documents/open%20classrooms/projet%208/livrables/pret%20a%20d%C3%A9penser/artifacts
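
The two URIs printed above look different because they are built differently: the tracking URI is a plain f-string around `as_posix()`, so spaces stay literal, while `as_uri()` percent-encodes the path. The sketch below reproduces this with a made-up Windows path. `PureWindowsPath` is used only so the example gives the same result on any operating system, and the folder names are assumptions for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical project location containing spaces, as in the output above.
project_root = PureWindowsPath(r"C:\Users\me\open classrooms\projet 8")

# String interpolation keeps the spaces as they are.
tracking_uri = f"sqlite:///{(project_root / 'mlflow.db').as_posix()}"

# as_uri() produces an RFC 8089 file URI and percent-encodes the spaces.
artifact_uri = (project_root / "artifacts").as_uri()

print(tracking_uri)
print(artifact_uri)
```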


In [2]:
import pandas as pd
from lightgbm import LGBMClassifier

from src.modeling.train import train_with_cv
from src.modeling.prepare_for_model import prepare_application_for_model
from src.tracking import mlflow_tracking

EXPERIMENT_NAME = "home_credit_benchmarking"
exp_id = mlflow_tracking.get_or_create_experiment(EXPERIMENT_NAME, ARTIFACT_ROOT)
mlflow.set_experiment(EXPERIMENT_NAME)

<Experiment: artifact_location='file:///C:/Users/yoann/Documents/open%20classrooms/projet%208/livrables/pret%20a%20d%C3%A9penser/artifacts', creation_time=1771138249350, experiment_id='1', last_update_time=1771138249350, lifecycle_stage='active', name='home_credit_benchmarking', tags={}>
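
The helper `mlflow_tracking.get_or_create_experiment` lives in the project's own `src.tracking` module and is not shown in this notebook. As a rough idea of what such a helper might do, here is a minimal sketch. It is a guess, not the project's code: the optional `client` parameter is an assumption added for testability, and the only MLflow calls it relies on are `MlflowClient.get_experiment_by_name` and `MlflowClient.create_experiment`.

```python
from pathlib import Path


def get_or_create_experiment(name, artifact_root, client=None):
    """Return the id of experiment `name`, creating it if it is missing.

    Sketch only. `client` must expose `get_experiment_by_name` and
    `create_experiment`, like `mlflow.tracking.MlflowClient`.
    """
    if client is None:
        # Imported lazily so the sketch can be exercised without MLflow.
        from mlflow.tracking import MlflowClient

        client = MlflowClient()

    existing = client.get_experiment_by_name(name)
    if existing is not None:
        return existing.experiment_id

    # The artifact location is passed as a file URI string.
    location = Path(artifact_root).resolve().as_uri()
    return client.create_experiment(name, artifact_location=location)
```

Because the lookup comes first, calling the helper repeatedly (as cells 2 and 3 do) returns the same experiment id instead of failing on a duplicate name.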

In [3]:
exp_id = mlflow_tracking.get_or_create_experiment(EXPERIMENT_NAME, ARTIFACT_ROOT)

In [4]:


mlflow.set_experiment(EXPERIMENT_NAME)

print("Tracking URI          =", mlflow.get_tracking_uri())
print("DB_PATH               =", DB_PATH)
print("ARTIFACT_ROOT         =", ARTIFACT_ROOT)
print("Experiment name       =", EXPERIMENT_NAME)
print("Experiment id         =", exp_id)
print("Experiment artifacts  =", mlflow.get_experiment(exp_id).artifact_location)


Tracking URI          = sqlite:///C:/Users/yoann/Documents/open classrooms/projet 8/livrables/pret a dépenser/mlflow.db
DB_PATH               = C:\Users\yoann\Documents\open classrooms\projet 8\livrables\pret a dépenser\mlflow.db
ARTIFACT_ROOT         = C:\Users\yoann\Documents\open classrooms\projet 8\livrables\pret a dépenser\artifacts
Experiment name       = home_credit_benchmarking
Experiment id         = 1
Experiment artifacts  = file:///C:/Users/yoann/Documents/open%20classrooms/projet%208/livrables/pret%20a%20d%C3%A9penser/artifacts


In [5]:
exp = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
print(exp.artifact_location)

file:///C:/Users/yoann/Documents/open%20classrooms/projet%208/livrables/pret%20a%20d%C3%A9penser/artifacts


In [6]:
DATA_PATH = PROJECT_ROOT / "data" / "processed" / "train_split.csv"
df = pd.read_csv(DATA_PATH)
X_lgb, y = prepare_application_for_model(df, model_type="boosting")
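
The model in the next cell sets `class_weight="balanced"`. A small sketch of what that implies, using the per-fold class counts that LightGBM reports in the training log further down (13,901 positives and 158,304 negatives): the weights follow scikit-learn's "balanced" heuristic, `n_samples / (n_classes * class_count)`. The function name `balanced_class_weights` is made up for this example.

```python
def balanced_class_weights(class_counts):
    """Weights per class using n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Class counts taken from the fold 1 training log.
counts = {0: 158304, 1: 13901}
weights = balanced_class_weights(counts)

# After weighting, both classes carry the same total weight.
weighted_neg = weights[0] * counts[0]
weighted_pos = weights[1] * counts[1]
weighted_positive_rate = weighted_pos / (weighted_pos + weighted_neg)

print(weights)
print(weighted_positive_rate)
```

Each positive example weighs roughly 11 times more than a negative one, and the weighted positive rate is 0.5. That is consistent with the `pavg=0.500000 -> initscore=-0.000000` line in the log, and it is why predicted scores should not be read as calibrated default probabilities.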

In [7]:
model_lgb = LGBMClassifier(
    objective="binary",
    n_estimators=300,
    learning_rate=0.05,
    num_leaves=31,
    class_weight="balanced",
    random_state=42,
    n_jobs=-1
)


results = train_with_cv(
    model=model_lgb,
    model_name="LightGBM",
    X=X_lgb,
    y=y,
    model_type="boosting",
    threshold=0.5,
    n_splits=5,
    random_state=42,
    log_fold_metrics=True,
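    cost_fn=10,    # business cost of one false negative
    cost_fp=1,     # business cost of one false positive
    fbeta_beta=3,  # beta=3 weights recall more heavily than precision

    use_lgb_categorical=True,
    lgb_categorical_cols=None,  # inferred automatically from dtype=category
    fit_params={}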
)

results


===== Training (CV benchmark): LightGBM =====

--- Fold 1/5 ---
[LightGBM] [Info] Number of positive: 13901, number of negative: 158304
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 1.173885 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 260866
[LightGBM] [Info] Number of data points in the train set: 172205, number of used features: 1652
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=-0.000000
[LightGBM] [Info] Start training from score -0.000000
   → AUC=0.7897 | Recall@0.50=0.6729 | F1@0.50=0.3111 | F3@0.50=0.5459 | Cost=20593
   → TN=30353 FP=9223 FN=1137 TP=2339 | fit=107.86s | pred=0.64s

--- Fold 2/5 ---
[LightGBM] [Info] Number of positive: 13901, number of negative: 158304
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 1.262023 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 

{'model': 'LightGBM',
 'auc_mean': 0.7827445846471643,
 'auc_std': 0.005088958482377628,
 'recall_mean_fixed_threshold': 0.653852323434693,
 'recall_std_fixed_threshold': 0.012665914711843378,
 'precision_mean_fixed_threshold': 0.1966768970214437,
 'precision_std_fixed_threshold': 0.003447224606027,
 'f1_mean_fixed_threshold': 0.3023923596163794,
 'f1_std_fixed_threshold': 0.0053785379684240416,
 'fbeta_3_mean_fixed_threshold': 0.5305265095665301,
 'fbeta_3_std_fixed_threshold': 0.00992001867611954,
 'business_cost_mean_fixed_threshold': 21311.4,
 'business_cost_std_fixed_threshold': 456.5385416369575,
 'threshold': 0.5,
 'time_sec': 522.6161382198334}
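
The per-fold metrics in the log can be re-derived from the confusion matrix. The sketch below does this for fold 1. It assumes the business cost is `cost_fn * FN + cost_fp * FP`, which is an assumption about `train_with_cv` (its source is not shown here) that happens to match the logged value. The function name `fold_metrics` is hypothetical.

```python
def fold_metrics(tn, fp, fn, tp, beta=3, cost_fn=10, cost_fp=1):
    """Recall, precision, F1, F-beta and business cost from confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    b2 = beta**2
    fbeta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)
    # Assumed cost model: a missed defaulter costs cost_fn, a false alarm cost_fp.
    cost = cost_fn * fn + cost_fp * fp
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "fbeta": fbeta,
        "cost": cost,
    }


# Fold 1 confusion matrix from the log: TN=30353 FP=9223 FN=1137 TP=2339
fold1 = fold_metrics(tn=30353, fp=9223, fn=1137, tp=2339)
for metric_name, metric_value in fold1.items():
    print(metric_name, round(metric_value, 4))
```

Rounded to four decimals, this reproduces the fold 1 line (Recall 0.6729, F1 0.3111, F3 0.5459, Cost 20593). With a false negative costing ten times a false positive, the 1,137 missed defaulters contribute 11,370 to the total cost, more than the 9,223 false positives.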