# 🎯 MLflow Tracking - Bank Churn Prediction

**Pipeline:**
1. Model tracking (Baseline + Tuned + Ensemble)
2. Reading the results with Pandas
3. Selecting the best model (ROC-AUC + F1-Score)
4. Loading and using the model
5. Saving to the Model Registry (local)



## 1. 📦 Configuration and Imports

In [10]:
# Imports
import pandas as pd
import numpy as np
import pickle
import os
from datetime import datetime
from pathlib import Path

# MLflow
import mlflow
import mlflow.sklearn
from dotenv import load_dotenv

# ML models (needed, if at all, to unpickle the saved models)
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Metrics
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score
)

import warnings
warnings.filterwarnings('ignore')

print("✅ Imports done")

✅ Imports done


In [11]:
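# MLflow + DagsHub configuration
load_dotenv()  # loads variables from a local .env file, if present

DAGSHUB_USERNAME = "karrayyessine1"
# Read the access token from the environment instead of hard-coding it in the notebook.
# The variable name DAGSHUB_TOKEN is an assumption; use whatever name your .env defines.
DAGSHUB_TOKEN = os.environ["DAGSHUB_TOKEN"]
DAGSHUB_REPO = "MLOps_Project"

MLFLOW_TRACKING_URI = f"https://dagshub.com/{DAGSHUB_USERNAME}/{DAGSHUB_REPO}.mlflow"
EXPERIMENT_NAME = "churn_prediction"

# Credentials
os.environ['MLFLOW_TRACKING_USERNAME'] = DAGSHUB_USERNAME
os.environ['MLFLOW_TRACKING_PASSWORD'] = DAGSHUB_TOKEN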

# MLflow configuration
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

# Create or fetch the experiment
try:
    experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
    if experiment is None:
        experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)
        print(f"✅ Experiment created: {EXPERIMENT_NAME}")
    else:
        experiment_id = experiment.experiment_id
        print(f"✅ Existing experiment: {EXPERIMENT_NAME}")
except Exception as e:
    print(f"⚠️ Error: {e}")
    experiment_id = mlflow.create_experiment(EXPERIMENT_NAME)

mlflow.set_experiment(EXPERIMENT_NAME)

print(f"📊 Tracking URI: {MLFLOW_TRACKING_URI}")
print(f"🧪 Experiment: {EXPERIMENT_NAME}")

✅ Existing experiment: churn_prediction
📊 Tracking URI: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow
🧪 Experiment: churn_prediction
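The token used above is a secret, so it is safer to keep it out of the notebook source. A minimal sketch, assuming the credentials live in environment variables named `DAGSHUB_USERNAME` and `DAGSHUB_TOKEN` (hypothetical names, typically filled in from a `.env` file by `load_dotenv()`); `read_dagshub_credentials` is an illustrative helper, not part of MLflow or DagsHub:

```python
import os


def read_dagshub_credentials(env=None):
    """Return (username, token) from a mapping of environment variables."""
    env = os.environ if env is None else env
    missing = [k for k in ("DAGSHUB_USERNAME", "DAGSHUB_TOKEN") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["DAGSHUB_USERNAME"], env["DAGSHUB_TOKEN"]


# Example with an explicit mapping (placeholder values, not real credentials)
username, token = read_dagshub_credentials(
    {"DAGSHUB_USERNAME": "some-user", "DAGSHUB_TOKEN": "placeholder-token"}
)

# These two variable names are the ones the notebook already sets for MLflow
mlflow_env = {
    "MLFLOW_TRACKING_USERNAME": username,
    "MLFLOW_TRACKING_PASSWORD": token,
}
print(sorted(mlflow_env))
```

A token that has already appeared in a notebook or in commit history is usually treated as exposed and rotated.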


## 2. 📂 Loading the Data

In [12]:
# Load the preprocessed data
DATA_PATH = 'processors/preprocessed_data.pkl'

with open(DATA_PATH, 'rb') as f:
    data = pickle.load(f)

X_train = data['X_train']
X_test = data['X_test']
y_train = data['y_train']
y_test = data['y_test']

print("✅ Data loaded")
print(f"   Train: {X_train.shape}")
print(f"   Test: {X_test.shape}")
print(f"   Churn rate (train): {y_train.mean()*100:.2f}%")

✅ Data loaded
   Train: (8101, 35)
   Test: (2026, 35)
   Churn rate (train): 16.07%
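With a churn rate of about 16%, accuracy alone says little, which is presumably why the pipeline ranks models on ROC-AUC and F1. A quick illustration using only the figures printed above (the rate is rounded, so the derived count is approximate):

```python
# Figures taken from the cell output above
n_train = 8101
churn_rate = 0.1607  # rounded to four decimals, so derived counts are approximate

approx_churners = round(n_train * churn_rate)

# A classifier that always predicts "no churn" is right for every non-churner
majority_accuracy = 1 - churn_rate

print(f"Approx. churners in train: {approx_churners}")
print(f"Accuracy of always predicting non-churn: {majority_accuracy:.2%}")
```

Any model therefore has to beat roughly 84% accuracy before that metric carries information.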


## 3. 🤖 Utility Functions

In [13]:
# Compute the evaluation metrics
def calculate_metrics(y_true, y_pred, y_proba):
    return {
        'accuracy': accuracy_score(y_true, y_pred),
        'precision': precision_score(y_true, y_pred, zero_division=0),
        'recall': recall_score(y_true, y_pred, zero_division=0),
        'f1_score': f1_score(y_true, y_pred, zero_division=0),
        'roc_auc': roc_auc_score(y_true, y_proba)
    }

# Log a model to MLflow
def log_model_mlflow(model, model_name, stage, metrics, duration, best_params=None):
    """
    Log a model to MLflow in a DagsHub-compatible way:
    the pickle is uploaded as a plain artifact.
    """
    with mlflow.start_run(run_name=f"{model_name}_{stage}"):
        # Log params
        mlflow.log_param('model_name', model_name)
        mlflow.log_param('stage', stage)
        mlflow.log_param('n_features', X_train.shape[1])
        mlflow.log_param('dataset', 'churn_prediction')
        
        # Log the best hyperparameters, if available
        if best_params:
            for k, v in best_params.items():
                try:
                    mlflow.log_param(f'best_{k}', str(v)[:250])
                except Exception:
                    # Skip a parameter the server rejects rather than abort the run
                    pass
        
        # Log metrics
        for metric_name, metric_value in metrics.items():
            mlflow.log_metric(metric_name, metric_value)
        mlflow.log_metric('training_duration_sec', duration)
        
        # Save the model locally
        model_filename = f"{model_name.replace(' ', '_')}_{stage}.pkl"
        with open(model_filename, 'wb') as f:
            pickle.dump(model, f)
        
        # Log it as an artifact
        try:
            mlflow.log_artifact(model_filename)
        except Exception as e:
            print(f"⚠️ Artifact not logged: {e}")
        
        run_id = mlflow.active_run().info.run_id
        return run_id, model_filename

print("✅ Utility functions defined")

✅ Utility functions defined
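`log_model_mlflow` turns each tuned hyperparameter into a string and truncates it before logging. The same step can be isolated as a pure function, which makes it easy to check without a tracking server. A sketch; `prepare_params` is a hypothetical helper name, the example hyperparameters are invented for illustration, and the 250-character limit simply mirrors the slice used above:

```python
def prepare_params(best_params, prefix="best_", max_len=250):
    """Return {prefixed_name: truncated_string_value} for a dict of hyperparameters."""
    return {
        f"{prefix}{name}": str(value)[:max_len]
        for name, value in (best_params or {}).items()
    }


# Invented hyperparameters, for illustration only
example = prepare_params(
    {"n_estimators": 300, "class_weight": {0: 1, 1: 5}, "note": "x" * 400}
)

print(example["best_n_estimators"])  # 300
print(example["best_class_weight"])  # {0: 1, 1: 5}
print(len(example["best_note"]))     # 250
```

The resulting dictionary could then be passed to `mlflow.log_params` in a single call inside the run.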


## 4. 🚀 Training the Baseline Models (4 models)

In [14]:
# Load the already-trained models from processors/models/
MODELS_DIR = 'processors/models/'

# Read the metadata
with open('processors/models/best_model_final_metadata.pkl', 'rb') as f:
    metadata = pickle.load(f)

# Load the best model
with open('processors/models/best_model_final.pkl', 'rb') as f:
    best_model = pickle.load(f)

print("✅ Model loaded from processors/models/")
print(f"   Model: {metadata.get('model_name')}")
print(f"   ROC-AUC: {metadata.get('metrics', {}).get('roc_auc', 'N/A')}")

# Predictions on the test set
y_pred = best_model.predict(X_test)
y_proba = best_model.predict_proba(X_test)[:, 1]

# Metrics
metrics = calculate_metrics(y_test, y_pred, y_proba)
duration = metadata.get('training_time_sec', 0)

print(f"\n📊 Test set metrics:")
for k, v in metrics.items():
    print(f"   {k}: {v:.4f}")

# Log to MLflow
print("\n🚀 Logging the model to MLflow...")
run_id, model_file = log_model_mlflow(
    best_model, 
    metadata.get('model_name', 'Best_Model'), 
    'production', 
    metrics, 
    duration,
    best_params=metadata.get('best_params')
)

print(f"✅ Model logged - Run ID: {run_id}")

✅ Model loaded from processors/models/
   Model: LightGBM (Tuned)
   ROC-AUC: 0.9931334509112286

📊 Test set metrics:
   accuracy: 0.9748
   precision: 0.9477
   recall: 0.8923
   f1_score: 0.9192
   roc_auc: 0.9931

🚀 Logging the model to MLflow...
🏃 View run LightGBM (Tuned)_production at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0/runs/776850e9d5a04391811aa744c07238c2
🧪 View experiment at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0
✅ Model logged - Run ID: 776850e9d5a04391811aa744c07238c2
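The F1 score is the harmonic mean of precision and recall, so the three values printed above can be cross-checked by hand. Using the rounded precision and recall from the output:

```python
# Rounded values from the test-set metrics above
precision = 0.9477
recall = 0.8923

# F1 = 2 * P * R / (P + R)
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")
```

The result agrees with the reported `f1_score` to four decimals.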


## 5. 🔍 Fine-Tuning (4 models with n_iter=5)

In [15]:
# Load the model results from the CSV file
comparison_df = pd.read_csv('processors/model_comparison_final.csv')

# Check the available columns
print("Available columns:", comparison_df.columns.tolist())

# Keep only the tuned models (adapt to your columns)
if 'Type' in comparison_df.columns:
    tuned_models_df = comparison_df[comparison_df['Type'] == 'Fine-Tuned'].copy()
elif 'model_type' in comparison_df.columns:
    tuned_models_df = comparison_df[comparison_df['model_type'] == 'Fine-Tuned'].copy()
else:
    # Fall back to names containing "Tuned"
    tuned_models_df = comparison_df[comparison_df['Modèle'].str.contains('Tuned', na=False)].copy()

print(f"\n🚀 Logging {len(tuned_models_df)} tuned models to MLflow...\n")

tuned_results = []

for idx, row in tuned_models_df.iterrows():
    # Adapt the column name to your CSV
    model_name = row.get('Modèle', row.get('model', '')).replace(' (Tuned)', '').strip()
    
    print(f"📊 {model_name}...", end=" ")
    
    # Extract the metrics (adapt the column names)
    metrics = {
        'accuracy': row.get('Accuracy', row.get('accuracy', 0)),
        'precision': row.get('Precision', row.get('precision', 0)),
        'recall': row.get('Recall', row.get('recall', 0)),
        'f1_score': row.get('F1-Score', row.get('f1_score', 0)),
        'roc_auc': row.get('ROC-AUC', row.get('roc_auc', 0)),
        'pr_auc': row.get('PR-AUC', row.get('pr_auc', 0))
    }
    
    duration = row.get('Temps (s)', row.get('training_time_sec', 0))
    
    # Log to MLflow
    with mlflow.start_run(run_name=f"{model_name}_tuned"):
        mlflow.log_param('model_name', model_name)
        mlflow.log_param('stage', 'tuned')
        mlflow.log_param('dataset', 'churn_prediction')
        
        for metric_name, metric_value in metrics.items():
            if metric_value > 0:
                mlflow.log_metric(metric_name, metric_value)
        
        mlflow.log_metric('training_duration_sec', duration)
        
        run_id = mlflow.active_run().info.run_id
        
        tuned_results.append({
            'model': model_name,
            'stage': 'tuned',
            'run_id': run_id,
            **metrics,
            'duration': duration
        })
    
    print(f"ROC-AUC: {metrics['roc_auc']:.4f}")

print("\n✅ All tuned models logged!")

Available columns: ['Modèle', 'Type', 'Accuracy', 'Precision', 'Recall', 'F1-Score', 'ROC-AUC', 'PR-AUC', 'Temps (s)']

🚀 Logging 6 tuned models to MLflow...

📊 LightGBM... 🏃 View run LightGBM_tuned at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0/runs/576bad6774914fc0a0cd47ca6fbef372
🧪 View experiment at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0
ROC-AUC: 0.9931
📊 CatBoost... 🏃 View run CatBoost_tuned at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0/runs/a839d824ec1840ada0c950380d590e68
🧪 View experiment at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0
ROC-AUC: 0.9926
📊 Gradient Boosting... 🏃 View run Gradient Boosting_tuned at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0/runs/37f91bf2aa2f4dc9a0e74ef4f420d95a
🧪 View experiment at: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow/#/experiments/0
ROC-AUC: 
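The loop above looks each metric up under two possible column names. A small alias table keeps that mapping in one place. A sketch using only the standard library; `extract_metrics` and `METRIC_ALIASES` are illustrative names, and the sample row reuses the LightGBM figures printed earlier (without a PR-AUC column, to show how an absent metric is handled):

```python
import csv
import io

# Metric name -> accepted column names, in priority order
METRIC_ALIASES = {
    "accuracy": ("Accuracy", "accuracy"),
    "precision": ("Precision", "precision"),
    "recall": ("Recall", "recall"),
    "f1_score": ("F1-Score", "f1_score"),
    "roc_auc": ("ROC-AUC", "roc_auc"),
    "pr_auc": ("PR-AUC", "pr_auc"),
}


def extract_metrics(row):
    """Return the metrics present in `row` as floats; absent columns are skipped."""
    metrics = {}
    for name, columns in METRIC_ALIASES.items():
        for column in columns:
            if row.get(column) not in (None, ""):
                metrics[name] = float(row[column])
                break
    return metrics


sample_csv = (
    "Modèle,Type,Accuracy,Precision,Recall,F1-Score,ROC-AUC\n"
    "LightGBM (Tuned),Fine-Tuned,0.9748,0.9477,0.8923,0.9192,0.9931\n"
)
row = next(csv.DictReader(io.StringIO(sample_csv)))
print(extract_metrics(row))
```

Skipping an absent metric, rather than defaulting it to 0, avoids the later `metric_value > 0` filter and keeps a genuine zero distinguishable from a missing column.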

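## 6. 🎯 Stacking Ensembles (2 models)

In [17]:
# Take the best tuned models for stacking.
# NOTE: `trained_models` is not defined anywhere in this notebook, so this cell
# stops with the NameError shown in its output. It must first be filled with
# the fitted tuned estimators.
estimators = [
    ('rf', trained_models['RandomForest_tuned']),
    ('xgb', trained_models['XGBoost_tuned']),
    ('lgbm', trained_models['LightGBM_tuned']),
    ('cat', trained_models['CatBoost_tuned'])
]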

ensemble_results = []

# 1. Stacking with Logistic Regression
print("📊 Stacking (LogReg)...", end=" ")
start = datetime.now()

stacking_lr = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(random_state=42),
    cv=3,
    n_jobs=-1
)

stacking_lr.fit(X_train, y_train)
y_pred = stacking_lr.predict(X_test)
y_proba = stacking_lr.predict_proba(X_test)[:, 1]

metrics_stack_lr = calculate_metrics(y_test, y_pred, y_proba)
duration = (datetime.now() - start).total_seconds()

run_id_lr, _ = log_model_mlflow(stacking_lr, 'Stacking_LR', 'ensemble', metrics_stack_lr, duration)
trained_models['Stacking_LR'] = stacking_lr
ensemble_results.append({
    'model': 'Stacking_LR',
    'stage': 'ensemble',
    'run_id': run_id_lr,
    **metrics_stack_lr,
    'duration': duration
})

print(f"ROC-AUC: {metrics_stack_lr['roc_auc']:.4f} ({duration:.1f}s)")

# 2. Voting Classifier (soft voting)
print("📊 Voting (Soft)...", end=" ")
start = datetime.now()

voting_clf = VotingClassifier(
    estimators=estimators,
    voting='soft',
    n_jobs=-1
)

voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(X_test)
y_proba = voting_clf.predict_proba(X_test)[:, 1]

metrics_voting = calculate_metrics(y_test, y_pred, y_proba)
duration = (datetime.now() - start).total_seconds()

run_id_vote, _ = log_model_mlflow(voting_clf, 'Voting_Soft', 'ensemble', metrics_voting, duration)
trained_models['Voting_Soft'] = voting_clf
ensemble_results.append({
    'model': 'Voting_Soft',
    'stage': 'ensemble',
    'run_id': run_id_vote,
    **metrics_voting,
    'duration': duration
})

print(f"ROC-AUC: {metrics_voting['roc_auc']:.4f} ({duration:.1f}s)")

print("\n✅ Ensembles done!")

NameError: name 'trained_models' is not defined
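The cell above fails because `trained_models` is never created in this notebook. One way it could be populated, assuming each tuned estimator was pickled to its own file such as `RandomForest_tuned.pkl` (hypothetical file names; the source only shows `best_model_final.pkl`), is to load every matching pickle in a directory into a dict keyed by file stem:

```python
import pickle
import tempfile
from pathlib import Path


def load_models(models_dir, pattern="*_tuned.pkl"):
    """Load every pickle matching `pattern` into {file_stem: unpickled_object}."""
    models = {}
    for path in sorted(Path(models_dir).glob(pattern)):
        with open(path, "rb") as f:
            models[path.stem] = pickle.load(f)  # only unpickle files you trust
    return models


# Demonstration with stand-in objects instead of real estimators
with tempfile.TemporaryDirectory() as tmp:
    for name in ("RandomForest_tuned", "XGBoost_tuned"):
        (Path(tmp) / f"{name}.pkl").write_bytes(pickle.dumps({"name": name}))
    trained_models = load_models(tmp)

print(sorted(trained_models))
```

With real estimators on disk, the keys would line up with the lookups used in the `estimators` list, for example `trained_models['RandomForest_tuned']`.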

## 7. 📊 Reading the Results with a Pandas DataFrame

In [18]:
# Load the ensemble models from the CSV
ensemble_models_df = comparison_df[comparison_df['Type'] == 'Ensemble'].copy()

print(f"🚀 Logging {len(ensemble_models_df)} ensemble models to MLflow...\n")

ensemble_results = []

for idx, row in ensemble_models_df.iterrows():
    model_name = row['Modèle'].strip()
    
    print(f"📊 {model_name}...", end=" ")
    
    # Extract the metrics
    metrics = {
        'accuracy': row['Accuracy'],
        'precision': row['Precision'],
        'recall': row['Recall'],
        'f1_score': row['F1-Score'],
        'roc_auc': row['ROC-AUC'],
        'pr_auc': row['PR-AUC']
    }
    
    duration = row['Temps (s)']
    
    # Log to MLflow
    with mlflow.start_run(run_name=f"{model_name}_ensemble"):
        mlflow.log_param('model_name', model_name)
        mlflow.log_param('stage', 'ensemble')
        mlflow.log_param('dataset', 'churn_prediction')
        
        for metric_name, metric_value in metrics.items():
            if metric_value > 0:
                mlflow.log_metric(metric_name, metric_value)
        
        mlflow.log_metric('training_duration_sec', duration)
        
        run_id = mlflow.active_run().info.run_id
        
        ensemble_results.append({
            'model': model_name,
            'stage': 'ensemble',
            'run_id': run_id,
            **metrics,
            'duration': duration
        })
    
    print(f"ROC-AUC: {metrics['roc_auc']:.4f}")

print("\n✅ All ensemble models logged!")

🚀 Logging 0 ensemble models to MLflow...


✅ All ensemble models logged!


In [21]:
# Read directly from MLflow
print("\n📥 Reading from MLflow...\n")

# Get the experiment ID
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
experiment_id = experiment.experiment_id

# Search all runs
df_mlflow = mlflow.search_runs(
    experiment_ids=[experiment_id],
    filter_string="metrics.roc_auc > 0",
    order_by=["metrics.roc_auc DESC"]
)

# Show the key columns
if len(df_mlflow) > 0:
    cols_to_show = [
        'run_id', 
        'params.model_name', 
        'params.stage', 
        'metrics.roc_auc', 
        'metrics.f1_score', 
        'metrics.training_duration_sec'
    ]
    available_cols = [col for col in cols_to_show if col in df_mlflow.columns]
    
    print(df_mlflow[available_cols].head(10))
    print(f"\n✅ {len(df_mlflow)} runs found in MLflow")
    print(f"\n🏆 Best model: {df_mlflow.iloc[0]['params.model_name']} (ROC-AUC: {df_mlflow.iloc[0]['metrics.roc_auc']:.4f})")
else:
    print("⚠️ No runs found in MLflow")


📥 Reading from MLflow...

                             run_id    params.model_name params.stage  \
0  576bad6774914fc0a0cd47ca6fbef372             LightGBM        tuned   
1  776850e9d5a04391811aa744c07238c2     LightGBM (Tuned)   production   
2  a839d824ec1840ada0c950380d590e68             CatBoost        tuned   
3  37f91bf2aa2f4dc9a0e74ef4f420d95a    Gradient Boosting        tuned   
4  fb3d7193fa82410fb1f64b9f5ab68e0e              XGBoost        tuned   
5  522ab58c5ee04b3da0585c4b7c588b5f        Random Forest        tuned   
6  8365fc2b8396412792b2c94db0ae17b7  Logistic Regression        tuned   

   metrics.roc_auc  metrics.f1_score  metrics.training_duration_sec  
0         0.993133          0.919176                       0.991415  
1         0.993133          0.919176                       0.991415  
2         0.992580          0.911672                       4.697798  
3         0.991030          0.898089                      32.654564  
4         0.984216          0.847
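The pipeline overview names ROC-AUC and F1-Score as selection criteria, while the query above orders by ROC-AUC only. A sketch of a two-key ranking on plain dictionaries, using the first three rows of the table above; with MLflow itself, adding a second entry such as `"metrics.f1_score DESC"` to `order_by` should have the same effect:

```python
# First three rows of the search_runs output above
runs = [
    {"run_id": "576bad6774914fc0a0cd47ca6fbef372", "model": "LightGBM",
     "roc_auc": 0.993133, "f1_score": 0.919176},
    {"run_id": "776850e9d5a04391811aa744c07238c2", "model": "LightGBM (Tuned)",
     "roc_auc": 0.993133, "f1_score": 0.919176},
    {"run_id": "a839d824ec1840ada0c950380d590e68", "model": "CatBoost",
     "roc_auc": 0.992580, "f1_score": 0.911672},
]

# Rank by ROC-AUC first, then by F1. sorted() is stable, even with reverse=True,
# so runs that tie on both keys keep their input order.
ranked = sorted(runs, key=lambda r: (r["roc_auc"], r["f1_score"]), reverse=True)
best = ranked[0]
print(best["model"], best["run_id"])
```

The first two rows tie on both metrics, which is consistent with them being the same LightGBM model logged once under the `tuned` stage and once under `production`.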

## 8. 🏆 Selecting the Best Model (ROC-AUC)

In [22]:
# Identify the best model from MLflow
print("🏆 BEST MODEL (ROC-AUC)")
print("="*60)

if len(df_mlflow) > 0:
    best_row = df_mlflow.iloc[0]
    
    best_model_name = best_row['params.model_name']
    best_stage = best_row['params.stage']
    best_run_id = best_row['run_id']
    best_roc_auc = best_row['metrics.roc_auc']
    
    # Optional metrics may be missing; format them safely instead of applying ":.4f" to 'N/A'
    def fmt(value):
        return f"{value:.4f}" if isinstance(value, (int, float)) else "N/A"

    print(f"Model:     {best_model_name}")
    print(f"Stage:     {best_stage}")
    print(f"ROC-AUC:   {best_roc_auc:.4f}")
    print(f"F1-Score:  {fmt(best_row.get('metrics.f1_score'))}")
    print(f"Precision: {fmt(best_row.get('metrics.precision'))}")
    print(f"Recall:    {fmt(best_row.get('metrics.recall'))}")
    print(f"Run ID:    {best_run_id}")
    print("="*60)
    
    # Use the model loaded at the start of the notebook (best_model)
    print(f"\n✅ Model available: {metadata.get('model_name')}")
    print(f"📁 Path: processors/models/best_model_final.pkl")
else:
    print("⚠️ No model found in MLflow")


🏆 BEST MODEL (ROC-AUC)
Model:     LightGBM
Stage:     tuned
ROC-AUC:   0.9931
F1-Score:  0.9192
Precision: 0.9477
Recall:    0.8923
Run ID:    576bad6774914fc0a0cd47ca6fbef372

✅ Model available: LightGBM (Tuned)
📁 Path: processors/models/best_model_final.pkl


## 9. 🔄 Loading the Model from MLflow

In [23]:
# Use the model already loaded from processors/models/
print(f"📥 Using the best model\n")

loaded_model = best_model  # Already loaded at the start of the notebook

print(f"✅ Model: {metadata.get('model_name')}")
print(f"   Type: {type(loaded_model).__name__}")
print(f"   ROC-AUC: {metadata.get('metrics', {}).get('roc_auc'):.4f}")
print(f"   Source: processors/models/best_model_final.pkl")

# Prediction test (X_test is a numpy array)
sample = X_test[:5]
predictions = loaded_model.predict(sample)
probas = loaded_model.predict_proba(sample)[:, 1]

print(f"\n🧪 Prediction test on 5 samples:")
print(f"   Predictions: {predictions}")
print(f"   Probabilities: {probas.round(4)}")

print("\n✅ Model operational!")

# ============================================================================
# FINAL SUMMARY
# ============================================================================
print("\n" + "="*80)
print("🎉 MLFLOW TRACKING COMPLETED SUCCESSFULLY!")
print("="*80)
print(f"\n📊 MLflow dashboard: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow")
print(f"🏆 Best model: {metadata.get('model_name')}")
print(f"📈 ROC-AUC: {metadata.get('metrics', {}).get('roc_auc'):.4f}")
print(f"📁 Model saved: processors/models/best_model_final.pkl")
print(f"\n✅ Ready for deployment (Docker + Jenkins)")

📥 Using the best model

✅ Model: LightGBM (Tuned)
   Type: Pipeline
   ROC-AUC: 0.9931
   Source: processors/models/best_model_final.pkl

🧪 Prediction test on 5 samples:
   Predictions: [0 0 0 0 0]
   Probabilities: [1.100e-03 6.000e-04 1.142e-01 0.000e+00 1.000e-04]

✅ Model operational!

🎉 MLFLOW TRACKING COMPLETED SUCCESSFULLY!

📊 MLflow dashboard: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow
🏆 Best model: LightGBM (Tuned)
📈 ROC-AUC: 0.9931
📁 Model saved: processors/models/best_model_final.pkl

✅ Ready for deployment (Docker + Jenkins)
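The section title mentions loading from MLflow, whereas the cell above reuses the pickle already in memory. A sketch of fetching the pickled artifact from a run instead, assuming the artifact upload in `log_model_mlflow` succeeded and the file sits at the run's artifact root. `mlflow.artifacts.download_artifacts` is the MLflow call assumed here; `load_pickled_artifact` and its `download` parameter are hypothetical, the latter being an injection point so the function can be exercised without a tracking server:

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled_artifact(run_id, artifact_path, download=None):
    """Download one artifact of a run and unpickle it."""
    if download is None:
        import mlflow  # imported lazily; only needed for a real download
        download = mlflow.artifacts.download_artifacts
    local_path = download(run_id=run_id, artifact_path=artifact_path)
    with open(local_path, "rb") as f:
        return pickle.load(f)  # only unpickle artifacts from runs you trust


# Offline demonstration with a stand-in downloader
with tempfile.TemporaryDirectory() as tmp:
    def fake_download(run_id, artifact_path):
        target = Path(tmp) / artifact_path
        target.write_bytes(pickle.dumps({"run_id": run_id}))
        return str(target)

    restored = load_pickled_artifact("demo-run", "model.pkl", download=fake_download)

print(restored)
```

Against the real server, the call would presumably be `load_pickled_artifact("776850e9d5a04391811aa744c07238c2", "LightGBM_(Tuned)_production.pkl")`, since only the `production` run uploaded a model file; the `tuned` runs carry parameters and metrics only.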


In [None]:
# Test the loaded model
print("\n🧪 Testing the loaded model...\n")

# Predictions
y_pred_loaded = loaded_model.predict(X_test)
y_proba_loaded = loaded_model.predict_proba(X_test)[:, 1]

# Metrics
test_metrics = calculate_metrics(y_test, y_pred_loaded, y_proba_loaded)

print("📊 Performance of the loaded model:")
for metric, value in test_metrics.items():
    print(f"   {metric:12s}: {value:.4f}")

# Test on a few examples
print("\n🔍 Predictions on 5 examples:")
sample_predictions = loaded_model.predict(X_test[:5])
sample_probas = loaded_model.predict_proba(X_test[:5])[:, 1]

for i in range(5):
    status = "Churn" if sample_predictions[i] == 1 else "Non-Churn"
    print(f"   Sample {i+1}: {status}, Proba={sample_probas[i]:.4f}")

print("\n✅ Model works correctly!")
print("\n" + "="*80)
print("📋 FINAL RECAP")
print("="*80)
print(f"✅ MLflow configured and operational")
print(f"✅ {len(df_mlflow)} models tracked on DagsHub")
print(f"✅ Best model tested and validated")
print(f"✅ Dashboard: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow")
print("\n🚀 Ready for Jenkins CI/CD and production deployment!")


🧪 Testing the loaded model...

📊 Performance of the loaded model:
   accuracy    : 0.9748
   precision   : 0.9477
   recall      : 0.8923
   f1_score    : 0.9192
   roc_auc     : 0.9931

🔍 Predictions on 5 examples:
   Sample 1: Non-Churn, Proba=0.0011
   Sample 2: Non-Churn, Proba=0.0006
   Sample 3: Non-Churn, Proba=0.1142
   Sample 4: Non-Churn, Proba=0.0000
   Sample 5: Non-Churn, Proba=0.0001

✅ Model works correctly!

📋 FINAL RECAP
✅ MLflow configured and operational
✅ 9 models tracked on DagsHub
✅ Best model tested and validated
✅ Dashboard: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow

🚀 Ready for Jenkins CI/CD and production deployment!


## 10. 📦 Saving to the Model Registry (Local)

In [24]:
# Create a local Model Registry
MODEL_REGISTRY_DIR = Path("model_registry")
MODEL_REGISTRY_DIR.mkdir(exist_ok=True)

def register_model(model, model_name, version="1.0.0", stage="production", metrics=None, run_id=None):
    """
    Save a model to the local registry
    """
    import json
    import shutil
    
    # Create the directory structure
    model_dir = MODEL_REGISTRY_DIR / model_name.replace(" ", "_")
    model_dir.mkdir(exist_ok=True)
    
    version_dir = model_dir / version
    version_dir.mkdir(exist_ok=True)
    
    # Save the model
    model_path = version_dir / "model.pkl"
    with open(model_path, 'wb') as f:
        pickle.dump(model, f)
    
    # Metadata
    metadata = {
        "model_name": model_name,
        "version": version,
        "stage": stage,
        "registered_at": datetime.now().isoformat(),
        "metrics": metrics or {},
        "run_id": run_id or "N/A"
    }
    
    with open(version_dir / "metadata.json", 'w') as f:
        json.dump(metadata, f, indent=2)
    
    # Production copy (a file copy, not a symlink)
    if stage == "production":
        prod_path = model_dir / "production.pkl"
        shutil.copy(model_path, prod_path)
    
    return str(model_path)

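# Register the best model.
# `test_metrics` comes from the evaluation cell in section 9; run that cell first,
# otherwise this cell stops with the NameError shown in its output.
print("📦 Saving to the Model Registry...\n")

registry_name = f"Best_Churn_{metadata.get('model_name', 'LightGBM').replace(' ', '_')}"
model_path = register_model(
    model=loaded_model,
    model_name=registry_name,
    version="1.0.0",
    stage="production",
    metrics=test_metrics,
    # NOTE: `run_id` still holds the ID of the last run logged above, which is not
    # necessarily the best run; `best_run_id` from section 8 may be the intended value.
    run_id=run_id
)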

print(f"✅ Model saved to the registry")
print(f"   Name: {registry_name}")
print(f"   Version: 1.0.0")
print(f"   Stage: production")
print(f"   Path: {model_path}")
print(f"\n🎉 MLflow Tracking + Model Registry done!")

📦 Saving to the Model Registry...



NameError: name 'test_metrics' is not defined

In [25]:
# Load a model from the registry
def load_from_registry(model_name, stage="production"):
    """Load a model from the local registry"""
    import json
    
    model_dir = MODEL_REGISTRY_DIR / model_name.replace(" ", "_")
    model_path = model_dir / f"{stage}.pkl"
    
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    
    # Load the metadata of the latest version
    versions = [d for d in model_dir.iterdir() if d.is_dir()]
    if versions:
        # Compare versions numerically so that "10.0.0" ranks above "9.0.0" (assumes "X.Y.Z" names)
        latest_version = max(versions, key=lambda d: tuple(int(p) for p in d.name.split(".")))
        with open(latest_version / "metadata.json", 'r') as f:
            metadata = json.load(f)
    else:
        metadata = {}
    
    return model, metadata

# Test loading
print("\n🔄 Testing loading from the registry...\n")

loaded_from_registry, registry_metadata = load_from_registry(registry_name, stage="production")

print(f"✅ Model loaded from the registry")
print(f"   Name: {registry_metadata.get('model_name', 'N/A')}")
print(f"   Version: {registry_metadata.get('version', 'N/A')}")
print(f"   ROC-AUC: {registry_metadata.get('metrics', {}).get('roc_auc', 0):.4f}")

# Prediction test
test_pred = loaded_from_registry.predict(X_test[:5])
test_proba = loaded_from_registry.predict_proba(X_test[:5])[:, 1]

print(f"\n🧪 Prediction test:")
for i in range(5):
    status = "Churn" if test_pred[i] == 1 else "Non-Churn"
    print(f"   Sample {i+1}: {status}, Proba={test_proba[i]:.4f}")

print("\n✅ The model works correctly!")
print("\n" + "="*80)
print("🎉 FULL MLFLOW TRACKING COMPLETE!")
print("="*80)
print(f"✅ Models tracked on DagsHub MLflow")
print(f"✅ Local Model Registry created")
print(f"✅ Best model tested and validated")
print(f"\n📂 Generated files:")
print(f"   • model_registry/{registry_name}/")
print(f"   • Dashboard: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow")
print(f"\n🚀 Ready for Jenkins CI/CD!")



🔄 Testing loading from the registry...

✅ Model loaded from the registry
   Name: Best_Churn_LightGBM_(Tuned)
   Version: 1.0.0
   ROC-AUC: 0.9931

🧪 Prediction test:
   Sample 1: Non-Churn, Proba=0.0011
   Sample 2: Non-Churn, Proba=0.0006
   Sample 3: Non-Churn, Proba=0.1142
   Sample 4: Non-Churn, Proba=0.0000
   Sample 5: Non-Churn, Proba=0.0001

✅ The model works correctly!

🎉 FULL MLFLOW TRACKING COMPLETE!
✅ Models tracked on DagsHub MLflow
✅ Local Model Registry created
✅ Best model tested and validated

📂 Generated files:
   • model_registry/Best_Churn_LightGBM_(Tuned)/
   • Dashboard: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow

🚀 Ready for Jenkins CI/CD!
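The local registry is just a directory tree: one folder per model, one sub-folder per version, and a `production.pkl` copy next to the version folders. A simplified sketch of that layout, written against a temporary directory with a stand-in object instead of a real model; `register` is an illustrative reduction of `register_model`, not a replacement for it:

```python
import json
import pickle
import shutil
import tempfile
from pathlib import Path


def register(root, name, version, obj, stage="production"):
    """Write `obj` under root/name/version/ and mirror it to production.pkl."""
    version_dir = Path(root) / name / version
    version_dir.mkdir(parents=True, exist_ok=True)

    model_path = version_dir / "model.pkl"
    model_path.write_bytes(pickle.dumps(obj))

    metadata = {"model_name": name, "version": version, "stage": stage}
    (version_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))

    if stage == "production":
        shutil.copy(model_path, version_dir.parent / "production.pkl")
    return model_path


with tempfile.TemporaryDirectory() as tmp:
    register(tmp, "demo_model", "1.0.0", {"weights": [1, 2, 3]})
    layout = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )

for entry in layout:
    print(entry)
```

Because `production.pkl` is a copy, registering a newer version under a different stage leaves the production file untouched until a version is registered with `stage="production"` again.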


## 11. 📊 Final Summary

In [26]:
print("\n" + "="*80)
print("🎉 FINAL SUMMARY - MLflow Tracking - Churn Prediction")
print("="*80)

print(f"\n📊 Models tracked:")
print(f"   • Production: 1 model (LightGBM)")
print(f"   • Tuned:      6 models")
print(f"   • Ensemble:   2 models (Stacking + Voting)")
print(f"   • TOTAL:      9 models")

print(f"\n🏆 Best model:")
print(f"   • Name:      {best_model_name}")
print(f"   • Stage:     {best_stage}")
print(f"   • ROC-AUC:   {best_roc_auc:.4f}")
print(f"   • F1-Score:  {df_mlflow.iloc[0]['metrics.f1_score']:.4f}")

print(f"\n🔗 MLflow DagsHub:")
print(f"   • Tracking URI: {MLFLOW_TRACKING_URI}")
print(f"   • Experiment:   {EXPERIMENT_NAME}")
print(f"   • Dashboard:    https://dagshub.com/karrayyessine1/MLOps_Project.mlflow")
print(f"   • Total runs:   {len(df_mlflow)}")

print(f"\n📦 Local Model Registry:")
print(f"   • Name:    {registry_name}")
print(f"   • Version: 1.0.0")
print(f"   • Stage:   production")
print(f"   • Path:    {MODEL_REGISTRY_DIR / registry_name.replace(' ', '_')}")

print("\n" + "="*80)
print("✅ MLflow pipeline completed successfully!")
print("="*80)

print("\n💡 Next steps:")
print("   1. ✅ MLflow tracking configured")
print("   2. ✅ Models logged on DagsHub")
print("   3. ✅ Local Model Registry created")
print("   4. 🚀 Set up Jenkins CI/CD")
print("   5. 🐳 Deploy with Docker")
print("   6. 📊 Set up monitoring")

print("\n🎯 Generated files:")
print(f"   • model_registry/{registry_name}/")
print(f"   • processors/models/best_model_final.pkl")
print(f"   • processors/model_comparison_final.csv")

🎉 FINAL SUMMARY - MLflow Tracking - Churn Prediction

📊 Models tracked:
   • Production: 1 model (LightGBM)
   • Tuned:      6 models
   • Ensemble:   2 models (Stacking + Voting)
   • TOTAL:      9 models

🏆 Best model:
   • Name:      LightGBM
   • Stage:     tuned
   • ROC-AUC:   0.9931
   • F1-Score:  0.9192

🔗 MLflow DagsHub:
   • Tracking URI: https://dagshub.com/karrayyessine1/MLOps_Project.mlflow
   • Experiment:   churn_prediction
   • Dashboard:    https://dagshub.com/karrayyessine1/MLOps_Project.mlflow
   • Total runs:   7

📦 Local Model Registry:
   • Name:    Best_Churn_LightGBM_(Tuned)
   • Version: 1.0.0
   • Stage:   production
   • Path:    model_registry\Best_Churn_LightGBM_(Tuned)

✅ MLflow pipeline completed successfully!

💡 Next steps:
   1. ✅ MLflow tracking configured
   2. ✅ Models logged on DagsHub
   3. ✅ Local Model Registry created
   4. 🚀 Set up Jenkins CI/CD
   5.
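The model counts in the summary cell are hard-coded. They can instead be derived from the runs returned by `mlflow.search_runs`, for instance with `df_mlflow['params.stage'].value_counts()`. The same tally on the stage values listed in the section 7 table, using only the standard library:

```python
from collections import Counter

# params.stage of the seven runs listed in the section 7 table, in order
stages = ["tuned", "production", "tuned", "tuned", "tuned", "tuned", "tuned"]

counts = Counter(stages)  # a missing stage simply counts as 0
for stage in ("production", "tuned", "ensemble"):
    print(f"{stage:<11}{counts[stage]}")
print(f"{'total':<11}{sum(counts.values())}")
```

The total agrees with the run count of 7 reported by the summary output, and no run carries the `ensemble` stage.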