# 04 - Give Me Some Credit (Kaggle) - E2E pipeline example

This notebook demonstrates data ingestion, basic EDA, preprocessing with the pipeline, and a quick training run logged to MLflow.

Dataset: 'Give Me Some Credit' (Kaggle). Replace `data/dataset.csv` with the downloaded CSV, or use the Kaggle API to fetch it.
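
One detail to check when loading the download: the Kaggle training file is expected to begin with an unnamed row-index column, which pandas would otherwise read as a feature called `Unnamed: 0`. The sketch below uses two made-up rows in that assumed layout to show how `index_col=0` handles it. Verify the layout against your own copy before relying on it.

```python
import io

import pandas as pd

# Two made-up rows in the assumed layout: the header starts with an
# empty field, so the first column has no name.
sample_csv = io.StringIO(
    ",SeriousDlqin2yrs,age,MonthlyIncome\n"
    "1,1,45,9120\n"
    "2,0,40,NA\n"
)

# index_col=0 turns the unnamed column into the index instead of a feature.
# The literal "NA" is parsed as a missing value by pandas' defaults.
sample = pd.read_csv(sample_csv, index_col=0)
print(sample.columns.tolist())
```

If the pipeline's own preprocessing already drops this column, the extra argument is unnecessary.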


In [None]:
import os
import pandas as pd
from mlflow_pipeline.preprocess import pandas_preprocess
from mlflow_pipeline.train import train_and_log
from mlflow_pipeline.mlflow_utils import set_tracking
import mlflow

# Configure
DATA_PATH = 'data/dataset.csv'  # put the Give Me Some Credit dataset here
TARGET = 'SeriousDlqin2yrs'     # Kaggle label for this dataset

set_tracking('http://localhost:5000')
mlflow.set_experiment('GiveMeCredit-EDA')

df = pd.read_csv(DATA_PATH)
df.head()


In [None]:
# Basic EDA
df.shape
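
Beyond the shape, a few more checks are usually worth running before training on a credit-default dataset: missing values per column, the positive-class rate, and duplicate rows. The following is a minimal sketch, not part of `mlflow_pipeline`; `eda_summary` is a hypothetical helper and the toy frame contains made-up values.

```python
import pandas as pd


def eda_summary(frame: pd.DataFrame, target: str) -> dict:
    """Return a few basic facts about a binary-classification DataFrame."""
    missing = frame.isna().sum()
    return {
        "shape": frame.shape,
        # Missing-value counts, only for columns that have at least one
        "missing": {col: int(n) for col, n in missing.items() if n > 0},
        # Fraction of rows in the positive class (target == 1)
        "positive_rate": float(frame[target].mean()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }


# Tiny synthetic frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "SeriousDlqin2yrs": [0, 0, 0, 1],
    "age": [45, 40, 38, 30],
    "MonthlyIncome": [9120.0, None, 3042.0, None],
})
summary = eda_summary(toy, "SeriousDlqin2yrs")
print(summary)
```

On the real data this would be called as `eda_summary(df, TARGET)`.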


In [None]:
# Preprocess, train a quick XGBoost model, and log the result to MLflow
X_train, X_test, y_train, y_test = pandas_preprocess(df, target_col=TARGET)

run_id, auc = train_and_log(
    X_train, X_test, y_train, y_test,
    model_type='xgboost',
    params={'n_estimators': 100},
)
print('Run id:', run_id, 'AUC:', auc)
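
The internals of `pandas_preprocess` are not shown in this notebook. For readers who want a mental model of the four values it returns, here is a rough sketch of a comparable step, assuming a stratified hold-out split followed by median imputation. `simple_preprocess`, the 80/20 split, and the imputation strategy are all assumptions for illustration, not the pipeline's actual behavior.

```python
import pandas as pd


def simple_preprocess(frame, target_col, test_frac=0.2, seed=0):
    """Hypothetical stand-in for pandas_preprocess: split, then impute."""
    # Stratified hold-out: sample test_frac of the rows within each class
    test = frame.groupby(target_col).sample(frac=test_frac, random_state=seed)
    train = frame.drop(index=test.index)

    features = [c for c in frame.columns if c != target_col]
    # Medians come from the training rows only, to avoid leaking test data
    medians = train[features].median()
    X_train = train[features].fillna(medians)
    X_test = test[features].fillna(medians)
    return X_train, X_test, train[target_col], test[target_col]


# Synthetic, imbalanced toy data (15 negatives, 5 positives, two gaps)
income = [1000.0 * i for i in range(20)]
income[3] = None
income[17] = None
toy = pd.DataFrame({
    "SeriousDlqin2yrs": [0] * 15 + [1] * 5,
    "age": list(range(30, 50)),
    "MonthlyIncome": income,
})
X_tr, X_te, y_tr, y_te = simple_preprocess(toy, "SeriousDlqin2yrs")
print(len(X_tr), len(X_te), int(y_te.sum()))
```

Splitting before imputing matters here: medians computed on the full frame would let test-set values influence the training features.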


Next steps:
- Run hyperparameter search via `mlflow_pipeline/hyperparam_search.py`.
- Register the best model in the MLflow Model Registry and promote it to Production.
- Build the Docker image and deploy it to Kubernetes, following `.github/workflows/ci.yml` and `k8s.yaml`.
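
The registration step in the list above could look roughly like the sketch below. It assumes `train_and_log` logged the model under the artifact path `model`, which this notebook does not confirm, and the registry name in the usage comment is hypothetical. Recent MLflow releases deprecate stages in favor of model version aliases, so treat the stage transition as one option rather than the required approach.

```python
def model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build the runs:/ URI that MLflow uses to locate a logged model."""
    return f"runs:/{run_id}/{artifact_path}"


def register_and_promote(run_id: str, name: str, artifact_path: str = "model"):
    """Register the model logged under run_id and move it to Production."""
    # Imported lazily so model_uri() works without MLflow installed
    import mlflow
    from mlflow.tracking import MlflowClient

    version = mlflow.register_model(model_uri(run_id, artifact_path), name)
    MlflowClient().transition_model_version_stage(
        name=name, version=version.version, stage="Production"
    )
    return version


# Usage, with a tracking server running and a hypothetical registry name:
# register_and_promote(run_id, "give-me-credit-xgb")
print(model_uri("abc123"))
```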
