
# Titanic + **MLflow Model Registry** + API (v1/v2, Staging/Production)

This notebook builds an **end-to-end** workflow with **MLflow Tracking + Model Registry**:

1. **MLflow Server setup** (Tracking + Registry), using **SQLite** as the backend store and local artifacts.
2. Training of **v1** and **v2** (RandomForest with different hyperparameters), logging metrics and artifacts.
3. **Registering** both versions in the **Model Registry** as `titanic_rf`:
   - `v1` → **Production**
   - `v2` → **Staging**
4. **Consumption by stage**: load `models:/titanic_rf/Production` and `models:/titanic_rf/Staging` and compare predictions.
5. **FastAPI API** with selection by **stage** (Production/Staging), tested with `TestClient`.
6. **Version promotion** (e.g., moving v2 → Production).

> **Prerequisites** in your environment:  
> `pip install mlflow fastapi uvicorn scikit-learn pandas numpy pydantic httpx`  
> (`httpx` is needed by FastAPI's `TestClient`.)

> **Important**: For the **Model Registry** to work in this setup, you need to run the **mlflow server** in **another terminal** (the cell below prints the command). Older MLflow releases do **not** support the Registry with a `file:` tracking URI, so a database-backed server is used here.


## 0) Instructions for starting the MLflow Server (in another terminal)

In [7]:

import os, sys, pathlib, textwrap
from pathlib import Path

BASE = Path(".").resolve()
db_uri = f"sqlite:///{(BASE / 'mlflow.db').as_posix()}"
art_root = (BASE / "mlartifacts").as_posix()
print("=== Command to run in ANOTHER terminal ===")
# The trailing backslashes below are line continuations inside the string,
# so the command is printed on a single line.
print(textwrap.dedent(f"""
mlflow server \
  --backend-store-uri "{db_uri}" \
  --default-artifact-root "{art_root}" \
  --host 0.0.0.0 --port 5000
"""))
print("\nOpen the UI at: http://localhost:5000")
print("This notebook will use: MLFLOW_TRACKING_URI = http://localhost:5000")


=== Command to run in ANOTHER terminal ===

mlflow server   --backend-store-uri "sqlite:////home/vinicius/git/machine_learning/titanic/mlflow.db"   --default-artifact-root "/home/vinicius/git/machine_learning/titanic/mlartifacts"   --host 0.0.0.0 --port 5000


Open the UI at: http://localhost:5000
This notebook will use: MLFLOW_TRACKING_URI = http://localhost:5000
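
Before running the remaining cells, it can help to confirm that the server is actually reachable. The sketch below is a minimal check using only the standard library. The helper name `server_is_up` is hypothetical, and it assumes the MLflow server answers `GET /health` with HTTP 200, which may vary across MLflow versions.

```python
import urllib.request


def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Hypothetical helper; the /health path is an assumption about the server.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP errors
        # (urllib.error.URLError is a subclass of OSError).
        return False


if __name__ == "__main__":
    print("MLflow server reachable:", server_is_up("http://localhost:5000"))
```

If this prints `False`, start the server with the command above before continuing.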


## 1) Imports and Tracking URI configuration

In [8]:

import os, json, numpy as np, pandas as pd
from pathlib import Path

import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

import warnings
warnings.filterwarnings("ignore")

SEED = 42
np.random.seed(SEED)

DATA_DIR = Path("."); DATA_DIR.mkdir(parents=True, exist_ok=True)
train_path = DATA_DIR / "train.csv"

# IMPORTANT: the server must be running in another terminal
TRACKING_URI = "http://localhost:5000"
mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment("titanic_registry_demo")

client = MlflowClient(tracking_uri=TRACKING_URI)

MODEL_NAME = "titanic_rf"


2025/10/27 14:29:06 INFO mlflow.tracking.fluent: Experiment with name 'titanic_registry_demo' does not exist. Creating a new experiment.
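
The cell above hardcodes the tracking URI. A small variation, sketched below, reads it from the `MLFLOW_TRACKING_URI` environment variable (which MLflow itself also honors) and falls back to the notebook's default. The helper name `resolve_tracking_uri` and the http/https check are assumptions tailored to this notebook's server-based setup.

```python
import os
from urllib.parse import urlparse

DEFAULT_TRACKING_URI = "http://localhost:5000"  # same default as the notebook


def resolve_tracking_uri(env=None) -> str:
    """Pick the tracking URI from the environment, else use the default.

    Hypothetical helper: rejects non-HTTP URIs because this notebook
    expects a running MLflow server.
    """
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", DEFAULT_TRACKING_URI)
    scheme = urlparse(uri).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) tracking server, got: {uri!r}")
    return uri
```

In the notebook this would be used as `mlflow.set_tracking_uri(resolve_tracking_uri())`.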


## 2) Data and pipeline

In [9]:

# Load data (place Kaggle's train.csv in DATA_DIR, which is the notebook's directory here)
train = pd.read_csv(train_path)

TARGET = "Survived"
FEATURES_NUM = ["Age", "SibSp", "Parch", "Fare"]
FEATURES_CAT = ["Pclass", "Sex", "Embarked"]
FEATURES_ALL = FEATURES_NUM + FEATURES_CAT

X = train[FEATURES_ALL].copy()
y = train[TARGET].values

# Preprocessing
numeric = Pipeline([("imputer", SimpleImputer(strategy="median")), ("scaler", StandardScaler(with_mean=False))])
categorical = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")), ("ohe", OneHotEncoder(handle_unknown="ignore", sparse_output=True))])
pre = ColumnTransformer([("num", numeric, FEATURES_NUM), ("cat", categorical, FEATURES_CAT)])

cv5 = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
def auc_cv(pipe):
    return cross_val_score(pipe, X, y, scoring="roc_auc", cv=cv5, n_jobs=-1).mean()


## 3) Train and **log v1** (baseline)

In [10]:

from sklearn.metrics import roc_auc_score

rf_v1 = RandomForestClassifier(n_estimators=300, random_state=SEED, n_jobs=-1)
pipe_v1 = Pipeline([("prep", pre), ("clf", rf_v1)])

auc_v1 = float(auc_cv(pipe_v1))

with mlflow.start_run(run_name="v1_rf_baseline") as run:
    mlflow.log_params({"model": "RandomForestClassifier", "n_estimators": 300, "seed": SEED})
    mlflow.log_metric("cv_auc", auc_v1)
    pipe_v1.fit(X, y)
    mlflow.sklearn.log_model(pipe_v1, artifact_path="model")
    run_id_v1 = run.info.run_id

print("run_id_v1:", run_id_v1, "| AUC(v1):", round(auc_v1, 4))
v1_uri = f"runs:/{run_id_v1}/model"
v1_uri




🏃 View run v1_rf_baseline at: http://localhost:5000/#/experiments/1/runs/2056beafd7d34199b809f735ccd1c102
🧪 View experiment at: http://localhost:5000/#/experiments/1
run_id_v1: 2056beafd7d34199b809f735ccd1c102 | AUC(v1): 0.872


'runs:/2056beafd7d34199b809f735ccd1c102/model'

## 4) Train and **log v2** (tuned)

In [11]:

rf_v2 = RandomForestClassifier(n_estimators=600, max_depth=8, min_samples_split=5, random_state=SEED, n_jobs=-1)
pipe_v2 = Pipeline([("prep", pre), ("clf", rf_v2)])

auc_v2 = float(auc_cv(pipe_v2))

with mlflow.start_run(run_name="v2_rf_tuned") as run:
    mlflow.log_params({"model": "RandomForestClassifier", "n_estimators": 600, "max_depth": 8, "min_samples_split": 5, "seed": SEED})
    mlflow.log_metric("cv_auc", auc_v2)
    pipe_v2.fit(X, y)
    mlflow.sklearn.log_model(pipe_v2, artifact_path="model")
    run_id_v2 = run.info.run_id

print("run_id_v2:", run_id_v2, "| AUC(v2):", round(auc_v2, 4))
v2_uri = f"runs:/{run_id_v2}/model"
v2_uri




🏃 View run v2_rf_tuned at: http://localhost:5000/#/experiments/1/runs/eba5e4a2d7bb448eb02cdd4779d5432f
🧪 View experiment at: http://localhost:5000/#/experiments/1
run_id_v2: eba5e4a2d7bb448eb02cdd4779d5432f | AUC(v2): 0.8745


'runs:/eba5e4a2d7bb448eb02cdd4779d5432f/model'

## 5) Create/ensure the registered model and **register v1/v2** in the Model Registry

In [12]:

# Make sure the Registered Model exists
try:
    client.get_registered_model(MODEL_NAME)
    print("Registered model exists:", MODEL_NAME)
except Exception:
    client.create_registered_model(MODEL_NAME)
    print("Created registered model:", MODEL_NAME)

# Register each run as a version of the model
mv1 = mlflow.register_model(model_uri=v1_uri, name=MODEL_NAME)
mv2 = mlflow.register_model(model_uri=v2_uri, name=MODEL_NAME)
print("ModelVersion v1:", mv1.version, "| source:", mv1.source)
print("ModelVersion v2:", mv2.version, "| source:", mv2.source)


Registered model 'titanic_rf' already exists. Creating a new version of this model...
2025/10/27 14:29:13 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: titanic_rf, version 1
Created version '1' of model 'titanic_rf'.
Registered model 'titanic_rf' already exists. Creating a new version of this model...


Created registered model: titanic_rf


2025/10/27 14:29:13 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: titanic_rf, version 2


ModelVersion v1: 1 | source: models:/m-cf2095cafb314da587f23044183381d0
ModelVersion v2: 2 | source: models:/m-474e85f1740243758c8db2f69d6d93bc


Created version '2' of model 'titanic_rf'.


## 6) **Set stages** (v1→Production, v2→Staging)

In [13]:

# Transition stages (note: requires permission on the server; the default local setup allows it)
client.transition_model_version_stage(name=MODEL_NAME, version=mv1.version, stage="Production", archive_existing_versions=False)
client.transition_model_version_stage(name=MODEL_NAME, version=mv2.version, stage="Staging", archive_existing_versions=False)

# Listar para confirmar
versions = client.search_model_versions(f"name='{MODEL_NAME}'")
[(v.version, v.current_stage, v.run_id) for v in versions]


[('2', 'Staging', 'eba5e4a2d7bb448eb02cdd4779d5432f'),
 ('1', 'Production', '2056beafd7d34199b809f735ccd1c102')]
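
Stage transitions are deprecated in recent MLflow releases in favor of model version aliases. As a hedged alternative to the cell above, the sketch below assigns an alias and returns the matching `models:/<name>@<alias>` URI. The function name `assign_alias` and the alias names `champion`/`challenger` are hypothetical, and `client` is assumed to be an `MlflowClient` from a version that provides `set_registered_model_alias`. The stub class only stands in for the client so the snippet runs without a server.

```python
class _RecordingClient:
    """Stand-in for MlflowClient, used only so this sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def set_registered_model_alias(self, name, alias, version):
        # Record the call instead of talking to a registry.
        self.calls.append((name, alias, str(version)))


def assign_alias(client, name: str, version, alias: str) -> str:
    """Point `alias` at `version` and return the URI that resolves to it."""
    client.set_registered_model_alias(name, alias, str(version))
    return f"models:/{name}@{alias}"


stub = _RecordingClient()
prod_uri = assign_alias(stub, "titanic_rf", 1, "champion")    # plays the role of Production
stag_uri = assign_alias(stub, "titanic_rf", 2, "challenger")  # plays the role of Staging
print(prod_uri, stag_uri)
```

With a real client, the returned URIs could be passed to `mlflow.pyfunc.load_model` in place of the stage-based ones.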

## 7) **Load by stage** and compare predictions

In [14]:

import mlflow.pyfunc

sample = pd.DataFrame([
    {"Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Pclass": 3, "Sex": "male", "Embarked": "S"},
    {"Age": 38, "SibSp": 1, "Parch": 0, "Fare": 71.2833, "Pclass": 1, "Sex": "female", "Embarked": "C"},
])

m_prod = mlflow.pyfunc.load_model(model_uri=f"models:/{MODEL_NAME}/Production")
m_stag = mlflow.pyfunc.load_model(model_uri=f"models:/{MODEL_NAME}/Staging")

# Both models were logged as sklearn pipelines. The pyfunc wrapper's predict() calls the
# pipeline's predict(), so it returns class labels (0/1), not probabilities.
pred_prod = m_prod.predict(sample)
pred_stg = m_stag.predict(sample)

# For probabilities, load the sklearn flavor directly and call predict_proba, e.g.:
# mlflow.sklearn.load_model(f"models:/{MODEL_NAME}/Production").predict_proba(sample)[:, 1]

{"pred_production": pred_prod.tolist(), "pred_staging": pred_stg.tolist()}


{'pred_production': [0, 1], 'pred_staging': [0, 1]}

## 8) API (FastAPI) querying by **stage** (Production/Staging)

In [15]:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
from fastapi.testclient import TestClient

class Passenger(BaseModel):
    Age: Optional[float] = None
    SibSp: Optional[int] = None
    Parch: Optional[int] = None
    Fare: Optional[float] = None
    Pclass: Optional[int] = None
    Sex: Optional[str] = None
    Embarked: Optional[str] = None

class PredictRequest(BaseModel):
    stage: str  # "Production" or "Staging"
    inputs: List[Passenger]

class PredictResponse(BaseModel):
    stage: str
    probabilities: List[float]

app = FastAPI(title="Titanic Registry API", version="1.0.0")

@app.get("/health")
def health():
    # Check that the registry is reachable by listing the model versions
    try:
        vers = client.search_model_versions(f"name='{MODEL_NAME}'")
        ok = True
        details = [(v.version, v.current_stage) for v in vers]
    except Exception as e:
        ok = False
        details = str(e)
    return {"ok": ok, "model": MODEL_NAME, "versions": details}

def _load_by_stage(stage: str):
    # Try the sklearn flavor first (to guarantee predict_proba is available)
    try:
        model = mlflow.sklearn.load_model(model_uri=f"models:/{MODEL_NAME}/{stage}")
        return ("sklearn", model)
    except Exception:
        # Fallback: load as pyfunc and use predict
        model = mlflow.pyfunc.load_model(model_uri=f"models:/{MODEL_NAME}/{stage}")
        return ("pyfunc", model)

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest):
    if req.stage not in {"Production", "Staging"}:
        # Return a 422 to the caller; an unhandled ValueError would become a 500 in a running server
        raise HTTPException(status_code=422, detail="stage must be 'Production' or 'Staging'")
    kind, model = _load_by_stage(req.stage)  # note: the model is reloaded on every request
    df = pd.DataFrame([x.model_dump() for x in req.inputs])
    if kind == "sklearn" and hasattr(model, "predict_proba"):
        proba = model.predict_proba(df)[:,1].tolist()
    else:
        out = model.predict(df)
        # pyfunc predict may return class labels rather than probabilities; adapt if needed.
        # Here the output is simply coerced to a list of floats.
        proba = [float(x) if not isinstance(x, (list, tuple, np.ndarray)) else float(x[0]) for x in np.atleast_1d(out)]
    return PredictResponse(stage=req.stage, probabilities=proba)

client_api = TestClient(app)

# Local tests
payload = {
    "stage":"Production",
    "inputs":[
        {"Age":22,"SibSp":1,"Parch":0,"Fare":7.25,"Pclass":3,"Sex":"male","Embarked":"S"},
        {"Age":38,"SibSp":1,"Parch":0,"Fare":71.2833,"Pclass":1,"Sex":"female","Embarked":"C"}
    ]
}
r_prod = client_api.post("/predict", json=payload).json()
r_stg  = client_api.post("/predict", json={**payload, "stage":"Staging"}).json()
{"production": r_prod, "staging": r_stg}


{'production': {'stage': 'Production',
  'probabilities': [0.08611111111111112, 1.0]},
 'staging': {'stage': 'Staging',
  'probabilities': [0.11172925081682045, 0.9972483013208947]}}
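
The `/predict` handler above loads the model from the registry on every request. One way to avoid that is to cache the loaded model per stage. The sketch below wraps an injected loader with `functools.lru_cache`. `make_stage_loader` and `fake_load` are hypothetical names; in the notebook the loader would be something like `mlflow.sklearn.load_model`.

```python
from functools import lru_cache

ALLOWED_STAGES = ("Production", "Staging")


def make_stage_loader(load_fn, model_name: str):
    """Wrap load_fn so each stage's model is loaded at most once."""

    @lru_cache(maxsize=len(ALLOWED_STAGES))
    def load(stage: str):
        if stage not in ALLOWED_STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        return load_fn(f"models:/{model_name}/{stage}")

    return load


loaded_uris = []


def fake_load(uri):
    # Stand-in for a real registry load; records which URIs were requested.
    loaded_uris.append(uri)
    return object()


load_model = make_stage_loader(fake_load, "titanic_rf")
first = load_model("Production")
second = load_model("Production")  # served from the cache, no second load
load_model("Staging")
```

The trade-off is staleness: after a promotion the cached entry still points at the old version, so `load_model.cache_clear()` would need to be called (or the process restarted) to pick up the new one.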

## 9) Version promotion (e.g., v2 → Production)

In [19]:

# Example: promote v2 to Production (and move v1 to Archived)
# client.transition_model_version_stage(name=MODEL_NAME, version=mv2.version, stage="Production", archive_existing_versions=True)

# Show current versions and stages
versions = client.search_model_versions(f"name='{MODEL_NAME}'")
[(v.version, v.current_stage) for v in versions]


[('2', 'Staging'), ('1', 'Production')]
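
The promotion call above is left commented out, so the decision is manual. One possible gate, sketched below, promotes only when the candidate beats the current Production model's cross-validated AUC by some margin. The function name `should_promote` and the `min_gain` threshold of 0.005 are assumptions, not part of the notebook; the AUC values are the rounded ones printed in sections 3 and 4.

```python
def should_promote(candidate_auc: float, production_auc: float, min_gain: float = 0.005) -> bool:
    """Promote only if the candidate improves AUC by at least min_gain (assumed threshold)."""
    return (candidate_auc - production_auc) >= min_gain


auc_v1 = 0.872   # AUC(v1) as printed by the notebook
auc_v2 = 0.8745  # AUC(v2) as printed by the notebook

decision = should_promote(auc_v2, auc_v1)
print("promote v2:", decision)  # the 0.0025 gain is below the assumed 0.005 margin
```

With these numbers the gain is small enough that a stricter margin keeps v1 in Production, while a looser one (for example `min_gain=0.001`) would promote v2.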

## 10) How to run the API outside the notebook

In [17]:

print("""
# In a terminal, with the MLflow Server running (port 5000).
# Save the API code as a Python module first; uvicorn cannot import a notebook.

uvicorn <your_app_module>:app --host 0.0.0.0 --port 8000

# Request:
curl -X POST http://localhost:8000/predict -H "Content-Type: application/json" -d '{
  "stage": "Production",
  "inputs": [
    {"Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Pclass": 3, "Sex": "male", "Embarked": "S"}
  ]
}'
""")



# In a terminal, with the MLflow Server running (port 5000).
# Save the API code as a Python module first; uvicorn cannot import a notebook.

uvicorn <your_app_module>:app --host 0.0.0.0 --port 8000

# Request:
curl -X POST http://localhost:8000/predict -H "Content-Type: application/json" -d '{
  "stage": "Production",
  "inputs": [
    {"Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Pclass": 3, "Sex": "male", "Embarked": "S"}
  ]
}'
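
The same request can also be sent from Python using only the standard library. The sketch below mirrors the `curl` call. The helper name `build_predict_request` is hypothetical, and it assumes the API is served at `http://localhost:8000`.

```python
import json
import urllib.request


def build_predict_request(base_url: str, stage: str, passengers: list) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint (hypothetical helper)."""
    body = json.dumps({"stage": stage, "inputs": passengers}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request(
    "http://localhost:8000",
    "Production",
    [{"Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Pclass": 3, "Sex": "male", "Embarked": "S"}],
)

# Sending requires the API to be running, so it is left commented out:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```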

