# MLflow Inline Demo — Executable Notebook (No external .py files)

This notebook demonstrates **MLflow with autologging** entirely *inline*: every step — data creation, model training, MLflow logging — runs inside notebook cells. No external `.py` files are created. Ideal for live teaching where learners run cells one-by-one.

**How to use:**
- Activate your virtualenv with `mlflow` installed before opening the notebook.
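- Start the MLflow UI in a separate terminal, from the notebook's directory, to visualize runs: `mlflow ui --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns`. The UI only shows runs written to that backend store, so the notebook must log to it too: call `mlflow.set_tracking_uri('sqlite:///mlflow.db')` before training, or drop the `--backend-store-uri` flag if the notebook logs to its default local store.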
- Then run cells sequentially.


## 1) Environment check
Run this cell to confirm that `mlflow`, `sklearn`, and `pandas` are importable in your environment and to print their versions.

In [None]:
import sys
import mlflow
import sklearn
import pandas as pd
print('python', sys.version.split('\n')[0])
print('mlflow', mlflow.__version__)
print('sklearn', sklearn.__version__)
print('pandas', pd.__version__)
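
If one of the imports above fails, the cell stops at the first missing package. As a minimal sketch of a softer check, the snippet below uses only the standard library's `importlib.metadata` to report every package in one pass. The helper name `installed_versions` is hypothetical, and it expects distribution names (`scikit-learn`, not `sklearn`).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the active environment
            versions[name] = None
    return versions


for name, version in installed_versions(["mlflow", "scikit-learn", "pandas"]).items():
    print(f"{name:15s}", version or "NOT INSTALLED")
```

Any package reported as `NOT INSTALLED` needs to be installed into the active virtualenv before continuing.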

## 2) Create dataset (Iris) inline
This writes `data/raw/iris.csv` and keeps the DataFrame in the variable `df` for use in later cells.

In [None]:
import os
import pandas as pd
from sklearn.datasets import load_iris
os.makedirs('data/raw', exist_ok=True)
data = load_iris()
df = pd.DataFrame(data=data['data'], columns=['sepal_length','sepal_width','petal_length','petal_width'])
df['target'] = data['target']
df.to_csv('data/raw/iris.csv', index=False)
print('Wrote data/raw/iris.csv ->', len(df), 'rows')
df.head()

## 3) Define parameters (as a python dict) — toggle `autolog` here
Edit the values in this cell, then re-run the training cell below to create a new run. Changing `experiment_name` sends the run to a different experiment.

In [None]:
params = {
    'data': {'raw_path': 'data/raw/iris.csv'},
    'train': {
        'test_size': 0.2,
        'random_state': 42,
        'max_iter': 200,
        'C': 1.0,
        'solver': 'lbfgs',
        'experiment_name': 'mlops_demo_experiment',
        'autolog': True
    }
}

params
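
Editing the dict by hand works for one run at a time. As a sketch of preparing several variants up front, the hypothetical helper `param_variants` below deep-copies a base dict once per combination of overrides, so the original is never mutated. The `base_params` dict is a trimmed stand-in for `params`, and the grid values are illustrative only.

```python
import copy
from itertools import product


def param_variants(base, grid):
    """Yield a deep copy of `base` for each combination of values in `grid`.

    `grid` maps keys of base['train'] to lists of candidate values.
    """
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        variant = copy.deepcopy(base)
        variant["train"].update(dict(zip(keys, combo)))
        yield variant


# Trimmed stand-in for the `params` dict defined above (assumption for this sketch)
base_params = {
    "data": {"raw_path": "data/raw/iris.csv"},
    "train": {"C": 1.0, "max_iter": 200, "autolog": True},
}

variants = list(param_variants(base_params, {"C": [0.1, 1.0, 10.0], "max_iter": [100, 200]}))
for v in variants:
    print(v["train"])
```

Assigning one variant to `params` and re-running the training cell would then produce one run per variant.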

## 4) Training cell (inline) — uses MLflow autolog when `params['train']['autolog']` is True
This cell trains a LogisticRegression, logs via MLflow (autolog + manual metrics), and prints the run id. Run it multiple times after changing `params` to create multiple runs.

In [None]:
import mlflow
import mlflow.sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
import joblib
from datetime import datetime, timezone

# Use current params dict
cfg = params
train_cfg = cfg['train']

# Optional: log to the same store the MLflow UI reads from
# mlflow.set_tracking_uri('sqlite:///mlflow.db')   # or 'http://127.0.0.1:5000' if the UI/server is running

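# Enable or disable autolog explicitly: without disable=True, an earlier
# autolog() call in this kernel would stay active after the flag is turned off
use_autolog = bool(train_cfg.get('autolog', False))
mlflow.sklearn.autolog(disable=not use_autolog)
print('Autolog ENABLED' if use_autolog else 'Autolog DISABLED')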

# Read data
import pandas as pd
df = pd.read_csv(cfg['data']['raw_path'])
X = df.drop('target', axis=1)
y = df['target']

# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=train_cfg['test_size'], random_state=train_cfg['random_state']
)

# Create experiment
mlflow.set_experiment(train_cfg.get('experiment_name', 'default'))

run_name = f"notebook_run_{datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')}"
with mlflow.start_run(run_name=run_name) as run:
    run_id = run.info.run_id
    # manual param logging only if autolog is off
    if not train_cfg.get('autolog', False):
        mlflow.log_params({k: v for k, v in train_cfg.items() if k != 'experiment_name'})

    model = LogisticRegression(max_iter=train_cfg['max_iter'], C=train_cfg['C'], solver=train_cfg.get('solver','lbfgs'))
    model.fit(X_train, y_train)

    preds = model.predict(X_test)
    acc = float(accuracy_score(y_test, preds))
    f1 = float(f1_score(y_test, preds, average='macro'))

    mlflow.log_metric('accuracy_custom_eval', acc)
    mlflow.log_metric('f1_macro_custom_eval', f1)

    # save model artifact locally
    import os
    os.makedirs('models', exist_ok=True)
    model_path = f"models/logreg_{run_id}.joblib"
    joblib.dump(model, model_path)
    # use a dedicated artifact directory: autolog may already store its own model under 'model'
    mlflow.log_artifact(model_path, artifact_path='joblib_model')

    # always log the model under 'skmodel' so section 6 can load it with autolog on or off
    # (when autolog is on, it logs its own copy separately)
    mlflow.sklearn.log_model(model, artifact_path='skmodel')

    print(f'Run {run_id} finished — accuracy={acc:.4f}, f1={f1:.4f}')

run_id

## 5) Inspect recent runs programmatically
This cell lists recent runs and their metrics for the experiment defined in `params`.

In [None]:
import mlflow
import pandas as pd
exp = mlflow.get_experiment_by_name(params['train']['experiment_name'])
if exp is None:
    print('Experiment not found yet')
else:
    df_runs = mlflow.search_runs([exp.experiment_id], order_by=['start_time DESC'], max_results=20)
    if df_runs.empty:
        print('No runs found')
    else:
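        cols = ['run_id','status','metrics.accuracy_custom_eval','metrics.f1_macro_custom_eval','params.C','start_time']
        # reindex avoids a KeyError when a column (e.g. params.C) was never logged
        display(df_runs.reindex(columns=cols))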
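
`mlflow.search_runs` returns a pandas DataFrame whose metric and parameter columns are prefixed with `metrics.` and `params.`. The sketch below ranks runs by one of the custom metrics. It is written against plain dicts (the shape produced by `df_runs.to_dict('records')`) so that it runs without a tracking store. The helper `rank_runs` is hypothetical, and the sample records are made up for illustration, not real run output.

```python
import math


def rank_runs(records, metric="metrics.accuracy_custom_eval"):
    """Return FINISHED runs that have a value for `metric`, highest first."""

    def has_metric(rec):
        value = rec.get(metric)
        # Runs that never logged the metric show up as NaN in the DataFrame
        return isinstance(value, (int, float)) and not math.isnan(value)

    finished = [r for r in records if r.get("status") == "FINISHED" and has_metric(r)]
    return sorted(finished, key=lambda r: r[metric], reverse=True)


# Made-up records for illustration only (MLflow stores params as strings)
sample_records = [
    {"run_id": "aaa", "status": "FINISHED", "metrics.accuracy_custom_eval": 0.93, "params.C": "0.1"},
    {"run_id": "bbb", "status": "FINISHED", "metrics.accuracy_custom_eval": 0.97, "params.C": "1.0"},
    {"run_id": "ccc", "status": "RUNNING", "metrics.accuracy_custom_eval": float("nan"), "params.C": "10.0"},
]

ranked = rank_runs(sample_records)
print([r["run_id"] for r in ranked])
```

With a populated experiment, `rank_runs(df_runs.to_dict('records'))` would apply the same ranking to the runs listed by the cell above.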

## 6) Load the latest logged model and run a prediction
This cell auto-detects the latest finished run and loads its `skmodel` artifact via MLflow, then predicts a sample.

In [None]:
import mlflow.sklearn
exp = mlflow.get_experiment_by_name(params['train']['experiment_name'])
if exp is None:
    print('Experiment not found')
else:
    df_runs = mlflow.search_runs([exp.experiment_id], order_by=['start_time DESC'], max_results=10)
    if df_runs.empty:
        print('No runs available')
    else:
        # find first finished run
        done = df_runs[df_runs['status']=='FINISHED']
        if done.empty:
            print('No finished runs yet; wait or end active run')
        else:
            latest_run_id = done.iloc[0]['run_id']
            model_uri = f"runs:/{latest_run_id}/skmodel"
            print('Loading model from', model_uri)
            model = mlflow.sklearn.load_model(model_uri)
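            # pass a DataFrame with the training column names to avoid sklearn's feature-name warning
            # (pd was imported in earlier cells)
            sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=['sepal_length','sepal_width','petal_length','petal_width'])
            print('Prediction for sample:', model.predict(sample))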
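
The prediction is printed as a class index. For the Iris dataset loaded in section 2, indices 0, 1, and 2 correspond to setosa, versicolor, and virginica, in the order given by `load_iris().target_names`. A small sketch of decoding them follows; `decode_species` is a hypothetical helper, not part of MLflow or scikit-learn.

```python
IRIS_SPECIES = ("setosa", "versicolor", "virginica")


def decode_species(class_indices):
    """Translate Iris class indices (0, 1, 2) into species names."""
    return [IRIS_SPECIES[int(i)] for i in class_indices]


print(decode_species([0, 2, 1]))
```

Passing the array returned by `model.predict(...)` to `decode_species` gives the species names for each sample.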

## 7) Tips
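- To view the UI while running the notebook, start it in another terminal with `mlflow ui --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns` and open http://127.0.0.1:5000. If no runs appear, the UI and the notebook are probably pointed at different stores; make the notebook's tracking URI match the `--backend-store-uri` value.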
- Toggle `params['train']['autolog']` to see autolog ON vs OFF, then re-run the training cell to create new runs.
- If a run still shows as RUNNING after its cell has stopped, call `mlflow.end_run()` in a cell to close the active run.