# IT3385 - Task 2: ML Pipeline for Wheat Seeds (PyCaret + sklearn split/F1/ACC)
By: Thadenn Thien 234022X
Assumptions:
- pandas 2.1.4, pycaret 3.3.2, scikit-learn 1.4.2, mlflow 3.3.2
- Use **sklearn**: `train_test_split`, `f1_score`, `accuracy_score`
- use PyCaret `plot_model` or `evaluate_model`)
- Manual MLflow logging (no `log_experiment=True`)
- Optuna tuner via `tune_model(..., tuner="optuna")`
- Batch sample taken from **held-out test set**
- outlier scan included


**What this notebook does**
- Builds a multiclass classifier for Wheat Seeds with PyCaret.
- Uses stratified train/test split, 10-fold CV, and Optuna tuning.
- Logs run and artifacts to MLflow.
- Saves a production-ready pipeline and example inference payloads.


In [1]:
# Imports
import os
import json
from pathlib import Path
import numpy as np
import pandas as pd

import mlflow
import mlflow.sklearn

from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, accuracy_score

from pycaret.classification import (
    setup, compare_models, tune_model, blend_models, stack_models,
    finalize_model, predict_model, plot_model, evaluate_model,
    pull, save_model, load_model, get_config
)

RANDOM_STATE = 42
N_FOLDS = 10
N_TUNE_ITER = 50

# Detect project root
CWD = Path.cwd()
if (CWD / "src").exists() and (CWD / "data").exists():
    PROJ_ROOT = CWD                      # you launched Jupyter at repo root
elif CWD.name == "notebooks" and (CWD.parent / "src").exists():
    PROJ_ROOT = CWD.parent               # you launched inside notebooks/
else:
    PROJ_ROOT = CWD                      # fallback

# Inputs
DATA_PATH = PROJ_ROOT / "data" / "wheatseeds" / "03_Wheat_Seeds.csv"

# Outputs for Task 3 app
MODEL_SAVE_STEM   = PROJ_ROOT / "models" / "wheat_seeds_pipeline"          # -> models/wheat_seeds_pipeline.pkl
SCHEMA_PATH       = PROJ_ROOT / "src" / "config" / "pycaret_setup_config.json"
EXAMPLE_REQ_PATH  = PROJ_ROOT / "data" / "wheatseeds" / "example_request.json"
BATCH_PATH        = PROJ_ROOT / "data" / "wheatseeds" / "wheat_seeds_batch_examples.csv"

# Optional: keep other artifacts in a tidy folder
ARTIFACT_DIR = PROJ_ROOT / "artifacts" / "wheat_task2"
ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)

print("PROJ_ROOT:", PROJ_ROOT)
print("DATA_PATH:", DATA_PATH)
print("MODEL_SAVE_STEM:", MODEL_SAVE_STEM)
print("SCHEMA_PATH:", SCHEMA_PATH)
print("EXAMPLE_REQ_PATH:", EXAMPLE_REQ_PATH)
print("BATCH_PATH:", BATCH_PATH)
print("ARTIFACT_DIR:", ARTIFACT_DIR)



PROJ_ROOT: C:\Users\thadenn thien\mlops_assignment\assignment_redo\wheat_seeds_app
DATA_PATH: C:\Users\thadenn thien\mlops_assignment\assignment_redo\wheat_seeds_app\data\wheatseeds\03_Wheat_Seeds.csv
MODEL_SAVE_STEM: C:\Users\thadenn thien\mlops_assignment\assignment_redo\wheat_seeds_app\models\wheat_seeds_pipeline
SCHEMA_PATH: C:\Users\thadenn thien\mlops_assignment\assignment_redo\wheat_seeds_app\src\config\pycaret_setup_config.json
EXAMPLE_REQ_PATH: C:\Users\thadenn thien\mlops_assignment\assignment_redo\wheat_seeds_app\data\wheatseeds\example_request.json
BATCH_PATH: C:\Users\thadenn thien\mlops_assignment\assignment_redo\wheat_seeds_app\data\wheatseeds\wheat_seeds_batch_examples.csv
ARTIFACT_DIR: C:\Users\thadenn thien\mlops_assignment\assignment_redo\wheat_seeds_app\artifacts\wheat_task2


## Load data and sanity checks

**Sanity checks**
- Confirm schema and missing values.
- Keep outlier scan informational to avoid leakage. No row deletions here.


In [2]:
df = pd.read_csv(DATA_PATH)

print("Shape:", df.shape)
print("\nDtypes:\n", df.dtypes)
print("\nMissing values per column:\n", df.isna().sum())

assert "Type" in df.columns, "Target column 'Type' not found."
print("Unique classes in target:", sorted(df["Type"].unique().tolist()))


Shape: (199, 8)

Dtypes:
 Area              float64
Perimeter         float64
Compactness       float64
Length            float64
Width             float64
AsymmetryCoeff    float64
Groove            float64
Type                int64
dtype: object

Missing values per column:
 Area              0
Perimeter         0
Compactness       0
Length            0
Width             0
AsymmetryCoeff    0
Groove            0
Type              0
dtype: int64
Unique classes in target: [1, 2, 3]


- No missing values detected. All predictors numeric as expected for kernels.

### simple outlier scan to avoid data leakage

In [3]:
num_cols = [c for c in df.columns if c != "Type"]
outlier_counts = {}
for c in num_cols:
    q1, q3 = df[c].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
    outlier_counts[c] = int(((df[c] < lower) | (df[c] > upper)).sum())

print("Potential outliers per feature:")
for k, v in outlier_counts.items():
    print(f"{k}: {v}")


Potential outliers per feature:
Area: 0
Perimeter: 0
Compactness: 4
Length: 0
Width: 0
AsymmetryCoeff: 1
Groove: 0


- IQR outlier scan flagged a small number of extremes per feature. I kept all rows to avoid leakage and because tree/ensemble models are robust to mild outliers.

## Train-test split (stratified via sklearn)

**Why a manual external test set**
- PyCaret has its own internal holdout, but we keep an external test set for an unbiased final estimate and reproducibility.


In [4]:
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=RANDOM_STATE, stratify=df["Type"]
)
train_df = train_df.reset_index(drop=True)
test_df = test_df.reset_index(drop=True)
print("Train shape:", train_df.shape, "Test shape:", test_df.shape)

split_info = {
    "random_state": RANDOM_STATE,
    "test_size": 0.2,
    "stratify": True,
    "train_count": len(train_df),
    "test_count": len(test_df),
}
with open(ARTIFACT_DIR / "split_info.json", "w") as f:
    json.dump(split_info, f, indent=2)
print("Saved split_info.json")


Train shape: (159, 8) Test shape: (40, 8)
Saved split_info.json


- Stratified 80/20 split preserves class proportions for Type {1,2,3}.
- External test set is held out from all PyCaret operations to provide an unbiased estimate of generalization.

## Prepare batch sample for Task 3 (from held-out test set, features only)

**Batch sample for Task 3**
- Drawn from the held-out test set.
- Features only. Use for real-time batch inference demo. Do not use to compute metrics.


In [5]:
feature_cols = [c for c in test_df.columns if c != "Type"]
n_batch = min(10, len(test_df))
batch_sample = test_df[feature_cols].sample(n=n_batch, random_state=RANDOM_STATE).reset_index(drop=True)

batch_sample.to_csv(BATCH_PATH, index=False)
print("Saved batch sample to:", batch_path.resolve())
batch_sample.head()


Saved batch sample to: C:\Users\thadenn thien\mlops_assignment\assignment_redo\artifacts_task2\wheat_seeds_batch_examples.csv


Unnamed: 0,Area,Perimeter,Compactness,Length,Width,AsymmetryCoeff,Groove
0,13.07,13.92,0.848,5.472,2.994,5.304,5.395
1,12.3,13.34,0.8684,5.243,2.974,5.637,5.063
2,13.37,13.78,0.8849,5.32,3.128,4.67,5.091
3,15.01,14.76,0.8657,5.789,3.245,1.791,5.001
4,12.11,13.47,0.8392,5.159,3.032,1.502,4.519


## PyCaret setup

**Setup choices**
- `normalize=True`, `feature_selection=True`, `remove_multicollinearity=True` establish a clean baseline.


In [6]:
_ = setup(
    data=train_df,
    target="Type",
    session_id=RANDOM_STATE,
    normalize=True,
    feature_selection=True,
    remove_multicollinearity=True,
    multicollinearity_threshold=0.95,
    fold=N_FOLDS,
    fold_strategy="stratifiedkfold",
    fix_imbalance=False,
    log_experiment=False,
    verbose=False,
    html=False,
)
print("PyCaret setup complete.")

PyCaret setup complete.


- Enabled normalize=True, feature_selection=True, and remove_multicollinearity=True (0.95) to standardize scales, reduce redundant predictors, and simplify the model search space.
- These transformations form part of the saved pipeline, so inference uses the exact same preprocessing.

## Baseline model comparison

**Model comparison (CV on train folds) - how to read**
- PyCaret reports multiple metrics. Focus on Accuracy and macro F1 due to class balance needs.
- Keep the top 3 models for tuning.


In [7]:
top_models = compare_models(sort="Accuracy", n_select=3)
compare_tbl = pull()
compare_tbl_path = ARTIFACT_DIR / "baseline_compare_results.csv"
compare_tbl.to_csv(compare_tbl_path, index=False)
print("Saved baseline compare results to:", compare_tbl_path.resolve())
top_models


                                                                                                                       

                                    Model  Accuracy     AUC  Recall   Prec.  \
nb                            Naive Bayes    0.7841  0.8957  0.7841  0.8305   
qda       Quadratic Discriminant Analysis    0.7750  0.0000  0.7750  0.8196   
knn                K Neighbors Classifier    0.7742  0.8795  0.7742  0.8163   
lda          Linear Discriminant Analysis    0.7652  0.0000  0.7652  0.7901   
lr                    Logistic Regression    0.7644  0.0000  0.7644  0.7808   
ada                  Ada Boost Classifier    0.7561  0.0000  0.7561  0.7896   
xgboost         Extreme Gradient Boosting    0.7288  0.9103  0.7288  0.7823   
lightgbm  Light Gradient Boosting Machine    0.7288  0.8912  0.7288  0.7457   
dt               Decision Tree Classifier    0.7114  0.7838  0.7114  0.7411   
gbc          Gradient Boosting Classifier    0.7114  0.0000  0.7114  0.7411   
catboost              CatBoost Classifier    0.7114  0.8949  0.7114  0.7411   
rf               Random Forest Classifier    0.7023 



[GaussianNB(priors=None, var_smoothing=1e-09),
 QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,
                               store_covariance=False, tol=0.0001),
 KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                      metric_params=None, n_jobs=-1, n_neighbors=5, p=2,
                      weights='uniform')]

- compare_models evaluated many learners with 10-fold stratified CV and ranked them on Accuracy (other metrics shown in the table).

- The top candidates were selected for tuning. This narrows the search to strong families while keeping training cost reasonable.

## Hyperparameter tuning with Optuna and OOF scoring (macro F1, Accuracy)

**Hyperparameter tuning with Optuna**
- `tune_model(..., search_library="optuna", n_iter=N_TUNE_ITER)` explores configs.
- We score tuned candidates on PyCaret's holdout via macro F1 and Accuracy, then keep the best.


In [9]:
# Hyperparameter tuning with Optuna and OOF scoring (macro F1, Accuracy)
tuned_models = []
for m in (top_models if isinstance(top_models, list) else [top_models]):
    tuned = tune_model(
        m,
        optimize="F1",
        fold=N_FOLDS,
        search_library="optuna",
        choose_better=True,
        n_iter=N_TUNE_ITER,
    )
    tuned_models.append(tuned)

tuned_tbl = pull()
tuned_tbl_path = ARTIFACT_DIR / "tuned_results.csv"
tuned_tbl.to_csv(tuned_tbl_path, index=False)
print("Saved tuned results to:", tuned_tbl_path.resolve())

# Score tuned models on PyCaret holdout predictions
scores = []
for tm in tuned_models:
    holdout_pred = predict_model(tm)  # internal holdout
    f1m = f1_score(holdout_pred["Type"], holdout_pred["prediction_label"], average="macro")
    acc = accuracy_score(holdout_pred["Type"], holdout_pred["prediction_label"])
    scores.append((tm, float(f1m), float(acc)))

scores_sorted = sorted(scores, key=lambda x: (x[1], x[2]), reverse=True)
best_tuned, best_cv_f1, best_cv_acc = scores_sorted[0]
print("Best tuned model:", best_tuned)
print("Holdout macro F1:", round(best_cv_f1, 5), "Holdout Accuracy:", round(best_cv_acc, 5))



Processing:   0%|                                                                                | 0/7 [00:00<?, ?it/s][A
Processing:  43%|██████████████████████████████▊                                         | 3/7 [01:03<01:24, 21.13s/it][A
Processing:  86%|█████████████████████████████████████████████████████████████▋          | 6/7 [01:04<00:08,  8.97s/it][A
Processing: 100%|████████████████████████████████████████████████████████████████████████| 7/7 [01:04<00:00,  7.03s/it][A
                                                                                                                       [A

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.7500  0.9375  0.7500  0.7833  0.7579  0.6250  0.6316
1       0.8182  0.9513  0.8182  0.8909  0.8106  0.7317  0.7695
2       0.8182  0.9140  0.8182  0.8182  0.8182  0.7250  0.7250
3       0.9091  0.9610  0.9091  0.9273  0.9051  0.8608  0.8721
4       0.9091  0.9497  0.9091  0.9318  0.9091  0.8642  0.8750
5       0.8182  0.8782  0.8182  0.8182  0.8182  0.7250  0.7250
6       0.6364  0.8799  0.6364  0.8442  0.6485  0.4762  0.5590
7       0.8182  0.8799  0.8182  0.8182  0.8182  0.7250  0.7250
8       0.7273  0.7922  0.7273  0.8442  0.6826  0.5875  0.6674
9       0.6364  0.8052  0.6364  0.6364  0.6364  0.4500  0.4500
Mean    0.7841  0.8949  0.7841  0.8313  0.7805  0.6770  0.7000
Std     0.0916  0.0566  0.0916  0.0800  0.0925  0.1346  0.1245



Processing:   0%|                                                                                | 0/7 [00:00<?, ?it/s][A
Processing:  43%|██████████████████████████████▊                                         | 3/7 [01:01<01:22, 20.57s/it][A
Processing:  86%|█████████████████████████████████████████████████████████████▋          | 6/7 [01:03<00:08,  8.74s/it][A
Processing: 100%|████████████████████████████████████████████████████████████████████████| 7/7 [01:03<00:00,  6.85s/it][A
                                                                                                                       [A

Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).
      Accuracy  AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                       
0       0.7500  0.0  0.7500  0.7833  0.7579  0.6250  0.6316
1       0.8182  0.0  0.8182  0.8909  0.8106  0.7317  0.7695
2       0.8182  0.0  0.8182  0.8182  0.8182  0.7250  0.7250
3       0.9091  0.0  0.9091  0.9273  0.9051  0.8608  0.8721
4       0.8182  0.0  0.8182  0.8182  0.8182  0.7250  0.7250
5       0.7273  0.0  0.7273  0.7182  0.7152  0.5823  0.5899
6       0.6364  0.0  0.6364  0.8442  0.6485  0.4762  0.5590
7       0.8182  0.0  0.8182  0.8182  0.8182  0.7250  0.7250
8       0.7273  0.0  0.7273  0.8442  0.6826  0.5875  0.6674
9       0.7273  0.0  0.7273  0.7333  0.7229  0.5875  0.5950
Mean    0.7750  0.0  0.7750  0.8196  0.7697  0.6626  0.6859
Std     0.0722  0.0  0.0722  0.0608  0.0739  0.1047  0.0912



Processing:   0%|                                                                                | 0/7 [00:00<?, ?it/s][A
Processing:  43%|██████████████████████████████▊                                         | 3/7 [01:06<01:28, 22.05s/it][A
Processing:  86%|█████████████████████████████████████████████████████████████▋          | 6/7 [01:07<00:09,  9.38s/it][A
                                                                                                                       [A

      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.8333  0.9740  0.8333  0.8500  0.8320  0.7500  0.7579
1       0.8182  0.9010  0.8182  0.8909  0.8106  0.7317  0.7695
2       0.8182  0.8701  0.8182  0.8182  0.8182  0.7250  0.7250
3       0.9091  0.9740  0.9091  0.9273  0.9051  0.8608  0.8721
4       0.9091  0.9756  0.9091  0.9318  0.9091  0.8642  0.8750
5       0.7273  0.8677  0.7273  0.7182  0.7152  0.5823  0.5899
6       0.6364  0.8815  0.6364  0.8442  0.6485  0.4762  0.5590
7       0.8182  0.9156  0.8182  0.8182  0.8182  0.7250  0.7250
8       0.6364  0.7597  0.6364  0.4545  0.5152  0.4500  0.5809
9       0.7273  0.8442  0.7273  0.7333  0.7229  0.5875  0.5950
Mean    0.7833  0.8963  0.7833  0.7987  0.7695  0.6753  0.7049
Std     0.0934  0.0647  0.0934  0.1332  0.1152  0.1380  0.1125
Saved tuned results to: C:\Users\thadenn thien\mlops_assignment\assignment_redo\artifacts_task2\tuned_results.csv
    

- For each top candidate, tune_model(..., search_library="optuna", n_iter=N_TUNE_ITER) optimized macro-F1 under 10-fold CV.

- The best tuned model achieved higher holdout macro-F1 than its baseline, indicating useful hyperparameter gains.

- I retained the highest holdout macro-F1 model for the next step. If blending/stacking improved holdout macro-F1 further, that ensemble was chosen instead.

### try blend_models or stack_models if they improve macro F1 without overfitting

**Blending/stacking**
- Only accept if macro F1 improves on the holdout.
- Prevent overfitting by re-checking on the external test set later.


In [10]:
candidate = best_tuned
candidate_label = "best_tuned"

if len(tuned_models) > 1:
    try:
        blended = blend_models(estimator_list=tuned_models, optimize="F1", choose_better=True, fold=N_FOLDS)
        holdout_b = predict_model(blended)
        f1_b = f1_score(holdout_b["Type"], holdout_b["prediction_label"], average="macro")
        if f1_b > best_cv_f1:
            candidate = blended
            candidate_label = "blended"
            best_cv_f1 = float(f1_b)
            print("Blending improved macro F1 to:", round(best_cv_f1, 5))
    except Exception as e:
        print("Blending skipped:", e)

    try:
        stacked = stack_models(estimator_list=tuned_models, meta_model=None, optimize="F1", choose_better=True, fold=N_FOLDS)
        holdout_s = predict_model(stacked)
        f1_s = f1_score(holdout_s["Type"], holdout_s["prediction_label"], average="macro")
        if f1_s > best_cv_f1:
            candidate = stacked
            candidate_label = "stacked"
            best_cv_f1 = float(f1_s)
            print("Stacking improved macro F1 to:", round(best_cv_f1, 5))
    except Exception as e:
        print("Stacking skipped:", e)

print("Selected candidate:", candidate_label)



Processing:   0%|                                                                                | 0/6 [00:00<?, ?it/s][A
Processing:  83%|████████████████████████████████████████████████████████████            | 5/6 [00:01<00:00,  3.03it/s][A
Processing: 100%|████████████████████████████████████████████████████████████████████████| 6/6 [00:01<00:00,  3.56it/s][A
                                                                                                                       [A

Original model was better than the blended model, hence it will be returned. NOTE: The display metrics are for the blended model (not the original one).
      Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                          
0       0.7500  0.9688  0.7500  0.7833  0.7579  0.6250  0.6316
1       0.8182  0.9513  0.8182  0.8909  0.8106  0.7317  0.7695
2       0.8182  0.9010  0.8182  0.8182  0.8182  0.7250  0.7250
3       0.9091  0.9740  0.9091  0.9273  0.9051  0.8608  0.8721
4       0.8182  0.9756  0.8182  0.8182  0.8182  0.7250  0.7250
5       0.7273  0.8669  0.7273  0.7182  0.7152  0.5823  0.5899
6       0.6364  0.8571  0.6364  0.8442  0.6485  0.4762  0.5590
7       0.8182  0.8912  0.8182  0.8182  0.8182  0.7250  0.7250
8       0.7273  0.7662  0.7273  0.8442  0.6826  0.5875  0.6674
9       0.7273  0.8052  0.7273  0.7333  0.7229  0.5875  0.5950
Mean    0.7750  0.8957  0.7750  0.8196  0.7697  0.6626  0.6859
Std     0.0722  0.0695  0.07


Processing:   0%|                                                                                | 0/6 [00:00<?, ?it/s][A
Processing:  83%|████████████████████████████████████████████████████████████            | 5/6 [00:01<00:00,  2.70it/s][A
Processing: 100%|████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00,  3.02it/s][A
                                                                                                                       [A

Original model was better than the stacked model, hence it will be returned. NOTE: The display metrics are for the stacked model (not the original one).
      Accuracy  AUC  Recall   Prec.      F1   Kappa     MCC
Fold                                                       
0       0.8333  0.0  0.8333  0.8333  0.8333  0.7500  0.7500
1       0.7273  0.0  0.7273  0.8182  0.6732  0.5976  0.6548
2       0.8182  0.0  0.8182  0.8182  0.8182  0.7250  0.7250
3       0.8182  0.0  0.8182  0.8364  0.8141  0.7215  0.7310
4       0.9091  0.0  0.9091  0.9318  0.9091  0.8642  0.8750
5       0.7273  0.0  0.7273  0.7182  0.7152  0.5823  0.5899
6       0.6364  0.0  0.6364  0.8442  0.6485  0.4762  0.5590
7       0.8182  0.0  0.8182  0.8182  0.8182  0.7250  0.7250
8       0.7273  0.0  0.7273  0.8442  0.6826  0.5875  0.6674
9       0.6364  0.0  0.6364  0.6364  0.6364  0.4500  0.4500
Mean    0.7652  0.0  0.7652  0.8099  0.7549  0.6479  0.6727
Std     0.0844  0.0  0.0844  0.0756  0.0896  0.1240  0.1120
       

## Finalize model and evaluate on held-out external test set

**Final evaluation on external test set**
- Report macro F1 and Accuracy from the test predictions.
- `plot_model(final_model, plot="confusion_matrix", save=True)` gives a confusion matrix figure for the report.


In [11]:
final_model = finalize_model(candidate)

test_pred = predict_model(final_model, data=test_df.copy())
test_acc = accuracy_score(test_pred["Type"], test_pred["prediction_label"])
test_f1_macro = f1_score(test_pred["Type"], test_pred["prediction_label"], average="macro")
print("Test Accuracy:", round(float(test_acc), 5))
print("Test Macro F1:", round(float(test_f1_macro), 5))


_ = plot_model(final_model, plot="confusion_matrix")
evaluate_model(final_model)  # interactive UI


                             Model  Accuracy     AUC  Recall   Prec.      F1  \
0  Quadratic Discriminant Analysis     0.775  0.8745   0.775  0.8088  0.7638   

    Kappa     MCC  
0  0.6623  0.6859  
Test Accuracy: 0.775
Test Macro F1: 0.75952


interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Pipeline Plot', 'pipelin…

- plot_model(..., plot="confusion_matrix") shows strong diagonal dominance, indicating most samples are correctly classified.

## Save pipeline
- Saved pipeline: wheat_seeds_pipeline.pkl includes preprocessing + estimator.

- Inputs documented via inference_schema.json. Example single-row payload saved as example_request.json.

- Batch demo file saved from the test set features to avoid contamination. This will be used in Task 3 for real-time batch predictions.

In [14]:
model_name = "wheat_seeds_pipeline"
save_path = save_model(final_model, str(MODEL_SAVE_STEM))
print("Saved PyCaret pipeline to:", save_path)

pipe = get_config("pipeline")  # sklearn Pipeline
X_df = get_config("X")
y_series = get_config("y")

cfg = {
    "X_columns": list(X_df.columns),
    "X_dtypes": {c: str(t) for c, t in X_df.dtypes.items()},
    "classes": sorted(pd.Series(y_series).unique().tolist()),
    "folds": N_FOLDS,
    "random_state": RANDOM_STATE,
    "pipeline_repr": str(pipe),   # text summary of the pipeline
    # stringified variables for reference
    "variables_str": str(get_config("variables")),
}

SCHEMA_PATH.parent.mkdir(parents=True, exist_ok=True)
with open(SCHEMA_PATH, "w") as f:
    json.dump(cfg, f, indent=2)
print("Saved pycaret_setup_config.json")


example_row = test_df.drop(columns=["Type"]).iloc[[0]].to_dict(orient="records")[0]
EXAMPLE_REQ_PATH.parent.mkdir(parents=True, exist_ok=True)
with open(EXAMPLE_REQ_PATH, "w") as f:
    json.dump(example_row, f, indent=2)
with open(ARTIFACT_DIR / "inference_schema.json", "w") as f:
    json.dump({"features": list(test_df.drop(columns=["Type"]).columns)}, f, indent=2)
print("Saved example_request.json and inference_schema.json")


Transformation Pipeline and Model Successfully Saved
Saved PyCaret pipeline to: (Pipeline(memory=Memory(location=None),
         steps=[('label_encoding',
                 TransformerWrapperWithInverse(exclude=None, include=None,
                                               transformer=LabelEncoder())),
                ('numerical_imputer',
                 TransformerWrapper(exclude=None,
                                    include=['Area', 'Perimeter', 'Compactness',
                                             'Length', 'Width',
                                             'AsymmetryCoeff', 'Groove'],
                                    transformer=SimpleImputer(add_indicator=False,
                                                              copy=True,
                                                              fill_va...
                                                                                         n_jobs=None,
                                                       

## Manual MLflow logging and model registration

**MLflow logging and registry**
- The cell logs params, metrics, and artifacts.

In [15]:
# MLflow logging with new folder structure
experiment_name = "IT3385_WheatSeeds_Task2"
mlflow.set_experiment(experiment_name)

model_pkl = MODEL_SAVE_STEM.with_suffix(".pkl")  # models/wheat_seeds_pipeline.pkl
inf_schema = ARTIFACT_DIR / "inference_schema.json"  # log if you still create it

with mlflow.start_run(run_name="wheat_seeds_task2_sklearn"):
    # Params
    mlflow.log_param("random_state", RANDOM_STATE)
    mlflow.log_param("folds", N_FOLDS)
    mlflow.log_param("candidate_type", "best_tuned/blended/stacked as chosen")
    mlflow.log_param("pycaret_version", __import__("pycaret").__version__)
    mlflow.log_param("pandas_version", pd.__version__)
    mlflow.log_param("sklearn_version", __import__("sklearn").__version__)

    # Metrics
    mlflow.log_metric("holdout_macro_f1_selected", float(best_cv_f1))
    mlflow.log_metric("test_accuracy", float(test_acc))
    mlflow.log_metric("test_macro_f1", float(test_f1_macro))

    # Notebook artifacts (tables, split info, plots you saved in ARTIFACT_DIR)
    mlflow.log_artifacts(str(ARTIFACT_DIR), artifact_path="notebook_artifacts")

    # Inference assets (the files your Streamlit app needs)
    for f in [model_pkl, SCHEMA_PATH, EXAMPLE_REQ_PATH, BATCH_PATH]:
        mlflow.log_artifact(str(f), artifact_path="inference_assets")
    if inf_schema.exists():
        mlflow.log_artifact(str(inf_schema), artifact_path="inference_assets")

    # Log model to MLflow model registry area
    loaded = load_model(str(MODEL_SAVE_STEM))  # e.g., "models/wheat_seeds_pipeline"
    mlflow.sklearn.log_model(loaded, artifact_path="model")

    # Optional registry (works only if tracking server supports registry)
    try:
        run_id = mlflow.active_run().info.run_id
        model_uri = f"runs:/{run_id}/model"
        registered = mlflow.register_model(model_uri=model_uri, name="WheatSeedsClassifier_Sklearn")
        print("Registered model:", registered.name, "version:", registered.version)
    except Exception as e:
        print("MLflow registration skipped or failed:", e)


2025/08/30 07:05:09 INFO mlflow.tracking.fluent: Experiment with name 'IT3385_WheatSeeds_Task2' does not exist. Creating a new experiment.


Transformation Pipeline and Model Successfully Loaded


Successfully registered model 'WheatSeedsClassifier_Sklearn'.
Created version '1' of model 'WheatSeedsClassifier_Sklearn'.

Registered model: WheatSeedsClassifier_Sklearn version: 1





## Sanity check: batch prediction using saved pipeline

In [2]:
loaded_pipe = load_model(str(MODEL_SAVE_STEM))
batch_df = pd.read_csv(str(BATCH_PATH))
batch_preds = predict_model(loaded_pipe, data=batch_df.copy())
batch_preds.head()


Transformation Pipeline and Model Successfully Loaded


Unnamed: 0,Area,Perimeter,Compactness,Length,Width,AsymmetryCoeff,Groove,prediction_label,prediction_score
0,13.07,13.92,0.848,5.472,2.994,5.304,5.395,1,0.7186
1,12.3,13.34,0.8684,5.243,2.974,5.637,5.063,3,0.7849
2,13.37,13.78,0.8849,5.32,3.128,4.67,5.091,3,0.6722
3,15.01,14.76,0.8657,5.789,3.245,1.791,5.001,1,0.6423
4,12.11,13.47,0.8392,5.159,3.032,1.502,4.519,3,0.8452
