# Orchard ML Quick Start: MiniCNN on BloodMNIST (CPU)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tomrussobuilds/orchard-ml/blob/main/notebooks/01_quickstart_bloodmnist_cpu.ipynb)

This notebook demonstrates the core Orchard ML training workflow using a lightweight setup that runs entirely on **CPU**:

- **Dataset**: [BloodMNIST](https://medmnist.com/) (28x28 RGB, 8 blood cell classes)
- **Model**: MiniCNN (~94K parameters)
- **Time**: ~5-10 minutes on Colab CPU

### What you'll learn
1. How Orchard ML uses YAML configs to drive the entire training pipeline
2. How to launch training with `forge.py`
3. How to explore generated artifacts (metrics, plots, reports)

## 1. Setup

Clone the repository and install dependencies.

In [None]:
import os

%cd /content
if not os.path.isdir("orchard-ml"):
    !git clone --depth 1 https://github.com/tomrussobuilds/orchard-ml.git

%cd /content/orchard-ml
!git pull --ff-only
%pip install -q -r requirements.txt
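
Every later cell assumes the working directory is the repository root, so a quick standard-library check can catch a failed clone before training starts. This is a minimal sketch: `missing_repo_entries` is a hypothetical helper, and the expected entries are simply the files and folders this notebook uses further down.

```python
from pathlib import Path

# Entries that later cells rely on (taken from this notebook, not from Orchard ML docs).
EXPECTED = ("forge.py", "requirements.txt", "recipes")


def missing_repo_entries(root: str = ".") -> list[str]:
    """Return the expected entries that do not exist under `root`."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).exists()]


missing = missing_repo_entries()
if missing:
    print(f"Not in the repo root? Missing: {missing}")
else:
    print(f"Repo root looks right: {Path.cwd()}")
```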

## 2. Configuration

Orchard ML is entirely **configuration-driven** — a single YAML file controls dataset, model, training, augmentation, evaluation, and export.

We write a Colab-friendly config based on `recipes/config_mini_cnn.yaml`, with reduced epochs (15) and tracking disabled (no MLflow needed in Colab).

In [None]:
%%writefile colab_bloodmnist_cpu.yaml
# BloodMNIST CPU Quick Start — Colab-optimized config

dataset:
  name: "bloodmnist"
  data_root: ./dataset
  resolution: 28
  force_rgb: true
  use_weighted_sampler: true

architecture:
  name: "mini_cnn"
  pretrained: false
  dropout: 0.3

training:
  seed: 42
  batch_size: 64
  learning_rate: 0.01
  weight_decay: 5e-4
  momentum: 0.9
  min_lr: 1e-6
  mixup_alpha: 0.2
  label_smoothing: 0.1
  epochs: 15
  patience: 10
  grad_clip: 1.0
  mixup_epochs: 0
  scheduler_type: "cosine"
  cosine_fraction: 0.8
  scheduler_patience: 5
  scheduler_factor: 0.1
  step_size: 20
  use_amp: false
  use_tta: false
  criterion_type: "cross_entropy"
  weighted_loss: false
  focal_gamma: 2.0

augmentation:
  hflip: 0.5
  rotation_angle: 15
  jitter_val: 0.3
  min_scale: 0.9
  tta_translate: 0.5
  tta_scale: 1.02
  tta_blur_sigma: 0.1

hardware:
  device: "auto"

telemetry:
  output_dir: ./outputs
  log_level: "INFO"
  log_interval: 50

evaluation:
  batch_size: 256
  n_samples: 16
  fig_dpi: 200
  cmap_confusion: Blues
  plot_style: seaborn-v0_8-muted
  grid_cols: 4
  fig_size_predictions: [12, 8]
  report_format: xlsx
  save_confusion_matrix: true
  save_predictions_grid: true

tracking:
  enabled: false

export:
  format: onnx
  opset_version: 18
  validate_export: true
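
Two of the training settings above are easy to reason about numerically. The sketch below assumes that `scheduler_type: "cosine"` follows standard closed-form cosine annealing (the same formula as PyTorch's `CosineAnnealingLR`) from `learning_rate` down to `min_lr` over all 15 epochs. It also assumes that `label_smoothing` follows PyTorch's `CrossEntropyLoss` convention. It does not model `cosine_fraction`, whose exact effect in Orchard ML isn't described in this notebook, and `cosine_lr` / `smoothed_target` are illustrative helpers rather than Orchard ML functions.

```python
import math

LR_MAX, LR_MIN, EPOCHS = 0.01, 1e-6, 15  # learning_rate, min_lr, epochs
NUM_CLASSES, SMOOTHING = 8, 0.1          # BloodMNIST classes, label_smoothing


def cosine_lr(epoch: int, total: int = EPOCHS,
              lr_max: float = LR_MAX, lr_min: float = LR_MIN) -> float:
    """Cosine annealing: lr_max at epoch 0, lr_min at epoch == total."""
    return lr_min + (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total)) / 2


def smoothed_target(true_class: int, k: int = NUM_CLASSES,
                    eps: float = SMOOTHING) -> list[float]:
    """(1 - eps) * one_hot + eps / k, as in PyTorch's CrossEntropyLoss."""
    return [(1 - eps) * (i == true_class) + eps / k for i in range(k)]


for epoch in range(0, EPOCHS + 1, 5):
    print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch):.6f}")
print("smoothed target for class 0:", [round(p, 4) for p in smoothed_target(0)])
```

With 8 classes and a smoothing of 0.1, the true class keeps a target of 0.9 + 0.1/8 = 0.9125 and each of the other seven gets 0.0125.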

## 3. Train

Launch the full pipeline: dataset download, training for 15 epochs, evaluation, and ONNX export.

The config uses `batch_size: 64` and `epochs: 15` to keep runtime under 15 minutes on CPU.
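
To see what those two numbers mean in optimizer steps, here is a back-of-the-envelope sketch. It assumes the standard MedMNIST BloodMNIST training split of 11,959 images, and that the weighted sampler draws as many samples per epoch as the split holds. Whether the last partial batch is kept (`drop_last`) depends on how Orchard ML builds its data loader, so both cases are shown.

```python
import math

TRAIN_SIZE = 11_959  # BloodMNIST training split (MedMNIST v2)
BATCH_SIZE = 64      # training.batch_size
EPOCHS = 15          # training.epochs


def steps_per_epoch(n: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of optimizer steps in one pass over n samples."""
    return n // batch_size if drop_last else math.ceil(n / batch_size)


steps = steps_per_epoch(TRAIN_SIZE, BATCH_SIZE)
print(f"{steps} steps/epoch, {steps * EPOCHS} steps in total")
print(f"with drop_last: {steps_per_epoch(TRAIN_SIZE, BATCH_SIZE, drop_last=True)} steps/epoch")
```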

In [None]:
!python forge.py --config colab_bloodmnist_cpu.yaml
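
If you would rather launch training from Python, for example so the notebook stops on a failed run before the results cells try to read an incomplete output folder, a minimal sketch with the standard `subprocess` module could look like this. Only the `--config` flag comes from the command above; `forge_command` and `run_forge` are hypothetical helper names.

```python
import subprocess
import sys


def forge_command(config_path: str) -> list[str]:
    """Same command as the cell above, run with the kernel's own interpreter.

    "-u" makes the child's stdout unbuffered so log lines stream as they appear.
    """
    return [sys.executable, "-u", "forge.py", "--config", config_path]


def run_forge(config_path: str) -> None:
    """Stream forge.py output into the notebook and raise on a non-zero exit."""
    with subprocess.Popen(forge_command(config_path), stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            print(line, end="")
    # Popen's context manager waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

From the repository root, `run_forge("colab_bloodmnist_cpu.yaml")` would replace the `!python` line. Piping the output and printing it line by line keeps the logs visible in the cell, which inherited file descriptors do not always guarantee inside Jupyter.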

## 4. Explore Results

Orchard ML saves all artifacts to a timestamped directory under `outputs/`. Let's inspect what was generated.

In [None]:
import glob
import os

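# Find the latest run directory (timestamped names sort chronologically)
run_dirs = sorted(glob.glob("outputs/*/"))
if not run_dirs:
    raise FileNotFoundError("No run directories found under outputs/. Did training finish?")
latest_run = run_dirs[-1]
print(f"Latest run: {latest_run}")

# List all generated artifacts as an indented tree
for root, dirs, files in os.walk(latest_run):
    dirs.sort()  # visit subdirectories in a stable order
    rel = os.path.relpath(root, latest_run)
    level = 0 if rel == "." else rel.count(os.sep) + 1
    indent = "  " * level
    print(f"{indent}{os.path.basename(os.path.normpath(root))}/")
    sub_indent = "  " * (level + 1)
    for file in sorted(files):
        size = os.path.getsize(os.path.join(root, file))
        print(f"{sub_indent}{file} ({size / 1024:.1f} KB)")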

In [None]:
from IPython.display import display, Image

# Display confusion matrix
cm_files = sorted(glob.glob(os.path.join(latest_run, "figures", "confusion_matrix*.png")))
if cm_files:
    print("Confusion Matrix:")
    display(Image(filename=cm_files[0], width=600))
else:
    print("No confusion matrix image found in the figures/ folder.")

In [None]:
# Display predictions grid
pred_files = sorted(glob.glob(os.path.join(latest_run, "figures", "sample_predictions*.png")))
if pred_files:
    print("Sample Predictions:")
    display(Image(filename=pred_files[0], width=700))
else:
    print("No sample predictions grid found in the figures/ folder.")

In [None]:
# Show the saved config for full reproducibility
import yaml

config_files = sorted(glob.glob(os.path.join(latest_run, "reports", "config*.yaml")))
if config_files:
    with open(config_files[0]) as f:
        saved_cfg = yaml.safe_load(f)
    print("Saved configuration (full reproducibility snapshot):")
    print(yaml.dump(saved_cfg, default_flow_style=False, sort_keys=False))
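
The snapshot may contain more than the YAML you wrote, for example defaults that the pipeline filled in. A small recursive diff makes those additions and any changed values easy to spot. This is a minimal sketch: `diff_config` is a hypothetical helper, and the toy dictionaries below (including the `num_workers` key) are made up for illustration, not real Orchard ML output.

```python
from collections.abc import Iterator


def diff_config(written: dict, saved: dict, prefix: str = "") -> Iterator[str]:
    """Yield dotted keys whose values differ between two nested config dicts."""
    for key in sorted(set(written) | set(saved), key=str):
        path = f"{prefix}{key}"
        if key not in saved:
            yield f"{path}: only in written config"
        elif key not in written:
            yield f"{path}: added in snapshot -> {saved[key]!r}"
        elif isinstance(written[key], dict) and isinstance(saved[key], dict):
            # Recurse into nested sections such as training, dataset, ...
            yield from diff_config(written[key], saved[key], prefix=f"{path}.")
        elif written[key] != saved[key]:
            yield f"{path}: {written[key]!r} -> {saved[key]!r}"


# Toy example with made-up values.
written = {"training": {"epochs": 15, "seed": 42}, "tracking": {"enabled": False}}
saved = {"training": {"epochs": 15, "seed": 42, "num_workers": 2},
         "tracking": {"enabled": False}}
for line in diff_config(written, saved):
    print(line)
```

To compare against this run, load `colab_bloodmnist_cpu.yaml` with `yaml.safe_load` and pass it as `written` alongside `saved_cfg` from the cell above. One caveat: PyYAML follows YAML 1.1 and reads exponent-only numbers such as `5e-4` or `1e-6` as strings. Those keys may therefore be reported as differing even if the snapshot stores the same value as a float.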

## 5. Next Steps

- **GPU training**: Switch to a GPU runtime and try `recipes/config_efficientnet_b0.yaml` for 224x224 resolution
- **Hyperparameter search**: See [02_galaxy10_optuna_model_search.ipynb](./02_galaxy10_optuna_model_search.ipynb) for Optuna-powered optimization with automatic model selection
- **Custom datasets**: Check the [Configuration Guide](https://github.com/tomrussobuilds/orchard-ml/blob/main/docs/guide/CONFIGURATION.md) for adding your own data
- **Full recipe catalog**: Browse all 15 pre-configured recipes in the [recipes/](https://github.com/tomrussobuilds/orchard-ml/tree/main/recipes) directory