# Chapter 10: Computer Vision with AutoGluon - Complete Implementation Notebook
## Retail Product Classification and Multimodal E-commerce Applications

This comprehensive notebook provides detailed implementations for all concepts covered in Chapter 10, focusing on **retail and e-commerce computer vision applications**. It includes complete code examples, advanced techniques, and production-ready implementations using the **Mini Fashion Product Images and Text Dataset** from Kaggle.

All code examples are tested and validated with **AutoGluon 1.5.0**, featuring the latest computer vision improvements including enhanced object detection presets with ~20% relative improvements in mean Average Precision (mAP) metrics, PDF document classification, and Open Vocabulary Object Detection.

### Contents
1. Environment Setup and Dataset Download
2. Fashion Product Image Classification
3. Multimodal Product Classification (Images + Text)
4. Model Architecture Deep Dive (CNNs vs ViTs)
5. Object Detection for Retail
6. Handling Missing Modalities in Production
7. Model Evaluation and A/B Testing
8. Production Deployment and Monitoring
9. Performance Optimization for Resource-Constrained Environments
10. Summary and Best Practices

### Dataset: Mini Fashion Product Images and Text Dataset

**Source**: [Kaggle - Mini Product Image and Text Dataset](https://www.kaggle.com/datasets/nirmalsankalana/mini-product-image-and-text-dataset)

This dataset contains:
- Fashion product images in multiple categories
- Product titles and descriptions
- Category labels for classification

Together, these make the dataset well suited to demonstrating AutoGluon's multimodal capabilities in retail scenarios.

---

## 1. Environment Setup and Data Preparation

Let's start by setting up our environment and downloading the mini fashion product images and text dataset we'll be using throughout this notebook.

**[Snippet 10-1]** - AutoGluon installation for computer vision

### Dependencies
Install all dependencies from the book's requirements file:
```bash
pip install -r requirements.txt
```
For Kaggle dataset access, you'll also need to configure your Kaggle API credentials (see below).

In [None]:
# Import core libraries for retail computer vision
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import os
import json
import zipfile
from datetime import datetime
import random
import warnings
warnings.filterwarnings('ignore')

# AutoGluon imports
from autogluon.multimodal import MultiModalPredictor
import autogluon.core as ag

# Image processing and visualization
from PIL import Image, ImageDraw, ImageFont
from matplotlib.patches import FancyBboxPatch, FancyArrowPatch

# Retail-specific utilities
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Set up plotting for retail visualizations
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")  # Professional color scheme for retail
plt.rcParams['figure.figsize'] = (12, 8)
np.random.seed(42)

print(f"AutoGluon version: {ag.__version__}")
print("Retail computer vision environment ready!")

### GPU Setup and Hardware Recommendations

Computer vision tasks are computationally intensive. While AutoGluon can work on CPU-only systems, GPU acceleration significantly improves training speed.

In [None]:
# GPU setup check optimized for retail applications
import torch

def check_retail_gpu_setup():
    """Check GPU configuration for retail computer vision workloads"""
    
    print("Retail Computer Vision GPU Check:")
    print(f"   PyTorch version: {torch.__version__}")
    print(f"   CUDA available: {torch.cuda.is_available()}")
    
    if torch.cuda.is_available():
        print(f"   GPU count: {torch.cuda.device_count()}")
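        # Track the smallest GPU so the recommendation fits every device
        gpu_memory = float('inf')
        for i in range(torch.cuda.device_count()):
            gpu_name = torch.cuda.get_device_name(i)
            device_memory = torch.cuda.get_device_properties(i).total_memory / 1e9
            gpu_memory = min(gpu_memory, device_memory)
            print(f"   GPU {i}: {gpu_name} ({device_memory:.1f} GB)")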
            
        # Retail-specific memory recommendations
        if gpu_memory < 6:
            print("\n   Limited GPU memory: Use 'medium_quality' for product classification")
            recommended_preset = 'medium_quality'
            recommended_batch_size = 8
        elif gpu_memory >= 16:
            print("\n   High-end GPU: Perfect for 'best_quality' retail models")
            recommended_preset = 'best_quality'
            recommended_batch_size = 32
        else:
            print("\n   Good GPU memory: 'high_quality' preset recommended")
            recommended_preset = 'high_quality'
            recommended_batch_size = 16
            
        print(f"   Recommended preset: {recommended_preset}")
        print(f"   Recommended batch size: {recommended_batch_size}")
        
        # Note about production inference
        print("\n   Note: For production INFERENCE (not training), hardware requirements")
        print("   are much more flexible. Smaller GPUs or even CPUs can work well")
        print("   depending on your throughput requirements.")
        
    else:
        print("\n   CPU-only mode: Suitable for small retail catalogs and inference")
        print("   Consider GPU for training on large-scale e-commerce applications")
        recommended_preset = 'medium_quality'
        recommended_batch_size = 4
        
    return recommended_preset, recommended_batch_size

RECOMMENDED_PRESET, BATCH_SIZE = check_retail_gpu_setup()

### Kaggle API Setup
To download the Fashion Product Images dataset, you need Kaggle API credentials:
1. Create a Kaggle account at https://www.kaggle.com
2. Go to Account Settings > API > Create New Token
3. Place the downloaded `kaggle.json` in `~/.kaggle/kaggle.json`
4. Run `chmod 600 ~/.kaggle/kaggle.json`

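If the download fails (for example, because credentials are missing), the cells below that depend on the dataset print a message and skip their work, and the augmentation figure (Figure 10-7) falls back to a synthetic image.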
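
The credential steps above can be sanity-checked before attempting a download. The sketch below is one possible check, not part of the Kaggle tooling: `kaggle_credentials_status` is a hypothetical helper, and it assumes the token file is JSON containing `username` and `key` fields and lives at the path from step 3.

```python
import json
import os
import stat
from pathlib import Path

def kaggle_credentials_status(path=None):
    """Return a short status string for a Kaggle token file (hypothetical helper)."""
    path = Path(path) if path is not None else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return "missing"
    try:
        creds = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return "unreadable"
    if not isinstance(creds, dict) or not {"username", "key"}.issubset(creds):
        return "incomplete"
    # On POSIX systems, flag tokens that group/other users can access (step 4 above)
    mode = stat.S_IMODE(path.stat().st_mode)
    if os.name == "posix" and mode & 0o077:
        return "too_permissive"
    return "ok"

print(f"Kaggle credentials: {kaggle_credentials_status()}")
```

A status other than `ok` points at which of the four steps to revisit.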

In [None]:
# Download the Fashion Product Dataset from Kaggle
import kagglehub
from kagglehub import KaggleDatasetAdapter

# Set the path to the file you'd like to load
file_path = "data.csv"

# Load the latest version
df = kagglehub.load_dataset(
    KaggleDatasetAdapter.PANDAS,
    "nirmalsankalana/mini-product-image-and-text-dataset",
    file_path,
)

print("Dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"\nFirst 5 records:")
df.head()

In [None]:
# Explore and prepare the product dataset
def explore_product_dataset(df):
    """
    Explore the structure and contents of the product dataset
    """
    print("Dataset Exploration:")
    print(f"  Total products: {len(df)}")
    print(f"  Columns: {list(df.columns)}")
    print(f"\n  Data types:")
    for col, dtype in df.dtypes.items():
        print(f"    {col}: {dtype}")
    
    # Check for missing values
    print(f"\nMissing Values:")
    missing_counts = df.isnull().sum()
    for col, count in missing_counts.items():
        if count > 0:
            print(f"  {col}: {count} ({count/len(df)*100:.1f}%)")
    
    # Analyze categorical columns
    categorical_cols = df.select_dtypes(include=['object']).columns
    for col in categorical_cols:
        if col not in ['image_path', 'title', 'description', 'image']:  # Skip text columns
            unique_count = df[col].nunique()
            print(f"\n{col}: {unique_count} unique values")
            if unique_count <= 20:  # Show distribution for categorical variables
                print(df[col].value_counts().head(10))
    
    return df

# Explore the dataset
if 'df' in locals() and df is not None:
    explore_product_dataset(df)
else:
    print("Please run the dataset download section first.")

In [None]:
# Prepare dataset for AutoGluon
def prepare_fashion_data(df):
    """Prepare fashion dataset for AutoGluon computer vision"""
    
    if df is None:
        print("No dataset to prepare")
        return None
    
    print("Preparing dataset for AutoGluon...")
    
    # Make a copy to avoid modifying original
    prepared_df = df.copy()
    
    # Identify image and label columns (adjust based on actual dataset structure)
    print("\nAnalyzing dataset structure...")
    
    # Common column name patterns for images
    image_col_patterns = ['image', 'img', 'filename', 'file', 'path']
    image_col = None
    for col in prepared_df.columns:
        if any(pattern in col.lower() for pattern in image_col_patterns):
            image_col = col
            break
    
    # Common column name patterns for labels/categories
    label_col_patterns = ['category', 'class', 'label', 'type']
    label_col = None
    for col in prepared_df.columns:
        if any(pattern in col.lower() for pattern in label_col_patterns):
            label_col = col
            break
    
    print(f"Image column: {image_col}")
    print(f"Label column: {label_col}")
    
    if not image_col or not label_col:
        print("\nCould not automatically identify image and label columns.")
        print("Available columns:", list(prepared_df.columns))
        return None
    
    # Standardize column names for AutoGluon
    if image_col != 'image':
        prepared_df = prepared_df.rename(columns={image_col: 'image'})
    if label_col != 'label':
        prepared_df = prepared_df.rename(columns={label_col: 'label'})
    
    # Display label distribution
    if 'label' in prepared_df.columns:
        print("\nLabel Distribution:")
        print(prepared_df['label'].value_counts())
    
    print(f"\nPrepared dataset with {len(prepared_df)} products")
    return prepared_df

# Prepare the dataset
if 'df' in locals() and df is not None:
    prepared_fashion_df = prepare_fashion_data(df)
else:
    print("Dataset not ready for preparation")
    prepared_fashion_df = None
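
`MultiModalPredictor` reads images from the file paths stored in the image column, so bare filenames need to be resolved to real paths before training. The sketch below assumes the `image` column holds filenames relative to a `data` folder inside the kagglehub download directory, which is what the Figure 10-5 cell later in this notebook assumes. `resolve_image_paths` is a hypothetical helper, not an AutoGluon function.

```python
import os

def resolve_image_paths(filenames, images_dir):
    """Join bare filenames onto images_dir and report which files are missing.

    Hypothetical helper: assumes every image lives directly under images_dir.
    """
    resolved, missing = [], []
    for name in filenames:
        full_path = os.path.abspath(os.path.join(images_dir, str(name)))
        resolved.append(full_path)
        if not os.path.isfile(full_path):
            missing.append(full_path)
    return resolved, missing

# Usage with the notebook's objects (commented out so the sketch runs standalone):
# dataset_path = kagglehub.dataset_download("nirmalsankalana/mini-product-image-and-text-dataset")
# paths, missing = resolve_image_paths(prepared_fashion_df["image"],
#                                      os.path.join(dataset_path, "data"))
# prepared_fashion_df["image"] = paths
# print(f"{len(missing)} image files not found")
```

Checking for missing files up front is cheaper than discovering them partway through a 30-minute training run.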

In [None]:
# Figure 10-1: AutoGluon Computer Vision Workflow Diagram
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.patches import FancyBboxPatch

def create_workflow_diagram():
    """Create Figure 10-1: AutoGluon CV Workflow"""
    
    fig, ax = plt.subplots(1, 1, figsize=(14, 4))
    ax.set_xlim(0, 14)
    ax.set_ylim(0, 4)
    ax.axis('off')
    
    # Define workflow stages
    boxes = [
        {'x': 0.5, 'label': 'Raw\nImages', 'color': '#E8F4FD'},
        {'x': 3.0, 'label': 'Data\nPreprocessing', 'color': '#FFF3E0'},
        {'x': 5.5, 'label': 'Model\nTraining', 'color': '#E8F5E9'},
        {'x': 8.0, 'label': 'Evaluation', 'color': '#F3E5F5'},
        {'x': 10.5, 'label': 'Deployment', 'color': '#FFEBEE'},
    ]
    
    # Draw boxes and arrows
    for i, box in enumerate(boxes):
        rect = FancyBboxPatch((box['x'], 1.2), 2, 1.6, 
                              boxstyle="round,pad=0.05,rounding_size=0.2",
                              facecolor=box['color'], edgecolor='#333333', linewidth=2)
        ax.add_patch(rect)
        ax.text(box['x'] + 1, 2, box['label'], ha='center', va='center', 
                fontsize=12, fontweight='bold', color='#333333')
        
        if i < len(boxes) - 1:
            ax.annotate('', xy=(boxes[i+1]['x'] - 0.1, 2), xytext=(box['x'] + 2.1, 2),
                       arrowprops=dict(arrowstyle='->', color='#666666', lw=2))
    
    ax.text(7, 3.6, 'Figure 10-1: AutoGluon Computer Vision Workflow', 
            ha='center', va='center', fontsize=16, fontweight='bold', color='#1a1a1a')
    
    plt.tight_layout()
    plt.savefig('figure_10_1_workflow.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("Figure 10-1 saved as 'figure_10_1_workflow.png'")

create_workflow_diagram()

## 2. Image Classification with MultiModalPredictor

Image classification is the foundation of most computer vision applications. We'll build a practical system for classifying different types of products.

In [None]:
# Train your first fashion product classifier 
def train_fashion_classifier(df, preset='medium_quality', time_limit=1800):
    """Train a fashion product image classifier using AutoGluon 1.5.0"""
    
    if df is None or len(df) == 0:
        print("No data available for training")
        return None, None, None
    
    print(f"Training fashion product classifier with AutoGluon 1.5.0")
    print(f"Dataset size: {len(df)} products")
    print(f"Preset: {preset}")
    print(f"Time limit: {time_limit // 60} minutes")
    
    # Ensure we have the required columns
    if 'image' not in df.columns or 'label' not in df.columns:
        print("Dataset must have 'image' and 'label' columns")
        return None, None, None
    
    # Check class distribution and filter out classes with too few samples
    print("\nAnalyzing class distribution...")
    class_counts = df['label'].value_counts()
    print("Class distribution:")
    for label, count in class_counts.items():
        print(f"   {label}: {count} samples")
    
    # Filter out classes with fewer than 2 samples for stratified splitting
    min_samples_per_class = 2
    valid_classes = class_counts[class_counts >= min_samples_per_class].index
    
    if len(valid_classes) < len(class_counts):
        print(f"\nFiltering out classes with < {min_samples_per_class} samples")
        df = df[df['label'].isin(valid_classes)].copy()
    
    # Split data with stratification
    # Note: For highly imbalanced categories, consider using class weights or oversampling
    print(f"\nSplitting data with stratification...")
    
    try:
        train_df, temp_df = train_test_split(
            df, test_size=0.2, random_state=42,
            stratify=df['label']
        )
        val_df, test_df = train_test_split(
            temp_df, test_size=0.5, random_state=42,
            stratify=temp_df['label'] if len(temp_df) >= len(temp_df['label'].unique()) * 2 else None
        )
    except ValueError as e:
        print(f"Stratified split failed, using random split: {e}")
        train_df, temp_df = train_test_split(df, test_size=0.2, random_state=42)
        val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)
    
    print(f"   Train: {len(train_df)} | Val: {len(val_df)} | Test: {len(test_df)}")
    
    # Initialize the predictor
    print(f"\nInitializing MultiModalPredictor...")
    
    predictor = MultiModalPredictor(
        label='label',
        path='./fashion_classifier',
        eval_metric='accuracy',
        problem_type='classification'
    )
    
    # Train the model
    print("\nStarting training...")
    print("AutoGluon automatically handles:")
    print("   - Image preprocessing (resizing, normalization)")
    print("   - Data augmentation (rotation, flipping, color adjustment, etc.)")
    print("   - Model architecture selection")
    print("   - Transfer learning from pre-trained models")
    
    start_time = datetime.now()
    
    try:
        predictor.fit(
            train_df,
            tuning_data=val_df if len(val_df) >= 3 else None,
            time_limit=time_limit,
            presets=preset
        )
        
        training_time = datetime.now() - start_time
        print(f"\nTraining completed in {training_time}")
        
        # Evaluate on test set
        print("\nEvaluating on test set...")
        test_results = predictor.evaluate(test_df)
        
        print("\nTest Results:")
        for metric, value in test_results.items():
            print(f"   {metric}: {value:.4f}")
        
        return predictor, test_df, test_results
        
    except Exception as e:
        print(f"Training failed: {e}")
        return None, None, None

# Train the fashion classifier
if prepared_fashion_df is not None:
    sample_size = min(500, len(prepared_fashion_df))
    demo_df = prepared_fashion_df.sample(n=sample_size, random_state=42)
    
    print(f"Training on {sample_size} products for demonstration")
    print("For production, use the full dataset\n")
    
    predictor, test_data, results = train_fashion_classifier(
        demo_df, 
        preset=RECOMMENDED_PRESET,
        time_limit=1800  # 30 minutes
    )
else:
    print("Dataset not ready for training")
    predictor, test_data, results = None, None, None
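
The training cell above notes that highly imbalanced categories may call for class weights or oversampling. As a rough illustration of the class-weight idea, the sketch below computes inverse-frequency weights with the common "balanced" heuristic, `n_samples / (n_classes * class_count)`. The helper name and the toy labels are made up for this example, and how the weights are fed into training is left out.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Toy labels: 8 shirts and 2 shoes, so the rarer class gets the larger weight
toy_labels = ["shirt"] * 8 + ["shoe"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

With these toy labels, a misclassified shoe counts four times as much as a misclassified shirt, which offsets the 4:1 imbalance.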

In [None]:
# Figure 10-2: AutoGluon Automatic Preprocessing Pipeline
def create_preprocessing_diagram():
    """Create Figure 10-2: Preprocessing Pipeline with sample image transformation"""
    
    fig, ax = plt.subplots(1, 1, figsize=(14, 5))
    ax.set_xlim(0, 14)
    ax.set_ylim(0, 5)
    ax.axis('off')
    
    # Define pipeline stages
    stages = [
        {'x': 0.3, 'label': 'Input\nImage', 'color': '#BBDEFB', 'detail': '(Variable Size)'},
        {'x': 2.8, 'label': 'Resize', 'color': '#C8E6C9', 'detail': '(224×224)'},
        {'x': 5.3, 'label': 'Normalize', 'color': '#FFE0B2', 'detail': '(ImageNet Stats)'},
        {'x': 7.8, 'label': 'Augment', 'color': '#E1BEE7', 'detail': '(Random Transforms)'},
        {'x': 10.3, 'label': 'Model\nInput', 'color': '#FFCDD2', 'detail': '(Tensor)'},
    ]
    
    # Draw stages
    for i, stage in enumerate(stages):
        rect = FancyBboxPatch((stage['x'], 1.5), 2.2, 2, 
                              boxstyle="round,pad=0.05,rounding_size=0.2",
                              facecolor=stage['color'], edgecolor='#333333', linewidth=2)
        ax.add_patch(rect)
        
        ax.text(stage['x'] + 1.1, 2.7, stage['label'], ha='center', va='center', 
                fontsize=11, fontweight='bold', color='#333333')
        ax.text(stage['x'] + 1.1, 1.9, stage['detail'], ha='center', va='center', 
                fontsize=9, color='#666666')
        
        if i < len(stages) - 1:
            ax.annotate('', xy=(stages[i+1]['x'] - 0.1, 2.5), xytext=(stage['x'] + 2.3, 2.5),
                       arrowprops=dict(arrowstyle='->', color='#666666', lw=2))
    
    ax.text(7, 4.5, 'Figure 10-2: AutoGluon Automatic Preprocessing Pipeline', 
            ha='center', va='center', fontsize=16, fontweight='bold', color='#1a1a1a')
    
    # Add augmentation examples below
    aug_text = "Augmentations: Rotation, Flipping, Color Adjustment, Cropping, Scaling, Perspective"
    ax.text(7, 0.7, aug_text, ha='center', va='center', fontsize=10, 
            style='italic', color='#555555')
    
    plt.tight_layout()
    plt.savefig('figure_10_2_preprocessing.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("Figure 10-2 saved as 'figure_10_2_preprocessing.png'")

create_preprocessing_diagram()
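
The "Normalize (ImageNet Stats)" stage in Figure 10-2 boils down to simple per-channel arithmetic. The sketch below works it through for single pixels using the widely quoted ImageNet mean and standard deviation. These values are an assumption here: the statistics actually applied depend on the pre-trained backbone's configuration, and `normalize_pixel` is a hypothetical helper for illustration only.

```python
# Commonly used ImageNet channel statistics (RGB order, pixel values scaled to [0, 1])
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Normalize one 8-bit RGB pixel: scale to [0, 1], subtract the mean, divide by the std."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

for name, pixel in [("black", (0, 0, 0)), ("mid-gray", (128, 128, 128)), ("white", (255, 255, 255))]:
    print(name, [round(channel, 3) for channel in normalize_pixel(pixel)])
```

In a real pipeline the same arithmetic is applied to every pixel of the resized 224×224 image at once, as a tensor operation.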

In [None]:
# Figure 10-5: Sample Product Images from the Dataset
import kagglehub
import os
from PIL import Image

def create_product_category_grid(df, n_categories=6):
    """
    Create Figure 10-5: Display sample product images from different categories
    Loads images from the local kagglehub cache directory
    """
    
    if df is None or len(df) == 0:
        print("No dataset available.")
        return
    
    # Get the dataset path from kagglehub
    dataset_path = kagglehub.dataset_download("nirmalsankalana/mini-product-image-and-text-dataset")
    images_dir = os.path.join(dataset_path, "data")
    
    print(f"Dataset path: {dataset_path}")
    print(f"Images directory: {images_dir}")
    print(f"Dataset columns: {list(df.columns)}")
    
    # Identify columns - the dataset has 'image' for filename and 'category' for labels
    image_col = 'image'
    label_col = 'category'
    
    print(f"Using image column: {image_col}")
    print(f"Using label column: {label_col}")
    
    # Get top categories
    categories = df[label_col].value_counts().head(n_categories).index.tolist()
    print(f"\nTop {n_categories} categories: {categories}")
    
    # Create figure
    n_cols = 3
    n_rows = 2
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(12, 8))
    axes = axes.flatten()
    
    # Color palette for borders and titles
    colors = ['#1976D2', '#7B1FA2', '#388E3C', '#F57C00', '#5D4037', '#D32F2F']
    
    for idx, category in enumerate(categories[:6]):
        ax = axes[idx]
        
        # Get a sample image for this category
        category_df = df[df[label_col] == category]
        image_filename = category_df[image_col].iloc[0]
        image_path = os.path.join(images_dir, image_filename)
        
        print(f"Loading {category}: {image_filename}")
        
        try:
            # Load the image from local file
            img = Image.open(image_path).convert('RGB')
            ax.imshow(img)
            print(f"  Successfully loaded: {image_path}")
            
        except Exception as e:
            print(f"  Failed to load: {e}")
            # Show placeholder with category name
            ax.set_facecolor(colors[idx % len(colors)] + '20')
            ax.text(0.5, 0.5, f'{category}\n(Image unavailable)', 
                   ha='center', va='center', fontsize=12, fontweight='bold',
                   color=colors[idx % len(colors)], transform=ax.transAxes)
        
        ax.set_title(category, fontsize=12, fontweight='bold', 
                    color=colors[idx % len(colors)], pad=10)
        ax.set_xticks([])
        ax.set_yticks([])
        for spine in ax.spines.values():
            spine.set_edgecolor(colors[idx % len(colors)])
            spine.set_linewidth(2)
    
    # Hide unused subplots
    for ax in axes[len(categories):]:
        ax.axis('off')
    
    fig.suptitle('Figure 10-5: Sample Product Images by Category', 
                 fontsize=16, fontweight='bold', y=1.02)
    
    plt.tight_layout()
    plt.savefig('figure_10_5_categories.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("\nFigure 10-5 saved as 'figure_10_5_categories.png'")

# Create Figure 10-5 using the dataset
if 'df' in locals() and df is not None:
    create_product_category_grid(df)
else:
    print("Dataset not loaded. Please run the data loading cells first.")


In [None]:
# Figure 10-7: Data Augmentation Techniques Visualization
import kagglehub
import os
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def create_augmentation_visualization(df=None):
    """
    Create Figure 10-7: Demonstrate data augmentation techniques
    Uses actual dataset image from local kagglehub cache
    """
    
    fig, axes = plt.subplots(2, 4, figsize=(16, 8))
    axes = axes.flatten()
    
    augmentations = [
        {'name': 'Original', 'transform': None},
        {'name': 'Horizontal Flip', 'transform': 'flip'},
        {'name': 'Rotation (15deg)', 'transform': 'rotate'},
        {'name': 'Brightness +20%', 'transform': 'bright'},
        {'name': 'Contrast +30%', 'transform': 'contrast'},
        {'name': 'Random Crop', 'transform': 'crop'},
        {'name': 'Color Jitter', 'transform': 'color'},
        {'name': 'Gaussian Blur', 'transform': 'blur'},
    ]
    
    # Try to load a real image from the dataset
    sample_image = None
    if df is not None:
        try:
            # Get the dataset path from kagglehub
            dataset_path = kagglehub.dataset_download("nirmalsankalana/mini-product-image-and-text-dataset")
            images_dir = os.path.join(dataset_path, "data")
            
            # Get the first image filename
            image_filename = df['image'].iloc[0]
            image_path = os.path.join(images_dir, image_filename)
            
            print(f"Loading image: {image_path}")
            sample_image = Image.open(image_path).convert('RGB')
            sample_image = sample_image.resize((224, 224))
            print("Successfully loaded dataset image!")
            
        except Exception as e:
            print(f"Could not load dataset image: {e}")
    
    # Create a sample product image if none available
    if sample_image is None:
        print("Creating synthetic sample image...")
        # Create a colorful product-like image
        img_array = np.zeros((224, 224, 3), dtype=np.uint8)
        # Add a gradient background
        for i in range(224):
            for j in range(224):
                img_array[i, j] = [200 + i//8, 180 - j//8, 150]
        # Add a simple shape to represent a product
        img_array[50:174, 62:162] = [60, 60, 120]  # Blue rectangle
        img_array[70:154, 82:142] = [200, 200, 220]  # Inner highlight
        sample_image = Image.fromarray(img_array)
    
    # Apply augmentations and display
    for idx, aug in enumerate(augmentations):
        ax = axes[idx]
        
        # Create augmented version
        if aug['transform'] is None:
            aug_img = sample_image.copy()
        elif aug['transform'] == 'flip':
            aug_img = sample_image.transpose(Image.FLIP_LEFT_RIGHT)
        elif aug['transform'] == 'rotate':
            aug_img = sample_image.rotate(15, fillcolor=(255, 255, 255))
        elif aug['transform'] == 'bright':
            enhancer = ImageEnhance.Brightness(sample_image)
            aug_img = enhancer.enhance(1.2)
        elif aug['transform'] == 'contrast':
            enhancer = ImageEnhance.Contrast(sample_image)
            aug_img = enhancer.enhance(1.3)
        elif aug['transform'] == 'crop':
            # Fixed 10% border crop, then resize (deterministic stand-in for a random crop)
            width, height = sample_image.size
            left = width // 10
            top = height // 10
            right = width - width // 10
            bottom = height - height // 10
            aug_img = sample_image.crop((left, top, right, bottom)).resize((224, 224))
        elif aug['transform'] == 'color':
            enhancer = ImageEnhance.Color(sample_image)
            aug_img = enhancer.enhance(1.5)
        elif aug['transform'] == 'blur':
            aug_img = sample_image.filter(ImageFilter.GaussianBlur(radius=2))
        else:
            aug_img = sample_image.copy()
        
        ax.imshow(aug_img)
        ax.set_title(aug['name'], fontsize=11, fontweight='bold', pad=8)
        ax.set_xticks([])
        ax.set_yticks([])
        
        # Color the border based on whether it's original or augmented
        border_color = '#2E7D32' if aug['transform'] is None else '#1565C0'
        for spine in ax.spines.values():
            spine.set_edgecolor(border_color)
            spine.set_linewidth(2)
    
    fig.suptitle('Figure 10-7: Common Data Augmentation Techniques for Retail Images', 
                 fontsize=14, fontweight='bold', y=1.02)
    
    plt.tight_layout()
    plt.savefig('figure_10_7_augmentation.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("\nFigure 10-7 saved as 'figure_10_7_augmentation.png'")

# Create Figure 10-7
if 'df' in locals() and df is not None:
    create_augmentation_visualization(df)
else:
    create_augmentation_visualization(None)


## 3. Understanding Model Architectures: CNNs vs Vision Transformers

AutoGluon picks a pre-trained backbone for you, so you rarely choose an architecture by hand. The choice is driven mainly by the preset you pass to `fit`, and the backbone can be a Convolutional Neural Network (CNN) or a Vision Transformer (ViT). Here's what you need to know:

**CNNs (ResNet, EfficientNet):**
- Excel at capturing spatial patterns and hierarchical features
- Work well with smaller datasets
- Faster inference

**Vision Transformers (ViTs):**
- Typically need MORE data than CNNs to perform well
- For small datasets (<5,000 images), CNNs like EfficientNet often outperform ViTs
- If the preset's default backbone underperforms on a small dataset, you can override it with a different pre-trained checkpoint

**Transfer Learning Note:**
- ImageNet pre-training works best for natural images
- For specialized domains (medical imaging, satellite imagery, industrial inspection), domain-specific pre-trained models often perform better
- AutoGluon provides access to specialized models through the TIMM and Hugging Face model hubs
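
To make the small-dataset rule of thumb concrete, the sketch below builds a hyperparameter override that selects a TIMM checkpoint by dataset size. The 5,000-image threshold comes from the notes above. The helper name and the two checkpoint choices (`efficientnet_b0` and `vit_base_patch16_224`, both TIMM model names) are illustrative assumptions, not AutoGluon defaults.

```python
def backbone_hyperparameters(n_images, small_dataset_threshold=5000):
    """Pick a TIMM checkpoint name from dataset size (illustrative rule of thumb only)."""
    if n_images < small_dataset_threshold:
        checkpoint = "efficientnet_b0"        # CNN
    else:
        checkpoint = "vit_base_patch16_224"   # Vision Transformer
    return {"model.timm_image.checkpoint_name": checkpoint}

hyperparameters = backbone_hyperparameters(n_images=500)
print(hyperparameters)

# With a predictor and training frame in scope, the override would be passed as:
# predictor.fit(train_df, hyperparameters=hyperparameters)
```

Treat the result as a starting point: comparing both backbones on a held-out split is more reliable than any fixed threshold.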

In [None]:
# Figure 10-6: CNN vs Vision Transformer Architecture Comparison
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.patches import Rectangle, FancyBboxPatch
import numpy as np

def create_cnn_vs_vit_diagram():
    """
    Create Figure 10-6: Compare CNN and Vision Transformer architectures
    """
    fig, axes = plt.subplots(1, 2, figsize=(16, 8))
    
    # CNN Architecture (left)
    ax1 = axes[0]
    ax1.set_xlim(0, 10)
    ax1.set_ylim(0, 12)
    ax1.set_aspect('equal')
    ax1.axis('off')
    ax1.set_title('Convolutional Neural Network (CNN)', fontsize=14, fontweight='bold', color='#1565C0', pad=20)
    
    # Input image
    rect = FancyBboxPatch((1, 9), 2, 2, boxstyle="round,pad=0.05", 
                                    facecolor='#E3F2FD', edgecolor='#1565C0', linewidth=2)
    ax1.add_patch(rect)
    ax1.text(2, 10, 'Input\nImage', ha='center', va='center', fontsize=10, fontweight='bold')
    
    # Conv layers
    colors = ['#BBDEFB', '#90CAF9', '#64B5F6', '#42A5F5']
    for i, (y, c) in enumerate(zip([6.5, 5, 3.5, 2], colors)):
        rect = FancyBboxPatch((0.5 + i*0.3, y), 3 - i*0.3, 1.2, boxstyle="round,pad=0.05",
                                        facecolor=c, edgecolor='#1565C0', linewidth=2)
        ax1.add_patch(rect)
    
    ax1.text(2, 4.5, 'Convolutional\nLayers', ha='center', va='center', fontsize=9, fontweight='bold')
    
    # Arrow
    ax1.annotate('', xy=(2, 6.3), xytext=(2, 8.8), 
                 arrowprops=dict(arrowstyle='->', color='#1565C0', lw=2))
    
    # Fully connected
    rect = FancyBboxPatch((1, 0.5), 2, 1, boxstyle="round,pad=0.05",
                                    facecolor='#1565C0', edgecolor='#0D47A1', linewidth=2)
    ax1.add_patch(rect)
    ax1.text(2, 1, 'Classification', ha='center', va='center', fontsize=10, fontweight='bold', color='white')
    
    ax1.annotate('', xy=(2, 0.7), xytext=(2, 1.8),
                 arrowprops=dict(arrowstyle='->', color='#1565C0', lw=2))
    
    # Features text
    ax1.text(6, 10, 'Key Characteristics:', fontsize=11, fontweight='bold')
    ax1.text(6, 9.2, '• Local feature detection', fontsize=10)
    ax1.text(6, 8.4, '• Hierarchical learning', fontsize=10)
    ax1.text(6, 7.6, '• Translation invariance', fontsize=10)
    ax1.text(6, 6.8, '• Works well with less data', fontsize=10)
    ax1.text(6, 6.0, '• Efficient for images', fontsize=10)
    
    # ViT Architecture (right)
    ax2 = axes[1]
    ax2.set_xlim(0, 10)
    ax2.set_ylim(0, 12)
    ax2.set_aspect('equal')
    ax2.axis('off')
    ax2.set_title('Vision Transformer (ViT)', fontsize=14, fontweight='bold', color='#7B1FA2', pad=20)
    
    # Input image with patches
    rect = FancyBboxPatch((1, 9), 2, 2, boxstyle="round,pad=0.05",
                                    facecolor='#F3E5F5', edgecolor='#7B1FA2', linewidth=2)
    ax2.add_patch(rect)
    # Draw grid lines for patches
    for i in range(3):
        ax2.plot([1 + i*0.67, 1 + i*0.67], [9, 11], color='#7B1FA2', lw=1, alpha=0.5)
        ax2.plot([1, 3], [9 + i*0.67, 9 + i*0.67], color='#7B1FA2', lw=1, alpha=0.5)
    ax2.text(2, 10, 'Image\nPatches', ha='center', va='center', fontsize=10, fontweight='bold')
    
    # Patch embedding
    rect = FancyBboxPatch((0.5, 6.5), 3, 1.5, boxstyle="round,pad=0.05",
                                    facecolor='#E1BEE7', edgecolor='#7B1FA2', linewidth=2)
    ax2.add_patch(rect)
    ax2.text(2, 7.25, 'Patch Embeddings\n+ Position', ha='center', va='center', fontsize=9, fontweight='bold')
    
    ax2.annotate('', xy=(2, 6.3), xytext=(2, 8.8),
                 arrowprops=dict(arrowstyle='->', color='#7B1FA2', lw=2))
    
    # Transformer blocks
    rect = FancyBboxPatch((0.5, 3), 3, 3, boxstyle="round,pad=0.05",
                                    facecolor='#CE93D8', edgecolor='#7B1FA2', linewidth=2)
    ax2.add_patch(rect)
    ax2.text(2, 4.5, 'Transformer\nEncoder\n(Self-Attention)', ha='center', va='center', fontsize=9, fontweight='bold')
    
    ax2.annotate('', xy=(2, 3.2), xytext=(2, 6.3),
                 arrowprops=dict(arrowstyle='->', color='#7B1FA2', lw=2))
    
    # Classification
    rect = FancyBboxPatch((1, 0.5), 2, 1, boxstyle="round,pad=0.05",
                                    facecolor='#7B1FA2', edgecolor='#4A148C', linewidth=2)
    ax2.add_patch(rect)
    ax2.text(2, 1, 'Classification', ha='center', va='center', fontsize=10, fontweight='bold', color='white')
    
    ax2.annotate('', xy=(2, 0.7), xytext=(2, 2.8),
                 arrowprops=dict(arrowstyle='->', color='#7B1FA2', lw=2))
    
    # Features text
    ax2.text(6, 10, 'Key Characteristics:', fontsize=11, fontweight='bold')
    ax2.text(6, 9.2, '• Global attention mechanism', fontsize=10)
    ax2.text(6, 8.4, '• Captures long-range dependencies', fontsize=10)
    ax2.text(6, 7.6, '• Requires more training data', fontsize=10)
    ax2.text(6, 6.8, '• State-of-the-art performance', fontsize=10)
    ax2.text(6, 6.0, '• Flexible architecture', fontsize=10)
    
    fig.suptitle('Figure 10-6: CNN vs Vision Transformer Architecture Comparison', 
                 fontsize=16, fontweight='bold', y=0.98)
    
    plt.tight_layout()
    plt.savefig('figure_10_6_cnn_vit.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("Figure 10-6 saved as 'figure_10_6_cnn_vit.png'")

create_cnn_vs_vit_diagram()
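
To make the "Image Patches" step in the ViT panel concrete, the sketch below counts how many tokens a Vision Transformer processes for one image. The 224- and 384-pixel inputs and the 16-pixel patch size are illustrative assumptions (they match the common ViT-B/16 setup), and `vit_sequence_length` is a hypothetical helper, not an AutoGluon function.

```python
def vit_sequence_length(image_size: int, patch_size: int, add_cls_token: bool = True) -> int:
    """Number of tokens a ViT sees for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # Many ViT variants prepend one learnable class token to the patch sequence
    return num_patches + (1 if add_cls_token else 0)

tokens_224 = vit_sequence_length(224, 16)   # 14 x 14 patches + 1 class token
tokens_384 = vit_sequence_length(384, 16)   # 24 x 24 patches + 1 class token
print(f"224px image: {tokens_224} tokens | 384px image: {tokens_384} tokens")

# Self-attention compares every token with every other token,
# so its cost grows with the square of the sequence length.
growth = (tokens_384 ** 2) / (tokens_224 ** 2)
print(f"Attention pairs grow by about {growth:.1f}x")
```

This quadratic growth is one reason higher-resolution inputs are noticeably more expensive for ViTs than for CNNs of similar size.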


In [None]:
# Analyze model performance with detailed metrics
def analyze_model_performance(predictor, test_data, model_name="Model"):
    """Comprehensive analysis for MultiModalPredictor models"""
    
    if predictor is None or test_data is None:
        print(f"Cannot analyze {model_name} - missing predictor or test data")
        return None
    
    print(f"Comprehensive {model_name} Analysis")
    print("=" * 40)
    
    try:
        # Get predictions and probabilities
        predictions = predictor.predict(test_data)
        probabilities = predictor.predict_proba(test_data)
        true_labels = test_data['label'].values
        
        # Basic accuracy
        accuracy = np.mean(predictions == true_labels)
        print(f"\nBasic Metrics:")
        print(f"   Accuracy: {accuracy:.4f}")
        print(f"   Total Predictions: {len(predictions)}")
        
        # Confidence analysis
        confidence_scores = probabilities.max(axis=1).values
        
        print(f"\nConfidence Analysis:")
        print(f"   Average Confidence: {np.mean(confidence_scores):.3f}")
        print(f"   Median Confidence: {np.median(confidence_scores):.3f}")
        print(f"   Min/Max: {np.min(confidence_scores):.3f} / {np.max(confidence_scores):.3f}")
        
        # Confidence distribution
        high_conf = np.mean(confidence_scores >= 0.8) * 100
        medium_conf = np.mean((confidence_scores >= 0.6) & (confidence_scores < 0.8)) * 100
        low_conf = np.mean(confidence_scores < 0.6) * 100
        
        print(f"\nConfidence Distribution:")
        print(f"   High (>=0.8): {high_conf:.1f}%")
        print(f"   Medium (0.6-0.8): {medium_conf:.1f}%")
        print(f"   Low (<0.6): {low_conf:.1f}%")
        
        # Per-class accuracy
        unique_labels = sorted(set(true_labels))
        print(f"\nPer-Class Performance:")
        
        for label in unique_labels:
            label_mask = true_labels == label
            if np.sum(label_mask) > 0:
                label_accuracy = np.mean(predictions[label_mask] == true_labels[label_mask])
                label_count = np.sum(label_mask)
                print(f"   {label}: {label_accuracy:.3f} ({label_count} samples)")
        
        return {
            'accuracy': accuracy,
            'confidence_scores': confidence_scores,
            'predictions': predictions,
            'true_labels': true_labels
        }
        
    except Exception as e:
        print(f"Analysis failed: {e}")
        return None

# Run analysis
if predictor is not None and test_data is not None:
    image_analysis = analyze_model_performance(predictor, test_data, "Image Classification")
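
The confidence bands above show how often the model is confident, but not whether that confidence is deserved. A minimal sketch of that check follows: it groups predictions into the same bands (0.8 and 0.6 cut-offs) and reports accuracy inside each band. `accuracy_by_confidence_band` is a hypothetical helper and the toy values are invented for illustration; they are not model output.

```python
from collections import defaultdict

def accuracy_by_confidence_band(confidences, correct,
                                bands=((0.8, "high"), (0.6, "medium"), (0.0, "low"))):
    """Report accuracy within each confidence band.

    confidences: iterable of top-class probabilities in [0, 1]
    correct:     iterable of booleans (prediction == true label)
    bands:       (lower_bound, name) pairs sorted from highest to lowest bound
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for conf, is_correct in zip(confidences, correct):
        for lower, name in bands:
            if conf >= lower:          # first matching band wins
                totals[name] += 1
                hits[name] += int(is_correct)
                break
    return {name: hits[name] / totals[name] for name in totals}

# Toy values for illustration only
toy_conf = [0.95, 0.91, 0.85, 0.72, 0.65, 0.55, 0.40]
toy_correct = [True, True, False, True, False, False, True]

band_accuracy = accuracy_by_confidence_band(toy_conf, toy_correct)
for name, acc in band_accuracy.items():
    print(f"   {name}: {acc:.2f}")
```

With the dictionary returned by `analyze_model_performance`, the inputs would be `image_analysis['confidence_scores']` and `image_analysis['predictions'] == image_analysis['true_labels']`. If the high band is not clearly more accurate than the low band, confidence thresholds are a weak basis for automation.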

## 4. Object Detection with AutoGluon

Object detection extends beyond classification to identify and locate multiple objects within images.

In [None]:
# Object detection data format example
print("Object Detection COCO Format Example")
print("="*50)

# Example COCO format structure
coco_example = {
    "images": [
        {"id": 1, "file_name": "product_shelf_001.jpg", "width": 800, "height": 600}
    ],
    "annotations": [
        {
            "id": 1, 
            "image_id": 1, 
            "category_id": 1,
            "bbox": [100, 150, 200, 180],  # [x_min, y_min, width, height]
            "area": 36000, 
            "iscrowd": 0
        }
    ],
    "categories": [
        {"id": 1, "name": "soda_bottle"},
        {"id": 2, "name": "cereal_box"}
    ]
}

print("\nCOCO Format Structure:")
print(json.dumps(coco_example, indent=2))

print("\n\nAnnotation Tools for Creating COCO Data:")
print("   - LabelImg: https://github.com/heartexlabs/labelImg (free, open-source)")
print("   - CVAT: https://cvat.ai/ (web-based, free)")
print("   - Roboflow: https://roboflow.com/ (cloud-based with free tier)")
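
Annotation exports often contain small inconsistencies, so it can help to sanity-check a COCO file before training. The sketch below checks that each annotation points at a known image and category and that its box lies inside the image. `validate_coco` is a hypothetical helper written for this notebook, and the second annotation in the sample is deliberately broken.

```python
def validate_coco(coco: dict) -> list:
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}

    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        image = images.get(ann["image_id"])
        if image is None:
            problems.append(f"annotation {ann_id}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id {ann['category_id']}")
        x, y, w, h = ann["bbox"]  # COCO stores [x_min, y_min, width, height]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann_id}: non-positive box size")
        if x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append(f"annotation {ann_id}: box extends outside the image")
    return problems

sample = {
    "images": [{"id": 1, "file_name": "product_shelf_001.jpg", "width": 800, "height": 600}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 150, 200, 180]},
        {"id": 2, "image_id": 1, "category_id": 3, "bbox": [700, 500, 200, 180]},  # broken on purpose
    ],
    "categories": [{"id": 1, "name": "soda_bottle"}, {"id": 2, "name": "cereal_box"}],
}

issues = validate_coco(sample)
for issue in issues:
    print(f"   - {issue}")
```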

In [None]:
# Figure 10-3: Object Detection Output with Bounding Boxes
from matplotlib.patches import Rectangle

def create_object_detection_figure():
    """Create Figure 10-3: Object Detection Output visualization"""
    
    fig, ax = plt.subplots(1, 1, figsize=(10, 8))
    
    # Create a sample "retail shelf" background
    np.random.seed(42)
    background = np.ones((100, 100, 3)) * 0.9  # Light gray background
    
    # Add some texture variation to simulate a shelf
    for i in range(0, 100, 20):
        background[i:i+2, :, :] = 0.7  # Shelf lines
    
    ax.imshow(background, extent=[0, 100, 0, 100])
    
    # Simulated product detections on a retail shelf
    detections = [
        {'bbox': [5, 60, 20, 35], 'label': 'Soda Bottle', 'conf': 0.95, 'color': '#FF5722'},
        {'bbox': [28, 62, 18, 33], 'label': 'Cereal Box', 'conf': 0.91, 'color': '#4CAF50'},
        {'bbox': [50, 58, 22, 37], 'label': 'Snack Bag', 'conf': 0.88, 'color': '#2196F3'},
        {'bbox': [75, 60, 20, 35], 'label': 'Can', 'conf': 0.93, 'color': '#9C27B0'},
        {'bbox': [8, 15, 25, 30], 'label': 'Detergent', 'conf': 0.87, 'color': '#FF9800'},
        {'bbox': [45, 12, 28, 33], 'label': 'Water Bottle', 'conf': 0.92, 'color': '#00BCD4'},
    ]
    
    for det in detections:
        x, y, w, h = det['bbox']
        
        # Draw bounding box
        rect = Rectangle((x, y), w, h, linewidth=3, edgecolor=det['color'], 
                         facecolor=det['color'], alpha=0.2, linestyle='-')
        ax.add_patch(rect)
        
        # Draw border
        rect_border = Rectangle((x, y), w, h, linewidth=3, edgecolor=det['color'], 
                                facecolor='none', linestyle='-')
        ax.add_patch(rect_border)
        
        # Add label with confidence score
        label_text = f"{det['label']}: {det['conf']:.0%}"
        ax.text(x, y + h + 2, label_text, fontsize=9, fontweight='bold',
                color='white', bbox=dict(boxstyle='round,pad=0.3', 
                facecolor=det['color'], edgecolor='none', alpha=0.9))
    
    ax.set_xlim(0, 100)
    ax.set_ylim(0, 100)
    ax.set_title('Figure 10-3: Object Detection Output with Bounding Boxes\n(Retail Shelf Example)', 
                 fontsize=14, fontweight='bold', pad=15)
    ax.set_xlabel('Products detected with confidence scores', fontsize=10)
    ax.axis('off')
    
    # Add legend
    legend_text = "mAP@0.5: Lenient threshold | mAP@0.5:0.95: COCO standard (stricter)"
    ax.text(50, -5, legend_text, ha='center', fontsize=9, style='italic', color='#666666')
    
    plt.tight_layout()
    plt.savefig('figure_10_3_detection.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("Figure 10-3 saved as 'figure_10_3_detection.png'")

create_object_detection_figure()
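
The mAP thresholds in the figure note refer to Intersection over Union (IoU), the overlap between a predicted box and a ground-truth box. The sketch below computes IoU for two boxes in COCO `[x_min, y_min, width, height]` format and counts how many of the ten COCO thresholds (0.50 to 0.95 in steps of 0.05) the prediction would satisfy. The box coordinates are illustrative values, and the helper names are hypothetical.

```python
def coco_to_corners(bbox):
    """Convert [x_min, y_min, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h

def iou(box_a, box_b):
    """Intersection over Union of two COCO-format boxes."""
    ax1, ay1, ax2, ay2 = coco_to_corners(box_a)
    bx1, by1, bx2, by2 = coco_to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0

ground_truth = [100, 150, 200, 180]   # illustrative annotation
predicted = [120, 160, 200, 180]      # same size, shifted by (20, 10) pixels

score = iou(ground_truth, predicted)
coco_thresholds = [0.50 + 0.05 * i for i in range(10)]
matched = [t for t in coco_thresholds if score >= t]

print(f"IoU: {score:.3f}")
print(f"Counts as a match at IoU >= 0.5: {score >= 0.5}")
print(f"Thresholds satisfied in 0.50:0.95: {len(matched)} of {len(coco_thresholds)}")
```

A small shift leaves the box acceptable under the lenient 0.5 threshold but fails the stricter ones, which is why mAP@0.5:0.95 is usually lower than mAP@0.5 for the same model.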

## 5. Multimodal Computer Vision Applications

One of AutoGluon's most useful capabilities is combining image data with other modalities, such as text descriptions and tabular metadata, in a single model.

In [None]:
# Prepare multimodal dataset (images + text)
def prepare_multimodal_data(df):
    """Prepare dataset for multimodal learning (images + text)"""
    
    if df is None:
        print("No data available")
        return None
    
    print("Preparing multimodal dataset (images + text)...")
    
    # Create a copy for multimodal preparation
    multimodal_df = df.copy()
    
    # Identify text columns
    text_columns = []
    for col in multimodal_df.columns:
        if col not in ['image', 'label'] and multimodal_df[col].dtype == 'object':
            sample_text = str(multimodal_df[col].iloc[0])
            if len(sample_text.split()) > 2:
                text_columns.append(col)
    
    print(f"Identified text columns: {text_columns}")
    
    # Combine text columns into a single description
    if text_columns:
        multimodal_df['text_description'] = multimodal_df[text_columns].apply(
            lambda row: ' '.join([str(val) for val in row.values if pd.notna(val)]),
            axis=1
        )
    else:
        multimodal_df['text_description'] = "Fashion product"
    
    # Keep only essential columns
    essential_columns = ['image', 'text_description', 'label']
    multimodal_df = multimodal_df[essential_columns].copy()
    # Joined strings are never NaN, so backfill empty descriptions explicitly
    multimodal_df['text_description'] = (
        multimodal_df['text_description'].str.strip().replace('', 'Fashion product')
    )
    
    print(f"\nPrepared multimodal dataset with {len(multimodal_df)} products")
    print("\nSample text descriptions:")
    for i in range(min(3, len(multimodal_df))):
        text = multimodal_df['text_description'].iloc[i]
        print(f"   {i+1}. {text[:80]}..." if len(text) > 80 else f"   {i+1}. {text}")
    
    return multimodal_df

# Prepare multimodal data
if prepared_fashion_df is not None:
    multimodal_fashion_df = prepare_multimodal_data(prepared_fashion_df)
else:
    print("Dataset not ready for multimodal preparation")
    multimodal_fashion_df = None

In [None]:
# Train multimodal classifier (images + text) 
def train_multimodal_classifier(df, preset='high_quality', time_limit=2400):
    """Train a multimodal fashion classifier using images and text"""
    
    if df is None or len(df) == 0:
        print("No multimodal data available for training")
        return None, None, None  # Callers unpack three values
    
    print(f"Training multimodal fashion classifier (Images + Text)")
    print(f"Dataset size: {len(df)} products")
    print(f"Preset: {preset}")
    print(f"Time limit: {time_limit // 60} minutes")
    
    # Filter classes with sufficient samples
    class_counts = df['label'].value_counts()
    valid_classes = class_counts[class_counts >= 2].index
    df = df[df['label'].isin(valid_classes)].copy()
    
    # Split data
    try:
        train_df, temp_df = train_test_split(df, test_size=0.2, random_state=42, stratify=df['label'])
        val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)
    except ValueError:
        train_df, temp_df = train_test_split(df, test_size=0.2, random_state=42)
        val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)
    
    print(f"\n   Train: {len(train_df)} | Val: {len(val_df)} | Test: {len(test_df)}")
    
    # Initialize multimodal predictor
    print("\nInitializing MultiModalPredictor for images + text...")
    
    multimodal_predictor = MultiModalPredictor(
        label='label',
        path='./multimodal_fashion_classifier',
        eval_metric='accuracy',
        problem_type='classification'
    )
    
    # Train with both images and text
    print("\nTraining multimodal model...")
    print("AutoGluon will automatically:")
    print("   - Process images with computer vision models")
    print("   - Process text with language models")
    print("   - Apply cross-modal alignment")
    print("   - Combine features for improved predictions")
    
    start_time = datetime.now()
    
    try:
        multimodal_predictor.fit(
            train_df,
            tuning_data=val_df if len(val_df) >= 3 else None,
            time_limit=time_limit,
            presets=preset
        )
        
        training_time = datetime.now() - start_time
        print(f"\nMultimodal training completed in {training_time}")
        
        # Evaluate
        multimodal_results = multimodal_predictor.evaluate(test_df)
        
        print("\nMultimodal Test Results:")
        for metric, value in multimodal_results.items():
            print(f"   {metric}: {value:.4f}")
        
        return multimodal_predictor, test_df, multimodal_results
        
    except Exception as e:
        print(f"Multimodal training failed: {e}")
        return None, None, None

# Train multimodal classifier
if multimodal_fashion_df is not None:
    sample_size = min(500, len(multimodal_fashion_df))
    demo_multimodal_df = multimodal_fashion_df.sample(n=sample_size, random_state=42)
    
    print(f"Training multimodal model on {sample_size} products\n")
    
    multimodal_predictor, multimodal_test_data, multimodal_results = train_multimodal_classifier(
        demo_multimodal_df,
        preset='high_quality',
        time_limit=2400
    )
else:
    print("Multimodal dataset not ready")
    multimodal_predictor, multimodal_test_data, multimodal_results = None, None, None

In [None]:
# Figure 10-4: Multimodal Data Fusion Architecture
def create_multimodal_diagram():
    """Create Figure 10-4: Multimodal Learning Architecture"""
    
    fig, ax = plt.subplots(1, 1, figsize=(14, 6))
    ax.set_xlim(0, 14)
    ax.set_ylim(0, 6)
    ax.axis('off')
    
    # Input modalities (left side)
    modalities = [
        {'y': 4.5, 'label': 'Product\nImage', 'color': '#BBDEFB', 'icon': 'CNN/ViT'},
        {'y': 3.0, 'label': 'Title/\nDescription', 'color': '#C8E6C9', 'icon': 'Transformer'},
        {'y': 1.5, 'label': 'Metadata\n(Price, Brand)', 'color': '#FFE0B2', 'icon': 'Tabular'},
    ]
    
    for mod in modalities:
        rect = FancyBboxPatch((0.5, mod['y'] - 0.6), 2.5, 1.2, 
                              boxstyle="round,pad=0.05,rounding_size=0.2",
                              facecolor=mod['color'], edgecolor='#333333', linewidth=2)
        ax.add_patch(rect)
        ax.text(1.75, mod['y'] + 0.1, mod['label'], ha='center', va='center', 
                fontsize=10, fontweight='bold', color='#333333')
        ax.text(1.75, mod['y'] - 0.35, f"({mod['icon']})", ha='center', va='center', 
                fontsize=8, color='#666666')
    
    # Fusion arrows
    for mod in modalities:
        ax.annotate('', xy=(5.2, 3), xytext=(3.1, mod['y']),
                   arrowprops=dict(arrowstyle='->', color='#666666', lw=1.5,
                                 connectionstyle='arc3,rad=0'))
    
    # Combined Model (center)
    rect = FancyBboxPatch((5.2, 2), 3.5, 2, 
                          boxstyle="round,pad=0.05,rounding_size=0.3",
                          facecolor='#E1BEE7', edgecolor='#7B1FA2', linewidth=3)
    ax.add_patch(rect)
    ax.text(6.95, 3.2, 'Multimodal\nFusion Model', ha='center', va='center', 
            fontsize=12, fontweight='bold', color='#4A148C')
    ax.text(6.95, 2.4, '(Cross-modal Alignment)', ha='center', va='center', 
            fontsize=9, color='#7B1FA2')
    
    # Arrow to output
    ax.annotate('', xy=(11, 3), xytext=(8.8, 3),
               arrowprops=dict(arrowstyle='->', color='#666666', lw=2))
    
    # Output (right side)
    rect = FancyBboxPatch((11, 2), 2.5, 2, 
                          boxstyle="round,pad=0.05,rounding_size=0.2",
                          facecolor='#FFCDD2', edgecolor='#C62828', linewidth=2)
    ax.add_patch(rect)
    ax.text(12.25, 3, 'Enhanced\nProduct\nClassification', ha='center', va='center', 
            fontsize=10, fontweight='bold', color='#B71C1C')
    
    ax.text(7, 5.5, 'Figure 10-4: Multimodal Learning - Combining Visual and Textual Features', 
            ha='center', va='center', fontsize=14, fontweight='bold', color='#1a1a1a')
    
    # Add cross-modal alignment note
    ax.text(7, 0.5, 'Cross-modal alignment learns semantic connections: "red" in text ↔ red pixels in image', 
            ha='center', va='center', fontsize=9, style='italic', color='#555555')
    
    plt.tight_layout()
    plt.savefig('figure_10_4_multimodal.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("Figure 10-4 saved as 'figure_10_4_multimodal.png'")

create_multimodal_diagram()

## 6. Handling Missing Modalities in Production

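Production catalogs are rarely complete: a listing may arrive with an image but no description, or the reverse. A deployed multimodal model should still return a usable prediction when one modality is missing, so this section checks how the trained predictor behaves when the text field is empty.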

In [None]:
# Demonstrate handling of missing modalities
def test_missing_modalities(predictor, sample_data):
    """Test model behavior with missing data"""
    
    if predictor is None or sample_data is None:
        print("Predictor or sample data not available")
        return
    
    print("Testing Missing Modality Handling")
    print("="*50)
    print("\nIn production, data is often incomplete. AutoGluon handles this gracefully.")
    
    # Test with complete data
    sample = sample_data.head(1).copy()
    
    try:
        # Complete data prediction
        complete_pred = predictor.predict(sample)
        print(f"\nComplete data prediction: {complete_pred.iloc[0]}")
        
        # Simulate missing text
        if 'text_description' in sample.columns:
            sample_missing_text = sample.copy()
            sample_missing_text['text_description'] = ""
            try:
                missing_text_pred = predictor.predict(sample_missing_text)
                print(f"Missing text prediction: {missing_text_pred.iloc[0]}")
            except Exception as e:
                # A failure here means the model did NOT cope with the missing text
                print(f"Missing text: prediction failed ({e})")
        
        print("\nKey Insight: AutoGluon can make predictions even with incomplete data,")
        print("which is crucial for real-world deployment where data quality varies.")
        
    except Exception as e:
        print(f"Error during testing: {e}")

# Test if multimodal predictor is available
if multimodal_predictor is not None and multimodal_test_data is not None:
    test_missing_modalities(multimodal_predictor, multimodal_test_data)
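
A single row says little about robustness. One way to quantify it, sketched below, is to blank a modality for a whole sample and measure how often the prediction stays the same. `modality_ablation_agreement` and `toy_predict` are hypothetical names; the toy predictor is a stand-in so the example runs without a trained model.

```python
def modality_ablation_agreement(predict_fn, rows, column, placeholder=""):
    """Fraction of rows whose prediction is unchanged when `column` is blanked.

    predict_fn: callable that takes a list of dict rows and returns a list of labels
    rows:       list of dicts, one per product
    column:     name of the field to blank out
    """
    if not rows:
        raise ValueError("rows must not be empty")
    baseline = predict_fn(rows)
    ablated_rows = [{**row, column: placeholder} for row in rows]
    ablated = predict_fn(ablated_rows)
    unchanged = sum(b == a for b, a in zip(baseline, ablated))
    return unchanged / len(rows)

# Stand-in for a trained model (assumption for illustration only)
def toy_predict(rows):
    labels = []
    for row in rows:
        text_hit = "shoe" in row["text_description"].lower()
        image_hit = row["image"].startswith("shoe")
        labels.append("Footwear" if text_hit or image_hit else "Apparel")
    return labels

toy_rows = [
    {"image": "shoe_001.jpg", "text_description": "Leather running shoe"},
    {"image": "img_002.jpg", "text_description": "Canvas shoe, white"},
    {"image": "img_003.jpg", "text_description": "Cotton t-shirt"},
]

agreement = modality_ablation_agreement(toy_predict, toy_rows, "text_description")
print(f"Predictions unchanged without text: {agreement:.0%}")
```

With the predictor trained above, `predict_fn` could be a small wrapper such as `lambda rows: multimodal_predictor.predict(pd.DataFrame(rows)).tolist()`. Low agreement suggests the model leans heavily on text and needs a fallback for listings without descriptions.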

In [None]:
# Compare image-only vs. multimodal performance
def compare_model_performance(image_results, multimodal_results):
    """Compare performance between image-only and multimodal models"""
    
    if image_results is None or multimodal_results is None:
        print("Cannot compare - one or both models not trained")
        return
    
    print("Model Performance Comparison")
    print("="*50)
    
    # Create comparison table
    comparison_data = []
    
    for metric in image_results.keys():
        if metric in multimodal_results:
            image_val = image_results[metric]
            multimodal_val = multimodal_results[metric]
            improvement = ((multimodal_val - image_val) / image_val) * 100 if image_val != 0 else 0
            
            comparison_data.append({
                'Metric': metric,
                'Image Only': f"{image_val:.4f}",
                'Multimodal': f"{multimodal_val:.4f}",
                'Improvement': f"{improvement:+.2f}%"
            })
    
    comparison_df = pd.DataFrame(comparison_data)
    print(comparison_df.to_string(index=False))
    
    # Visualize comparison
    if comparison_data:
        fig, ax = plt.subplots(figsize=(10, 6))
        
        metrics = [row['Metric'] for row in comparison_data]
        image_values = [float(row['Image Only']) for row in comparison_data]
        multimodal_values = [float(row['Multimodal']) for row in comparison_data]
        
        x = np.arange(len(metrics))
        width = 0.35
        
        bars1 = ax.bar(x - width/2, image_values, width, label='Image Only', alpha=0.8, color='steelblue')
        bars2 = ax.bar(x + width/2, multimodal_values, width, label='Multimodal', alpha=0.8, color='darkorange')
        
        ax.set_xlabel('Metrics')
        ax.set_ylabel('Performance')
        ax.set_title('Image-Only vs. Multimodal Performance Comparison')
        ax.set_xticks(x)
        ax.set_xticklabels(metrics)
        ax.legend()
        ax.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    
    # Key insights
    print("\nKey Insights:")
    if not comparison_data:
        print("   No shared metrics to compare")
        return
    avg_improvement = np.mean([float(row['Improvement'].strip('%')) for row in comparison_data])
    print(f"   Average improvement: {avg_improvement:+.2f}%")
    
    if avg_improvement > 0:
        print("   Multimodal approach shows improvements")
        print("   Combining images with text provides valuable additional context")

# Compare models
if results is not None and multimodal_results is not None:
    compare_model_performance(results, multimodal_results)

## 7. Complete E-commerce Project

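This section shows how an end-to-end e-commerce classifier is typically evaluated: accuracy, precision, and recall per product category, alongside the distribution of prediction confidence. The figure uses illustrative values rather than results from the models trained above.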

In [None]:
# NOTE: This visualization uses illustrative sample data, not actual model results
# Figure 10-8: Model Performance Across Product Categories
def create_category_performance_chart():
    """
    Create Figure 10-8: Performance metrics chart showing accuracy, precision, 
    and recall for different product categories
    """
    
    # Sample performance data by category
    categories = ['Electronics', 'Clothing', 'Home & Garden', 'Sports', 'Books', 'Toys']
    
    # Simulated metrics (in production, these would come from model evaluation)
    np.random.seed(42)
    accuracy = [0.94, 0.91, 0.88, 0.92, 0.95, 0.89]
    precision = [0.93, 0.89, 0.86, 0.91, 0.94, 0.87]
    recall = [0.95, 0.92, 0.90, 0.93, 0.96, 0.91]
    
    # Create figure with two subplots
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # Left plot: Grouped bar chart for metrics by category
    ax1 = axes[0]
    x = np.arange(len(categories))
    width = 0.25
    
    bars1 = ax1.bar(x - width, accuracy, width, label='Accuracy', color='#2196F3', alpha=0.8)
    bars2 = ax1.bar(x, precision, width, label='Precision', color='#4CAF50', alpha=0.8)
    bars3 = ax1.bar(x + width, recall, width, label='Recall', color='#FF9800', alpha=0.8)
    
    ax1.set_xlabel('Product Category', fontsize=12, fontweight='bold')
    ax1.set_ylabel('Score', fontsize=12, fontweight='bold')
    ax1.set_title('Performance Metrics by Category', fontsize=14, fontweight='bold')
    ax1.set_xticks(x)
    ax1.set_xticklabels(categories, rotation=45, ha='right')
    ax1.legend(loc='lower right')
    ax1.set_ylim(0.7, 1.0)
    ax1.grid(True, alpha=0.3, axis='y')
    
    # Add value labels on bars
    for bars in [bars1, bars2, bars3]:
        for bar in bars:
            height = bar.get_height()
            ax1.annotate(f'{height:.2f}',
                        xy=(bar.get_x() + bar.get_width() / 2, height),
                        xytext=(0, 3), textcoords="offset points",
                        ha='center', va='bottom', fontsize=8)
    
    # Right plot: Confidence level distribution
    ax2 = axes[1]
    
    # Simulated confidence distribution data
    confidence_levels = ['High\n(>90%)', 'Medium\n(70-90%)', 'Low\n(<70%)']
    category_conf = {
        'Electronics': [85, 12, 3],
        'Clothing': [78, 18, 4],
        'Home & Garden': [72, 22, 6],
        'Sports': [80, 16, 4],
        'Books': [88, 10, 2],
        'Toys': [75, 20, 5],
    }
    
    x = np.arange(len(confidence_levels))
    width = 0.12
    colors = ['#1976D2', '#7B1FA2', '#388E3C', '#F57C00', '#5D4037', '#D32F2F']
    
    for i, (cat, values) in enumerate(category_conf.items()):
        ax2.bar(x + i * width, values, width, label=cat, color=colors[i], alpha=0.8)
    
    ax2.set_xlabel('Confidence Level', fontsize=12, fontweight='bold')
    ax2.set_ylabel('Percentage of Predictions (%)', fontsize=12, fontweight='bold')
    ax2.set_title('Prediction Confidence Distribution by Category', fontsize=14, fontweight='bold')
    ax2.set_xticks(x + width * 2.5)
    ax2.set_xticklabels(confidence_levels)
    ax2.legend(loc='upper right', fontsize=9, ncol=2)
    ax2.grid(True, alpha=0.3, axis='y')
    
    fig.suptitle('Figure 10-8: Model Performance Across Product Categories', 
                 fontsize=16, fontweight='bold', y=1.02)
    
    plt.tight_layout()
    plt.savefig('figure_10_8_category_performance.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("Figure 10-8 saved as 'figure_10_8_category_performance.png'")

# Generate Figure 10-8
create_category_performance_chart()

## 8. Production Deployment and Best Practices

For production deployment, consider performance optimizations, monitoring, and maintenance procedures.

In [None]:
# Resource-constrained training configuration
print("Resource-Constrained Training Configuration")
print("="*50)

# Example hyperparameters for limited GPU memory
# Note: AutoGluon derives the number of gradient accumulation steps from
# env.batch_size / (env.per_gpu_batch_size * num_gpus), so it is not set explicitly.
resource_constrained_config = {
    'optimization.learning_rate': 1e-4,
    'optimization.max_epochs': 20,
    'env.per_gpu_batch_size': 8,        # What your GPU can handle
    'env.batch_size': 32,                # Effective batch size you want (32 / 8 = 4 steps on one GPU)
    'env.precision': 'bf16-mixed',       # Mixed precision for efficiency
}

print("\nFor limited GPU memory, use gradient accumulation:")
print("")
print("   hyperparameters={")
for key, value in resource_constrained_config.items():
    print(f"       '{key}': {value!r},")  # repr keeps quotes around string values
print("   }")

print("\n\nHow Gradient Accumulation Works:")
print("   - GPU can only handle batch_size=8 at a time")
print("   - With gradient_accumulation_steps=4:")
print("     - Run 4 forward passes with batch_size=8 each")
print("     - Accumulate gradients from all 4 passes")
print("     - Update weights once (effective batch_size = 8 * 4 = 32)")
print("\n   This achieves similar training dynamics to batch_size=32")
print("   without requiring 4x more GPU memory!")
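
The claim that four micro-batches of 8 behave like one batch of 32 can be checked numerically. The sketch below uses a one-parameter linear model with mean squared error and synthetic data (both are assumptions chosen to keep the example dependency-free) and compares the full-batch gradient with the accumulated one.

```python
import random

def mse_gradient(w, xs, ys):
    """Gradient of mean((w * x - y)^2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(32)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]   # synthetic targets
w = 0.5                                             # current weight

# Reference: one gradient over the full batch of 32
full_batch_grad = mse_gradient(w, xs, ys)

# Gradient accumulation: 4 micro-batches of 8, one weight update at the end
micro_batch_size = 8
accumulation_steps = 4
accumulated_grad = 0.0
for step in range(accumulation_steps):
    start = step * micro_batch_size
    micro_x = xs[start:start + micro_batch_size]
    micro_y = ys[start:start + micro_batch_size]
    # Divide by the number of steps so the sum equals the full-batch mean
    accumulated_grad += mse_gradient(w, micro_x, micro_y) / accumulation_steps

print(f"Full-batch gradient:  {full_batch_grad:.6f}")
print(f"Accumulated gradient: {accumulated_grad:.6f}")
print(f"Difference: {abs(full_batch_grad - accumulated_grad):.2e}")
```

The two gradients agree up to floating-point rounding. Layers that depend on per-batch statistics, such as batch normalization, still see only 8 samples at a time, which is why the dynamics are similar rather than identical.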

In [None]:
# Production model deployment
def demonstrate_production_deployment(predictor, model_name="fashion_classifier"):
    """Demonstrate production deployment workflow"""
    
    if predictor is None:
        print("No predictor available for deployment demonstration")
        return
    
    print("Production Deployment Workflow")
    print("="*50)
    
    # Save model
    save_path = f'./production_{model_name}'
    predictor.save(save_path)
    print(f"\n1. Model saved to: {save_path}")
    
    # Demonstrate loading
    print("\n2. Loading model for inference:")
    print(f"   production_predictor = MultiModalPredictor.load('{save_path}')")
    
    # Batch prediction pattern
    print("\n3. Batch prediction for production efficiency:")
    print("   batch_predictions = predictor.predict(new_products_batch)")
    print("   confidence_scores = predictor.predict_proba(new_products_batch)")
    
    # Confidence-based routing
    print("\n4. Confidence-based routing:")
    print("   # Filter predictions by confidence threshold")
    print("   high_confidence_mask = confidence_scores.max(axis=1) > 0.85")
    print("   auto_categorized = batch_predictions[high_confidence_mask]")
    print("   needs_human_review = batch_predictions[~high_confidence_mask]")
    
    print("\n5. Performance metrics to monitor:")
    print("   - Prediction latency (aim for <100ms per image)")
    print("   - Confidence distribution over time")
    print("   - Category distribution shifts (data drift)")
    print("   - Human override rate for low-confidence predictions")

# Demonstrate deployment
if predictor is not None:
    demonstrate_production_deployment(predictor)
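
The confidence-based routing step printed above can be sketched as runnable code. The version below uses plain Python lists so it runs without a trained model; `route_by_confidence` is a hypothetical helper, the 0.85 threshold mirrors the printed example, and the labels and probabilities are invented for illustration.

```python
def route_by_confidence(predictions, probabilities, threshold=0.85):
    """Split predictions into auto-approved and human-review queues.

    predictions:   list of predicted labels
    probabilities: list of per-class probability lists, aligned with predictions
    threshold:     predictions with top probability above this are auto-approved
    """
    auto_categorized, needs_human_review = [], []
    for index, (label, probs) in enumerate(zip(predictions, probabilities)):
        confidence = max(probs)
        target = auto_categorized if confidence > threshold else needs_human_review
        target.append({"index": index, "label": label, "confidence": confidence})
    return auto_categorized, needs_human_review

# Toy batch for illustration only
toy_predictions = ["Shoes", "Shirts", "Bags", "Shoes"]
toy_probabilities = [
    [0.92, 0.05, 0.03],
    [0.40, 0.45, 0.15],
    [0.10, 0.04, 0.86],
    [0.85, 0.10, 0.05],   # exactly at the threshold, so it goes to review
]

auto_categorized, needs_human_review = route_by_confidence(toy_predictions, toy_probabilities)
print(f"Auto-categorized: {[item['index'] for item in auto_categorized]}")
print(f"Needs review:     {[item['index'] for item in needs_human_review]}")
review_rate = len(needs_human_review) / len(toy_predictions)
print(f"Review rate: {review_rate:.0%}")
```

Tracking the review rate over time gives an early signal of drift: if it climbs while traffic stays similar, the incoming products probably differ from the training data.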

In [None]:
# NOTE: This visualization uses illustrative sample data, not actual model results
# Figure 10-9: Model Monitoring Dashboard for E-commerce CV Systems
def create_monitoring_dashboard():
    """
    Create Figure 10-9: A comprehensive monitoring dashboard showing:
    - Model accuracy over time
    - Confidence distribution
    - Category performance
    - Alert indicators
    """
    
    fig = plt.figure(figsize=(18, 12))
    
    # Create grid layout
    gs = fig.add_gridspec(3, 3, hspace=0.35, wspace=0.3)
    
    # ============================================
    # Panel 1: Model Accuracy Over Time (top-left, spans 2 columns)
    # ============================================
    ax1 = fig.add_subplot(gs[0, :2])
    
    # Generate time series data
    np.random.seed(42)
    days = pd.date_range(start='2024-01-01', periods=90, freq='D')
    base_accuracy = 0.92
    accuracy_trend = base_accuracy + np.cumsum(np.random.randn(90) * 0.002)
    accuracy_trend = np.clip(accuracy_trend, 0.85, 0.98)
    
    ax1.plot(days, accuracy_trend, 'b-', linewidth=2, label='Model Accuracy')
    ax1.axhline(y=0.90, color='orange', linestyle='--', linewidth=1.5, label='Warning Threshold')
    ax1.axhline(y=0.85, color='red', linestyle='--', linewidth=1.5, label='Critical Threshold')
    ax1.fill_between(days, accuracy_trend, 0.85, where=(accuracy_trend < 0.90), 
                     color='orange', alpha=0.3)
    ax1.fill_between(days, accuracy_trend, 0.85, where=(accuracy_trend < 0.85), 
                     color='red', alpha=0.3)
    
    ax1.set_xlabel('Date', fontsize=11)
    ax1.set_ylabel('Accuracy', fontsize=11)
    ax1.set_title('Model Accuracy Over Time', fontsize=13, fontweight='bold')
    ax1.legend(loc='lower left', fontsize=9)
    ax1.set_ylim(0.80, 1.0)
    ax1.grid(True, alpha=0.3)
    ax1.tick_params(axis='x', rotation=45)
    
    # ============================================
    # Panel 2: Alert Status (top-right)
    # ============================================
    ax2 = fig.add_subplot(gs[0, 2])
    ax2.axis('off')
    
    # Create alert status panel
    alerts = [
        {'name': 'Model Accuracy', 'status': 'OK', 'color': '#4CAF50', 'value': '92.3%'},
        {'name': 'Avg Confidence', 'status': 'OK', 'color': '#4CAF50', 'value': '87.5%'},
        {'name': 'Drift Detection', 'status': 'WARNING', 'color': '#FF9800', 'value': '+2.1%'},
        {'name': 'Latency P99', 'status': 'OK', 'color': '#4CAF50', 'value': '45ms'},
        {'name': 'Error Rate', 'status': 'OK', 'color': '#4CAF50', 'value': '0.02%'},
    ]
    
    ax2.text(0.5, 0.95, 'System Status', ha='center', va='top', 
             fontsize=14, fontweight='bold', transform=ax2.transAxes)
    
    for i, alert in enumerate(alerts):
        y_pos = 0.82 - i * 0.16
        # Status indicator circle
        circle = plt.Circle((0.08, y_pos), 0.04, color=alert['color'], transform=ax2.transAxes)
        ax2.add_patch(circle)
        # Alert name and value
        ax2.text(0.15, y_pos, alert['name'], ha='left', va='center', 
                fontsize=10, transform=ax2.transAxes)
        ax2.text(0.85, y_pos, alert['value'], ha='right', va='center', 
                fontsize=10, fontweight='bold', transform=ax2.transAxes)
        ax2.text(0.95, y_pos, alert['status'], ha='right', va='center', 
                fontsize=9, color=alert['color'], fontweight='bold', transform=ax2.transAxes)
    
    # ============================================
    # Panel 3: Confidence Distribution (middle-left)
    # ============================================
    ax3 = fig.add_subplot(gs[1, 0])
    
    # Generate confidence score distribution
    confidence_scores = np.concatenate([
        np.random.beta(8, 2, 700),   # High confidence (majority)
        np.random.beta(5, 5, 200),   # Medium confidence
        np.random.beta(2, 5, 100),   # Low confidence
    ])
    
    ax3.hist(confidence_scores, bins=30, color='#2196F3', alpha=0.7, edgecolor='white')
    ax3.axvline(x=0.9, color='green', linestyle='--', linewidth=2, label='Auto-approve (>90%)')
    ax3.axvline(x=0.7, color='orange', linestyle='--', linewidth=2, label='Review (<70%)')
    ax3.set_xlabel('Confidence Score', fontsize=11)
    ax3.set_ylabel('Count', fontsize=11)
    ax3.set_title('Prediction Confidence Distribution', fontsize=13, fontweight='bold')
    ax3.legend(loc='upper left', fontsize=9)
    ax3.grid(True, alpha=0.3, axis='y')
    
    # ============================================
    # Panel 4: Predictions by Category (middle-center)
    # ============================================
    ax4 = fig.add_subplot(gs[1, 1])
    
    categories = ['Electronics', 'Clothing', 'Home & Garden', 'Sports', 'Books', 'Toys']
    volumes = [2450, 1890, 1230, 980, 750, 620]
    colors = ['#1976D2', '#7B1FA2', '#388E3C', '#F57C00', '#5D4037', '#D32F2F']
    
    bars = ax4.barh(categories, volumes, color=colors, alpha=0.8)
    ax4.set_xlabel('Predictions (Last 24h)', fontsize=11)
    ax4.set_title('Predictions by Category', fontsize=13, fontweight='bold')
    ax4.grid(True, alpha=0.3, axis='x')
    
    # Add value labels
    for bar, vol in zip(bars, volumes):
        ax4.text(vol + 50, bar.get_y() + bar.get_height()/2, 
                f'{vol:,}', va='center', fontsize=9)
    
    # ============================================
    # Panel 5: Hourly Throughput (middle-right)
    # ============================================
    ax5 = fig.add_subplot(gs[1, 2])
    
    hours = list(range(24))
    throughput = [120, 85, 45, 30, 25, 35, 80, 250, 480, 520, 490, 450,
                  380, 420, 510, 550, 520, 480, 350, 280, 220, 180, 160, 140]
    
    ax5.fill_between(hours, throughput, alpha=0.4, color='#2196F3')
    ax5.plot(hours, throughput, 'b-', linewidth=2)
    ax5.set_xlabel('Hour of Day', fontsize=11)
    ax5.set_ylabel('Predictions/min', fontsize=11)
    ax5.set_title('Hourly Throughput', fontsize=13, fontweight='bold')
    ax5.set_xticks([0, 6, 12, 18, 23])
    ax5.grid(True, alpha=0.3)
    
    # ============================================
    # Panel 6: Category Accuracy Trends (bottom, spans all columns)
    # ============================================
    ax6 = fig.add_subplot(gs[2, :])
    
    weeks = list(range(1, 13))
    cat_accuracy = {
        'Electronics': [0.94, 0.93, 0.94, 0.95, 0.94, 0.93, 0.94, 0.95, 0.94, 0.93, 0.94, 0.95],
        'Clothing': [0.91, 0.90, 0.91, 0.92, 0.91, 0.90, 0.89, 0.90, 0.91, 0.92, 0.91, 0.90],
        'Home & Garden': [0.88, 0.87, 0.88, 0.89, 0.88, 0.87, 0.88, 0.89, 0.90, 0.89, 0.88, 0.89],
        'Sports': [0.92, 0.91, 0.92, 0.93, 0.92, 0.91, 0.92, 0.93, 0.92, 0.91, 0.92, 0.93],
    }
    
    colors = ['#1976D2', '#7B1FA2', '#388E3C', '#F57C00']
    for (cat, acc), color in zip(cat_accuracy.items(), colors):
        ax6.plot(weeks, acc, '-o', label=cat, color=color, linewidth=2, markersize=6)
    
    ax6.set_xlabel('Week', fontsize=11)
    ax6.set_ylabel('Accuracy', fontsize=11)
    ax6.set_title('Category-Specific Accuracy Trends (12 Weeks)', fontsize=13, fontweight='bold')
    ax6.legend(loc='lower right', fontsize=10, ncol=4)
    ax6.set_ylim(0.82, 1.0)
    ax6.grid(True, alpha=0.3)
    ax6.set_xticks(weeks)
    
    fig.suptitle('Figure 10-9: E-commerce Computer Vision Model Monitoring Dashboard', 
                 fontsize=18, fontweight='bold', y=1.01)
    
    plt.tight_layout()
    plt.savefig('figure_10_9_monitoring_dashboard.png', dpi=150, bbox_inches='tight', 
                facecolor='white', edgecolor='none')
    plt.show()
    print("Figure 10-9 saved as 'figure_10_9_monitoring_dashboard.png'")

# Generate Figure 10-9
create_monitoring_dashboard()

In [None]:
# Data quality checklist
print("Data Quality Checklist for Computer Vision")
print("="*50)

checklist = [
    ("Consistent lighting", "Keep lighting conditions similar across images, or use augmentation to simulate variation"),
    ("Consistent resolution", "Use the same input resolution, or resize images consistently"),
    ("Background consistency", "Keep backgrounds similar, or augment with varied backgrounds"),
    ("Class balance", "Check for severely imbalanced classes; use class weights if needed"),
    ("Training-production gap", "Ensure training data represents production scenarios"),
    ("Image quality", "Remove corrupted or very low-quality images"),
    ("Label accuracy", "Verify that labels are correct, especially for edge cases")
]

for i, (item, description) in enumerate(checklist, 1):
    print(f"\n{i}. {item}")
    print(f"   {description}")

print("\n\nRemember: High-quality, representative training data is the most")
print("important factor for computer vision success!")
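
The class-balance item in the checklist above can be checked directly from the label column. The following is a minimal sketch, not part of AutoGluon: `balanced_class_weights` and `imbalance_ratio` are hypothetical helper names, the label values are made up for illustration, and the weighting rule (`n_samples / (n_classes * class_count)`) is the common "balanced" heuristic rather than anything this notebook prescribes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def imbalance_ratio(labels):
    """Ratio between the most and least frequent class counts."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Illustrative labels only (assumption): a heavily skewed three-class dataset
example_labels = ["Shirts"] * 80 + ["Shoes"] * 15 + ["Watches"] * 5

weights = balanced_class_weights(example_labels)
ratio = imbalance_ratio(example_labels)

print(f"Imbalance ratio (max/min): {ratio:.1f}")
for cls, weight in weights.items():
    print(f"  {cls}: weight {weight:.3f}")
```

Rare classes receive larger weights, so a weighted loss penalizes mistakes on them more heavily. What counts as "severely imbalanced" is a judgment call that depends on the dataset.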

## 9. Summary and Key Takeaways

### What You've Learned:

1. **Environment Setup**: Configured AutoGluon 1.5.0 for computer vision tasks
2. **Image Classification**: Built classifiers with the famous "three lines of code"
3. **Architecture Understanding**: CNNs vs ViTs and when each excels
4. **Multimodal Learning**: Combined images with text for improved accuracy
5. **Cross-modal Alignment**: How models learn text-image connections
6. **Missing Data Handling**: Production-ready robustness
7. **Resource Optimization**: Gradient accumulation for limited GPU memory
8. **Data Quality**: Consistency and its impact on performance
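
As a reminder of the arithmetic behind item 7, gradient accumulation reaches a large effective batch size by summing gradients over several small per-GPU batches before each optimizer step. The sketch below is illustrative: `accumulation_steps` is a hypothetical helper, and the `env.batch_size` / `env.per_gpu_batch_size` keys reflect our understanding of AutoGluon MultiModal's hyperparameters, which you should verify against the documentation for your installed version.

```python
def accumulation_steps(effective_batch_size, per_gpu_batch_size, num_gpus=1):
    """Number of forward/backward passes accumulated per optimizer step."""
    per_pass = per_gpu_batch_size * num_gpus
    if effective_batch_size % per_pass != 0:
        raise ValueError(
            "effective_batch_size should be a multiple of "
            "per_gpu_batch_size * num_gpus"
        )
    return effective_batch_size // per_pass


# Example values (assumptions): one GPU that only fits 8 images per pass
steps = accumulation_steps(effective_batch_size=128, per_gpu_batch_size=8, num_gpus=1)
print(f"Accumulate gradients over {steps} passes per optimizer step")

# Assumed AutoGluon MultiModal hyperparameter keys; check them before use
hyperparameters = {
    "env.batch_size": 128,        # effective batch size per optimizer step
    "env.per_gpu_batch_size": 8,  # what fits in GPU memory at once
}
```

The trade-off is time for memory: each optimizer step takes more passes, but peak GPU memory is governed by the per-GPU batch size.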

### Best Practices:

- Start with AutoGluon's default configurations
- Use stratified splits and handle class imbalance
- Consider domain-specific pre-trained models for specialized applications
- Implement confidence-based routing in production
- Monitor for data drift and performance degradation
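
Confidence-based routing can be sketched in a few lines. The example below reuses the two thresholds drawn on the monitoring dashboard's confidence histogram (auto-approve above 90%, review below 70%). The function name `route_prediction`, the route labels, and the handling of the middle band as a "spot check" are assumptions for illustration.

```python
from collections import Counter


def route_prediction(confidence, approve_above=0.90, review_below=0.70):
    """Map a prediction confidence in [0, 1] to a handling route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > approve_above:
        return "auto_approve"
    if confidence < review_below:
        return "manual_review"
    return "spot_check"  # middle band: assumed policy, adjust to your workflow


# Illustrative confidence scores only (assumption)
scores = [0.97, 0.91, 0.85, 0.72, 0.65, 0.40]
routes = [route_prediction(s) for s in scores]
summary = Counter(routes)

for score, route in zip(scores, routes):
    print(f"{score:.2f} -> {route}")
print(dict(summary))
```

In practice, `confidence` would typically be the maximum class probability from the model's predicted probabilities. The thresholds are worth tuning against the cost of a wrong auto-approval versus the cost of a manual review.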

### Object Detection Metrics Note:

When comparing object detection results:
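- **mAP@0.5**: A detection counts as correct when its IoU with the ground truth is at least 0.5 (more lenient)
- **mAP@0.5:0.95**: Averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (stricter, COCO standard)

These can differ significantly, so always check which metric you're looking at!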
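
To see why the two metrics diverge, it helps to look at a single predicted box. The sketch below computes Intersection over Union (IoU) for one illustrative pair of boxes and checks it against each threshold in the 0.5 to 0.95 range. It shows only the per-detection matching step, not a full mAP computation. The `iou` helper and the box coordinates are assumptions for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Illustrative boxes (assumption): prediction shifted 20 px to the right
ground_truth = (0, 0, 100, 100)
prediction = (20, 0, 120, 100)

score = iou(ground_truth, prediction)

# IoU thresholds 0.50, 0.55, ..., 0.95
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
matches = [score >= t for t in thresholds]

print(f"IoU = {score:.3f}")
print(f"Counts as a match at {sum(matches)} of {len(thresholds)} thresholds")
```

A box that is slightly misplaced still counts as a correct detection at the 0.5 threshold but fails the stricter ones, which is why mAP@0.5:0.95 is usually noticeably lower than mAP@0.5 for the same model.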

---

### Additional Resources:

- [AutoGluon Documentation](https://auto.gluon.ai/)
- [Computer Vision Tutorials](https://auto.gluon.ai/stable/tutorials/multimodal/)
- [Production Deployment Guide](https://auto.gluon.ai/stable/tutorials/cloud_fit_deploy/)

Happy building with AutoGluon!