# Bootstrap Out-of-Bag (OOB) Error 

## What Is Out-of-Bag (OOB) Error?

When an ensemble model uses bootstrap sampling (e.g., bagging, random forest), each base learner is trained on a sample drawn **with replacement**. Each bootstrap sample leaves out roughly **$37 \%$** of the observations in the original data; these are the **out-of-bag** cases for that learner. Evaluating each learner on its out-of-bag cases, which act as a "test" set, and averaging the loss over all learners yields the **OOB error**.

> Q: How does one come up with the number $37\%$? Hint: Think about how sampling with replacement works. 

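One way to check the hint numerically: a given row is missed by a single draw with probability $1 - 1/n$, so it is missed by all $n$ independent draws with probability $(1 - 1/n)^n$, which approaches $e^{-1} \approx 0.368$ as $n$ grows. The sketch below compares that formula with a simulated OOB fraction; the function names, the values of `n`, and the number of repetitions are illustrative assumptions.

```python
import numpy as np

def oob_probability(n):
    """Probability that a given row is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n

def simulated_oob_fraction(n, n_reps=200, seed=0):
    """Average fraction of rows left out across n_reps simulated bootstrap samples."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(n_reps):
        drawn = rng.integers(0, n, size=n)               # one bootstrap sample of row indices
        fractions.append(1 - np.unique(drawn).size / n)  # share of rows never drawn
    return float(np.mean(fractions))

for n in (10, 100, 1000):
    print(n, round(oob_probability(n), 4), round(simulated_oob_fraction(n), 4))
print("limit 1/e:", round(float(np.exp(-1)), 4))
```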
---

### How It Works 

Using tree-based methods as examples:

1. For each $b \in \{1, 2, \ldots, B\}$, 
    i. Sample $n$ rows *with replacement* and fit the tree ($n$ is the sample size).  
    ii. Mark the rows not drawn (the OOB set).  
    iii. Evaluate the performance of the fitted tree on the OOB set.
2. Aggregate the OOB scores over the $B$ trees (e.g., by averaging). 
3. If there are multiple models or hyperparameters under consideration, repeat Steps 1 and 2 for each model or hyperparameter value.

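The steps above can be sketched with NumPy alone. This is a minimal illustration, not a reference implementation: it assumes a depth-1 regression tree (a decision stump) as a stand-in for a full tree, squared error as the score, and synthetic data. The names `fit_stump`, `predict_stump`, and `oob_mse` are hypothetical.

```python
import numpy as np

def fit_stump(X, y):
    """Fit a depth-1 regression tree by minimizing squared error over all splits."""
    # Fallback when no split is possible: predict the mean on both sides.
    best = (np.inf, 0, X[0, 0], y.mean(), y.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # dropping the max keeps both sides non-empty
            left = X[:, j] <= t
            pred_l, pred_r = y[left].mean(), y[~left].mean()
            sse = ((y[left] - pred_l) ** 2).sum() + ((y[~left] - pred_r) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, pred_l, pred_r)
    return best[1:]  # (feature, threshold, left prediction, right prediction)

def predict_stump(stump, X):
    j, t, pred_l, pred_r = stump
    return np.where(X[:, j] <= t, pred_l, pred_r)

def oob_mse(X, y, B=100, seed=0):
    """Average the per-tree OOB mean squared error over B bootstrap iterations."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(B):
        in_bag = rng.integers(0, n, size=n)       # Step 1.i: sample n rows with replacement
        oob = np.setdiff1d(np.arange(n), in_bag)  # Step 1.ii: rows not drawn
        if oob.size == 0:                         # extremely unlikely, but skip if it happens
            continue
        stump = fit_stump(X[in_bag], y[in_bag])
        # Step 1.iii: score the fitted tree on its own OOB rows
        scores.append(np.mean((y[oob] - predict_stump(stump, X[oob])) ** 2))
    return float(np.mean(scores))                 # Step 2: aggregate over trees

# Synthetic data (an assumption for illustration): a noisy step function
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * (X[:, 0] > 0) + rng.normal(0, 0.1, size=200)
print("OOB MSE:", oob_mse(X, y))
```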
---

### Why It Works 

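1. Each base learner never sees its own OOB rows during fitting, so the OOB set acts as a held-out test set for that learner.
2. With a large number of bootstrap iterations $B$, every observation is OOB for many learners, so the aggregated score is a stable estimate. 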

---

### Pros & Cons

**Pros**

✅ No extra data-splitting cost for models based on bootstraps.   
✅ Uses all data for both training and evaluation.  


**Cons**

❌ Only applicable to models based on bootstraps.   
❌ Unreliable with small $B$, since each observation is then OOB for only a few learners.

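To make the small-$B$ concern concrete: a row contributes to the OOB error only if at least one learner leaves it out. Assuming independent bootstrap samples, the chance that a given row is in-bag for all $B$ learners is $\left(1 - (1 - 1/n)^n\right)^B \approx 0.632^B$. The sketch below tabulates this probability; the helper name `prob_never_oob` and the chosen values of `n` and `B` are hypothetical.

```python
def prob_never_oob(n, B):
    """Probability that a given row is in-bag in every one of B bootstrap samples."""
    p_in_bag = 1 - (1 - 1 / n) ** n  # drawn at least once in a single bootstrap sample
    return p_in_bag ** B

for B in (1, 5, 10, 50, 100):
    print(B, prob_never_oob(n=1000, B=B))
```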
> The following visualization was created using Gemini 2.5 Pro. Both ChatGPT 3o and Claude Sonnet 4 failed using the same prompt. 

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import ipywidgets as widgets
from IPython.display import display
from collections import Counter

# --- 1. Configuration and Data Setup ---

# List of emojis to use for the dataset symbols
EMOJI_LIST = [
    "😀", "😃", "😄", "😁", "😆", "😅", "😂", "😉", "😊", "😇",
    "😍", "😘", "😗", "😚", "😙", "😋", "😛", "😜", "😝"
]

# Constants for the visualization
N_SAMPLES = 30
GRID_ROWS = 5
GRID_COLS = 6

# For reproducibility, we use a fixed seed
np.random.seed(42)

# Create the original dataset of 30 emojis
original_data = np.random.choice(EMOJI_LIST, N_SAMPLES)
original_indices = np.arange(N_SAMPLES)

# --- 2. Visualization and Plotting ---

# Two side-by-side panels: original dataset (left) and bootstrap sample (right)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 7))
plt.close(fig) # Prevent a static version from displaying

def draw_grid(ax, title, data, colors=None, counts=None):
    """
    Draw a grid of emojis on a matplotlib axis.
    `colors` sets each cell's background; `counts` maps a cell index to the
    number of times it was drawn, shown in the cell's top-right corner.
    """
    ax.clear()
    ax.set_title(title, fontsize=16, pad=10)

    # Set defaults if arguments are not provided
    if colors is None:
        colors = ['#f0f0f0'] * N_SAMPLES
    if counts is None:
        counts = {}

    for i, emoji in enumerate(data):
        row = i // GRID_COLS
        col = i % GRID_COLS
        # Draw the emoji with its colored background
        ax.text(
            col,
            GRID_ROWS - 1 - row,
            emoji,
            fontsize=28,
            ha='center',
            va='center',
            bbox=dict(boxstyle='square,pad=0.5', fc=colors[i], ec='grey')
        )

        # If a count is provided for this item, display it
        if i in counts:
            ax.text(
                col + 0.45,
                GRID_ROWS - 1 - row + 0.4,
                str(counts[i]),
                fontsize=11,
                ha='right',
                va='top',
                color='black',
                fontweight='bold'
            )

    ax.set_xlim(-0.5, GRID_COLS - 0.5)
    ax.set_ylim(-0.5, GRID_ROWS - 0.5)
    ax.axis('off')

# --- 3. Interactive Widget Callbacks ---

bootstrap_button = widgets.Button(description="Bootstrap", button_style='primary')
reset_button = widgets.Button(description="Reset", button_style='info')
plot_output = widgets.Output()

def initialize_plots():
    """Draws the initial state of the visualization."""
    with plot_output:
        plot_output.clear_output(wait=True)
        draw_grid(ax1, "Original Dataset", original_data)
        draw_grid(ax2, "Bootstrap Sample", [""] * N_SAMPLES)
        display(fig)

def on_bootstrap_clicked(b):
    """Callback for the 'Bootstrap' button: draws a sample and colors in-bag cells blue, OOB cells red."""
    # Create a bootstrap sample by drawing N indices with replacement
    bootstrap_indices = np.random.choice(original_indices, size=N_SAMPLES, replace=True)
    bootstrap_sample = original_data[bootstrap_indices]

    # Get a dictionary of {index: count} for sampled items
    sample_counts = Counter(bootstrap_indices)

    # Identify the Out-of-Bag (OOB) indices
    oob_indices = np.setdiff1d(original_indices, list(sample_counts.keys()))

    # Calculate the OOB proportion
    oob_proportion = len(oob_indices) / N_SAMPLES

    # Create color list: Blue for in-bag, Red for out-of-bag
    colors = []
    for i in original_indices:
        if i in oob_indices:
            colors.append('#FF6347') # Tomato Red for OOB
        else:
            colors.append('#87CEEB') # Sky Blue for In-Bag

    # Create dynamic title with the OOB percentage
    title_ax1 = f"Original Dataset (OOB: {oob_proportion:.1%})"

    # Redraw both plots to reflect the new state
    with plot_output:
        plot_output.clear_output(wait=True)
        # Pass the counts dictionary to the draw_grid function for the first plot
        draw_grid(ax1, title_ax1, original_data, colors=colors, counts=sample_counts)
        draw_grid(ax2, "Bootstrap Sample", bootstrap_sample)
        display(fig)

def on_reset_clicked(b):
    """Callback for the 'Reset' button."""
    initialize_plots()

# --- 4. Assemble and Display the UI ---

bootstrap_button.on_click(on_bootstrap_clicked)
reset_button.on_click(on_reset_clicked)

button_box = widgets.HBox([bootstrap_button, reset_button], layout={'justify_content': 'center'})
ui = widgets.VBox([plot_output, button_box])

initialize_plots()
display(ui)
