# Uncertainty-Based Active Learning with AutoRA

Welcome to uncertainty-based active learning! In this tutorial, you'll learn how to use model uncertainty to intelligently select which experiments to run next.

## Learning Objectives

By the end of this tutorial, you will be able to:
- Understand how models can quantify prediction uncertainty
- Use Gaussian Process models that provide natural uncertainty estimates
- Implement uncertainty sampling with AutoRA's uncertainty experimentalist
- Compare random vs. uncertainty-based sampling strategies
- Analyze information gain and uncertainty reduction over time

## Why Uncertainty Sampling?

**Key Insight**: Not all samples are equally informative!

Consider two regions in your design space:
- **Region A**: Model predicts confidently (low uncertainty)
- **Region B**: Model is very uncertain (high uncertainty)

**Question**: Which region should we sample next?

**Answer**: Region B! We learn more from samples where we're uncertain.

This is the core principle of **uncertainty sampling**: run the next experiment where the model is most uncertain.

## Information Theory Connection

Recall from the information theory tutorial:

**Entropy** quantifies uncertainty:
$$H(Y) = -\sum_y p(y) \log p(y)$$

**Mutual Information** quantifies information gain:
$$I(X; Y) = H(Y) - H(Y|X)$$

Uncertainty sampling is a practical proxy for this goal: by observing where predictive uncertainty is highest, we try to maximize the information each new observation carries about the underlying function!

## Setup & Imports

In [None]:
import sys, os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C, WhiteKernel
from sklearn.metrics import mean_squared_error

np.random.seed(42)

# Add project folder to path
target_folder = os.path.abspath(os.path.join(os.getcwd(), '..'))
if target_folder not in sys.path:
    sys.path.append(target_folder)

## Part 1: Recap - Information Theory & Uncertainty

Let's quickly review key concepts from the information theory tutorial.

### Entropy: Measuring Uncertainty

Entropy measures the average "surprise" or uncertainty in a distribution.

In [None]:
def entropy(probabilities):
    """Compute Shannon entropy for a probability distribution"""
    # Remove zero probabilities to avoid log(0)
    p = np.array(probabilities)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Example: Fair coin vs. biased coin
fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
certain = [1.0, 0.0]

print("Entropy Examples:")
print(f"  Fair coin [0.5, 0.5]: {entropy(fair_coin):.3f} bits")
print(f"  Biased coin [0.9, 0.1]: {entropy(biased_coin):.3f} bits")
print(f"  Certain [1.0, 0.0]: {entropy(certain):.3f} bits")
print("\nHigher entropy = more uncertainty!")

### Mutual Information: Measuring Information Gain

Mutual information quantifies how much knowing X reduces uncertainty about Y.

$$I(X; Y) = H(Y) - H(Y|X)$$

In active learning:
- $Y$ = the true function we're trying to learn
- $X$ = our observations
- $I(X; Y)$ = information gain from observing $X$

**Goal**: Select observations that maximize mutual information!
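
To make the formula concrete, here is a minimal sketch that evaluates $I(X; Y) = H(Y) - H(Y|X)$ for a small discrete joint distribution. The helper names `shannon_entropy` and `mutual_information` and the two example tables are illustrative assumptions, not part of AutoRA or the earlier tutorial.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(X; Y) = H(Y) - H(Y|X) for a joint table with rows = x and columns = y."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()      # normalise to a probability table
    p_x = joint.sum(axis=1)          # marginal distribution of X
    p_y = joint.sum(axis=0)          # marginal distribution of Y
    # Chain rule: H(Y|X) = H(X, Y) - H(X)
    h_y_given_x = shannon_entropy(joint) - shannon_entropy(p_x)
    return shannon_entropy(p_y) - h_y_given_x

# Hypothetical example tables: X tells us nothing about Y vs. X determines Y
independent = np.outer([0.5, 0.5], [0.5, 0.5])
copy = np.array([[0.5, 0.0], [0.0, 0.5]])

mi_independent = mutual_information(independent)
mi_copy = mutual_information(copy)
print(f"Independent variables: {mi_independent:.3f} bits")
print(f"Y is a copy of X:      {mi_copy:.3f} bits")
```

An observation that is independent of the target carries no information, while one that fully determines it carries the full entropy of the target.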

## Part 2: Gaussian Processes - Models with Built-in Uncertainty

Gaussian Processes (GPs) are special because they provide both:
1. **Predictions**: $\mu(x)$ - mean prediction at point $x$
2. **Uncertainty**: $\sigma(x)$ - standard deviation at point $x$
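
Where do $\mu(x)$ and $\sigma(x)$ come from? The following NumPy-only sketch computes them for a zero-mean GP with an RBF kernel of unit prior variance. The function names, the length scale of 0.2, and the noise term are assumptions chosen for illustration; scikit-learn's implementation is more careful numerically.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, length_scale=0.2, noise=1e-6):
    """Posterior mean and std of a zero-mean GP with unit prior variance."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)   # shape: (n_train, n_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s)
    var = 1.0 - np.sum(K_s * v, axis=0)               # k(x*, x*) = 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

x_train = np.array([0.1, 0.4, 0.8])
y_train = np.sin(3 * x_train)
# One test point on a training point, one between points, one far away
x_test = np.array([0.4, 0.6, 1.5])

mu, sigma = gp_posterior(x_train, y_train, x_test)
for x, m, s in zip(x_test, mu, sigma):
    print(f"x = {x:.1f}: mean = {m:+.3f}, std = {s:.3f}")
```

The standard deviation is close to zero at the training point, larger between points, and approaches the prior value of 1 far from all data.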

Let's see this in action with a simple 1D example:

In [None]:
# Create simple 1D ground truth
def ground_truth_1d(x):
    return np.sin(3 * x) + 0.3 * np.cos(9 * x)

# Sample a few points
X_train_1d = np.array([[0.1], [0.4], [0.8]])
y_train_1d = ground_truth_1d(X_train_1d.ravel())

# Create GP model
kernel = C(1.0, (1e-3, 1e3)) * RBF(0.1, (1e-2, 1e2)) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, random_state=42)
gp.fit(X_train_1d, y_train_1d)

# Make predictions with uncertainty
X_test_1d = np.linspace(0, 1, 200).reshape(-1, 1)
y_pred, y_std = gp.predict(X_test_1d, return_std=True)

# Visualize
plt.figure(figsize=(12, 5))

plt.fill_between(X_test_1d.ravel(), 
                 y_pred - 2*y_std, 
                 y_pred + 2*y_std, 
                 alpha=0.3, 
                 label='95% Confidence Interval')
plt.plot(X_test_1d, y_pred, 'b-', linewidth=2, label='GP Mean Prediction')
plt.plot(X_test_1d, ground_truth_1d(X_test_1d.ravel()), 'k--', linewidth=2, label='True Function')
plt.scatter(X_train_1d, y_train_1d, c='red', s=200, zorder=10, edgecolors='black', linewidths=2, label='Training Data')
plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Gaussian Process: Predictions with Uncertainty', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.show()

print("Key Observations:")
print("  - Uncertainty is LOW near training points (shaded area is narrow)")
print("  - Uncertainty is HIGH far from training points (shaded area is wide)")
print("  - GP 'knows what it doesn't know'!")

### Uncertainty as a Sampling Strategy

If we could only sample ONE more point, where should it be?

Let's find the point with maximum uncertainty:

In [None]:
# Find point with maximum uncertainty
max_uncertainty_idx = np.argmax(y_std)
x_next = X_test_1d[max_uncertainty_idx]
y_next = ground_truth_1d(x_next.ravel())

print(f"Most uncertain point: x = {x_next[0]:.3f}")
print(f"Uncertainty (std): {y_std[max_uncertainty_idx]:.3f}")
print("\nUnder uncertainty sampling, this is the point to sample next!")
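
Selecting a single point is one step of a loop. As a rough sketch of how the steps chain together, the block below repeats "fit, find the most uncertain candidate, observe it" five times on the same 1D function. It assumes a fixed RBF kernel (`optimizer=None`) so that the hyperparameters do not change between rounds; `toy_function` and the variable names are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def toy_function(x):
    """Same shape as the 1D ground truth used above."""
    return np.sin(3 * x) + 0.3 * np.cos(9 * x)

candidates = np.linspace(0, 1, 200).reshape(-1, 1)
X_obs = np.array([[0.1], [0.4], [0.8]])
y_obs = toy_function(X_obs.ravel())

max_std_history = []
for step in range(5):
    # Fixed kernel: no hyperparameter optimisation between rounds
    model = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), optimizer=None, alpha=1e-6)
    model.fit(X_obs, y_obs)

    # Predictive std over the candidate pool before adding the next point
    _, std = model.predict(candidates, return_std=True)
    max_std_history.append(float(std.max()))

    # Observe the most uncertain candidate and add it to the data
    x_new = candidates[np.argmax(std)].reshape(1, -1)
    X_obs = np.vstack([X_obs, x_new])
    y_obs = np.append(y_obs, toy_function(x_new.ravel()))

print("Largest predictive std per round:", np.round(max_std_history, 3))
```

With a fixed kernel, adding data can only shrink the posterior variance, so the largest remaining standard deviation does not increase from round to round.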

## Part 3: AutoRA Uncertainty Experimentalist

Now let's apply uncertainty sampling to our 2AFC experiment using AutoRA's built-in uncertainty experimentalist.

In [None]:
from resources.synthetic import twoafc
from autora.state import StandardState, on_state, estimator_on_state
from autora.experimentalist.random import random_sample

# Define participant parameters
n_units = 100
parameters = np.random.normal(1, 0.5, (n_units, 2))
parameters = np.where(parameters < 0, 0, parameters)

# Create experiment
experiment = twoafc(parameters, resolution=10)

# Get variable names
iv_names = [iv.name for iv in experiment.variables.independent_variables]
dv_names = [dv.name for dv in experiment.variables.dependent_variables]

print("Experiment setup complete!")
print(f"IVs: {iv_names}")
print(f"DVs: {dv_names}")

### Creating a GP-based Theorist

We'll use a Gaussian Process Regressor that can handle multiple inputs (participant_id, ratio, scatteredness).

**Important**: The model itself is a plain scikit-learn estimator. We create it here and wrap it with `estimator_on_state` in Part 4 so that it works with AutoRA's state system:

In [None]:
# Create GP model with appropriate kernel
# RBF kernel works well for smooth functions like our 2AFC experiment
kernel = C(1.0, (1e-3, 1e3)) * RBF([1.0, 1.0, 1.0], (1e-2, 1e2)) + WhiteKernel(noise_level=0.01)
gp_model = GaussianProcessRegressor(
    kernel=kernel, 
    n_restarts_optimizer=5, 
    random_state=42,
    normalize_y=True  # Normalize target values for better numerical stability
)

print("GP model created!")
print(f"Kernel: {gp_model.kernel}")

### Installing and Using AutoRA's Uncertainty Experimentalist

First, make sure you have the uncertainty experimentalist installed:

```bash
pip install -U "autora[experimentalist-uncertainty]"
```

Now let's import and set it up:

In [None]:
# Import uncertainty experimentalist
try:
    from autora.experimentalist.uncertainty import uncertainty_sample
    print("✓ Uncertainty experimentalist imported successfully!")
except ImportError as e:
    print("✗ Error importing uncertainty experimentalist.")
    print("  Please install with: pip install -U 'autora[experimentalist-uncertainty]'")
    raise e

# The uncertainty experimentalist needs:
# 1. A pool of candidate conditions to choose from
# 2. A model that supports predict(..., return_std=True) - like GP
# 3. Number of samples to select

print("\nUncertainty experimentalist ready!")
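
The three ingredients listed in the comments above can be sketched as a small pool-based selector. This is a hypothetical helper written for illustration, not AutoRA's implementation: `select_most_uncertain` and the toy `DistanceModel` are assumptions, and the only requirement on the model is a `predict(X, return_std=True)` method.

```python
import numpy as np
import pandas as pd

def select_most_uncertain(pool, model, num_samples):
    """Return the `num_samples` rows of `pool` with the largest predictive std.

    Hypothetical helper for illustration; not part of AutoRA.
    """
    _, std = model.predict(pool.to_numpy(), return_std=True)
    top = np.argsort(std)[::-1][:num_samples]   # indices of the largest std first
    return pool.iloc[top].reset_index(drop=True)

class DistanceModel:
    """Toy stand-in for a fitted model: std grows with distance from x = 0.5."""
    def predict(self, X, return_std=False):
        X = np.asarray(X, dtype=float)
        mean = np.zeros(len(X))
        std = np.abs(X[:, 0] - 0.5)
        return (mean, std) if return_std else mean

pool = pd.DataFrame({"ratio": [0.0, 0.25, 0.5, 0.75, 1.0]})
chosen = select_most_uncertain(pool, DistanceModel(), num_samples=2)
print(chosen)
```

Here the two candidates furthest from 0.5 are chosen because the toy model is least certain there.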

## Part 4: Comparison Experiment - Random vs. Uncertainty Sampling

Let's run a proper comparison:
- **Strategy 1**: Random sampling (baseline)
- **Strategy 2**: Uncertainty sampling (intelligent selection)

We'll run 10 cycles and track each model's MSE on the data collected so far. Note that this is an in-sample error, not a held-out estimate of generalization.

### Strategy 1: Random Sampling Baseline

In [None]:
# Wrap components for state operations
experiment_runner = on_state(experiment.run, output=['experiment_data'])
experimentalist_random = on_state(random_sample, output=['conditions'])

# Create fresh GP for random strategy
kernel_random = C(1.0, (1e-3, 1e3)) * RBF([1.0, 1.0, 1.0], (1e-2, 1e2)) + WhiteKernel(noise_level=0.01)
gp_random = GaussianProcessRegressor(kernel=kernel_random, n_restarts_optimizer=5, random_state=42, normalize_y=True)

theorist_random = estimator_on_state(gp_random)

# Initialize state
state_random = StandardState(
    variables=experiment.variables,
    conditions=pd.DataFrame(columns=iv_names),
    experiment_data=pd.DataFrame(columns=iv_names + dv_names),
    models=[gp_random]
)

# Run 10 cycles
n_cycles = 10
samples_per_cycle = 5  # Sample 5 conditions per participant per cycle
mse_history_random = []

print("Running RANDOM sampling strategy...\n")

for cycle in range(n_cycles):
    # 1. Propose conditions randomly
    state_random = experimentalist_random(
        state_random, 
        num_samples=samples_per_cycle,
        random_state=42+cycle,
        sample_all=['participant_id']
    )
    
    # 2. Run experiment
    state_random = experiment_runner(state_random, added_noise=0.0, random_state=42+cycle)
    
    # 3. Train model
    state_random = theorist_random(state_random)
    
    # 4. Evaluate
    X = state_random.experiment_data[iv_names].values
    y_true = state_random.experiment_data[dv_names].values.ravel()
    y_pred = state_random.models[0].predict(X)
    mse = mean_squared_error(y_true, y_pred)
    mse_history_random.append(mse)
    
    print(f"Cycle {cycle+1:2d}/{n_cycles}: {len(state_random.experiment_data):4d} samples, MSE = {mse:.4f}")

print("\n✓ Random sampling complete!")
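
The MSE in the loop above is computed on the same data the model was trained on. As a hedged illustration of why a held-out grid can tell a different story, the sketch below fits a GP to four points of a 1D toy function and compares the two errors. The toy function, the fixed kernel, and the training locations are assumptions for this example only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import mean_squared_error

def toy_function(x):
    return np.sin(3 * x) + 0.3 * np.cos(9 * x)

# A few training points and a dense held-out grid over the same range
X_train = np.array([[0.05], [0.3], [0.55], [0.9]])
y_train = toy_function(X_train.ravel())
X_holdout = np.linspace(0, 1, 200).reshape(-1, 1)
y_holdout = toy_function(X_holdout.ravel())

# Fixed kernel so the example is deterministic
model = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), optimizer=None, alpha=1e-6)
model.fit(X_train, y_train)

mse_in_sample = mean_squared_error(y_train, model.predict(X_train))
mse_holdout = mean_squared_error(y_holdout, model.predict(X_holdout))
print(f"In-sample MSE: {mse_in_sample:.2e}")
print(f"Held-out MSE:  {mse_holdout:.2e}")
```

A GP with little observation noise nearly interpolates its training data, so the in-sample error is close to zero regardless of how well the model does elsewhere.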

### Strategy 2: Uncertainty Sampling

In [None]:
from autora.experimentalist.pooler import grid_pool

# Wrap uncertainty experimentalist for state operations
experimentalist_uncertainty = on_state(uncertainty_sample, output=['conditions'])
pool_generator = on_state(grid_pool, output=['conditions'])  # Create pool of candidates

# Create fresh GP for uncertainty strategy
kernel_uncertainty = C(1.0, (1e-3, 1e3)) * RBF([1.0, 1.0, 1.0], (1e-2, 1e2)) + WhiteKernel(noise_level=0.01)
gp_uncertainty = GaussianProcessRegressor(kernel=kernel_uncertainty, n_restarts_optimizer=5, random_state=42, normalize_y=True)

theorist_uncertainty = estimator_on_state(gp_uncertainty)

# Initialize state with a few random samples (seed)
seed_conditions = random_sample(experiment.variables, num_samples=2, random_state=42, sample_all=['participant_id'])
state_uncertainty = StandardState(
    variables=experiment.variables,
    conditions=seed_conditions,
    experiment_data=pd.DataFrame(columns=iv_names + dv_names),
    models=[gp_uncertainty]
)

# Run initial experiment with seed conditions
state_uncertainty = experiment_runner(state_uncertainty, added_noise=0.0, random_state=42)
state_uncertainty = theorist_uncertainty(state_uncertainty)

# Track history
mse_history_uncertainty = []

# Evaluate initial state
X = state_uncertainty.experiment_data[iv_names].values
y_true = state_uncertainty.experiment_data[dv_names].values.ravel()
y_pred = state_uncertainty.models[0].predict(X)
mse = mean_squared_error(y_true, y_pred)
mse_history_uncertainty.append(mse)

print("Running UNCERTAINTY sampling strategy...\n")
print(f"Cycle  0/{n_cycles}: {len(state_uncertainty.experiment_data):4d} samples (seed), MSE = {mse:.4f}")

for cycle in range(1, n_cycles):
    # 1. Generate pool of candidate conditions
    pool_state = StandardState(
        variables=experiment.variables,
        conditions=pd.DataFrame(columns=iv_names),
        experiment_data=state_uncertainty.experiment_data.copy(),
        models=state_uncertainty.models
    )
    pool_state = pool_generator(pool_state, num_samples=20, sample_all=['participant_id'])
    
    # 2. Select most uncertain conditions from pool
    pool_state = experimentalist_uncertainty(pool_state, num_samples=samples_per_cycle)
    
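    # 3. Update main state with selected conditions.
    #    StandardState is a frozen dataclass, so build a new state rather than assigning in place.
    import dataclasses  # local import keeps this step self-contained
    state_uncertainty = dataclasses.replace(state_uncertainty, conditions=pool_state.conditions)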
    
    # 4. Run experiment
    state_uncertainty = experiment_runner(state_uncertainty, added_noise=0.0, random_state=42+cycle)
    
    # 5. Train model
    state_uncertainty = theorist_uncertainty(state_uncertainty)
    
    # 6. Evaluate
    X = state_uncertainty.experiment_data[iv_names].values
    y_true = state_uncertainty.experiment_data[dv_names].values.ravel()
    y_pred = state_uncertainty.models[0].predict(X)
    mse = mean_squared_error(y_true, y_pred)
    mse_history_uncertainty.append(mse)
    
    print(f"Cycle {cycle:2d}/{n_cycles}: {len(state_uncertainty.experiment_data):4d} samples, MSE = {mse:.4f}")

print("\n✓ Uncertainty sampling complete!")

## Part 5: Analysis - Comparing Strategies

Let's visualize and analyze the results!

### MSE Over Time

In [None]:
plt.figure(figsize=(10, 6))
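# Plot each history against its own cycle index (the uncertainty history starts at the seed, cycle 0)
plt.plot(range(1, len(mse_history_random) + 1), mse_history_random, 'o-', label='Random Sampling', linewidth=2, markersize=8)
plt.plot(range(len(mse_history_uncertainty)), mse_history_uncertainty, 's-', label='Uncertainty Sampling', linewidth=2, markersize=8)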
plt.xlabel('Cycle', fontsize=12)
plt.ylabel('Mean Squared Error', fontsize=12)
plt.title('Learning Efficiency: Random vs. Uncertainty Sampling', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.yscale('log')  # Log scale to see differences more clearly
plt.show()

print("\nFinal Performance:")
print(f"  Random Sampling:      MSE = {mse_history_random[-1]:.4f}")
print(f"  Uncertainty Sampling: MSE = {mse_history_uncertainty[-1]:.4f}")
improvement = (mse_history_random[-1] - mse_history_uncertainty[-1]) / mse_history_random[-1] * 100
print(f"\n  Relative MSE difference: {improvement:.1f}% (positive = uncertainty sampling has lower MSE)")

### Sample Distribution - Where Did Each Strategy Sample?

Let's visualize where each strategy chose to sample for one participant:

In [None]:
# Get samples for participant 0
participant_id = 0
random_samples = state_random.experiment_data[state_random.experiment_data['participant_id'] == participant_id]
uncertainty_samples = state_uncertainty.experiment_data[state_uncertainty.experiment_data['participant_id'] == participant_id]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Random sampling
scatter1 = axes[0].scatter(
    random_samples['ratio'], 
    random_samples['scatteredness'], 
    c=range(len(random_samples)),
    cmap='viridis',
    s=100,
    alpha=0.6,
    edgecolors='black',
    linewidths=1
)
axes[0].set_xlabel('Ratio', fontsize=12)
axes[0].set_ylabel('Scatteredness', fontsize=12)
axes[0].set_title(f'Random Sampling (Participant {participant_id})', fontsize=14)
axes[0].grid(True, alpha=0.3)
axes[0].set_xlim(-0.1, 1.1)
axes[0].set_ylim(-0.1, 1.1)
plt.colorbar(scatter1, ax=axes[0], label='Sample Order')

# Uncertainty sampling
scatter2 = axes[1].scatter(
    uncertainty_samples['ratio'], 
    uncertainty_samples['scatteredness'], 
    c=range(len(uncertainty_samples)),
    cmap='viridis',
    s=100,
    alpha=0.6,
    edgecolors='black',
    linewidths=1
)
axes[1].set_xlabel('Ratio', fontsize=12)
axes[1].set_ylabel('Scatteredness', fontsize=12)
axes[1].set_title(f'Uncertainty Sampling (Participant {participant_id})', fontsize=14)
axes[1].grid(True, alpha=0.3)
axes[1].set_xlim(-0.1, 1.1)
axes[1].set_ylim(-0.1, 1.1)
plt.colorbar(scatter2, ax=axes[1], label='Sample Order')

plt.tight_layout()
plt.show()

print(f"\nSampling Coverage:")
print(f"  Random: {len(random_samples)} samples")
print(f"  Uncertainty: {len(uncertainty_samples)} samples")

### Uncertainty Reduction Over Time

Let's visualize how uncertainty changes as we collect more data:

In [None]:
# Create test grid for participant 0
ratio_range = np.linspace(0, 1, 30)
scatter_range = np.linspace(0, 1, 30)
ratio_grid, scatter_grid = np.meshgrid(ratio_range, scatter_range)
X_grid = np.c_[
    np.full(ratio_grid.size, participant_id),  # participant_id
    ratio_grid.ravel(),  # ratio
    scatter_grid.ravel()  # scatteredness
]

# Get uncertainty from both models
_, std_random = state_random.models[0].predict(X_grid, return_std=True)
_, std_uncertainty = state_uncertainty.models[0].predict(X_grid, return_std=True)

# Reshape for plotting
std_random_grid = std_random.reshape(ratio_grid.shape)
std_uncertainty_grid = std_uncertainty.reshape(ratio_grid.shape)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Random
im1 = axes[0].contourf(ratio_grid, scatter_grid, std_random_grid, levels=20, cmap='YlOrRd')
axes[0].scatter(random_samples['ratio'], random_samples['scatteredness'], 
                c='blue', s=50, alpha=0.7, edgecolors='black', linewidths=1, label='Sampled Points')
axes[0].set_xlabel('Ratio', fontsize=12)
axes[0].set_ylabel('Scatteredness', fontsize=12)
axes[0].set_title(f'Uncertainty Map: Random Sampling', fontsize=14)
axes[0].legend()
plt.colorbar(im1, ax=axes[0], label='Prediction Std Dev')

# Uncertainty
im2 = axes[1].contourf(ratio_grid, scatter_grid, std_uncertainty_grid, levels=20, cmap='YlOrRd')
axes[1].scatter(uncertainty_samples['ratio'], uncertainty_samples['scatteredness'], 
                c='blue', s=50, alpha=0.7, edgecolors='black', linewidths=1, label='Sampled Points')
axes[1].set_xlabel('Ratio', fontsize=12)
axes[1].set_ylabel('Scatteredness', fontsize=12)
axes[1].set_title(f'Uncertainty Map: Uncertainty Sampling', fontsize=14)
axes[1].legend()
plt.colorbar(im2, ax=axes[1], label='Prediction Std Dev')

plt.tight_layout()
plt.show()

print("\nObservations:")
print("  - Darker red = higher uncertainty")
print("  - Uncertainty sampling targets high-uncertainty regions")
print("  - Compare the two maps: a flatter map means more uniform uncertainty reduction")

## Part 6: Information-Theoretic Analysis

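Let's connect back to information theory by comparing how much uncertainty remains under each strategy.

We'll approximate mutual information using the uncertainty reduction:
$$I(X; Y) \approx \text{Initial Entropy} - \text{Final Entropy}$$

Both strategies start from the same GP prior, so the initial entropy is shared and the strategy with the lower final entropy has gained more information. The cell below therefore computes only the final entropy.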

In [None]:
def compute_avg_uncertainty(model, X_test):
    """Compute average prediction uncertainty over test set"""
    _, std = model.predict(X_test, return_std=True)
    return np.mean(std)

def uncertainty_to_entropy(std):
    """Convert Gaussian std to differential entropy"""
    # For Gaussian: H = 0.5 * log(2 * pi * e * sigma^2)
    return 0.5 * np.log(2 * np.pi * np.e * std**2)

# Compute average uncertainty on full grid
avg_uncertainty_random = compute_avg_uncertainty(state_random.models[0], X_grid)
avg_uncertainty_uncertainty = compute_avg_uncertainty(state_uncertainty.models[0], X_grid)

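# Convert to entropy (entropy of a Gaussian whose std is the grid-average std,
# which is not the same as averaging the pointwise entropies)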
entropy_random = uncertainty_to_entropy(avg_uncertainty_random)
entropy_uncertainty = uncertainty_to_entropy(avg_uncertainty_uncertainty)

print("\nInformation-Theoretic Analysis:")
print("\nAverage Prediction Uncertainty (over design space):")
print(f"  Random Sampling:      {avg_uncertainty_random:.4f}")
print(f"  Uncertainty Sampling: {avg_uncertainty_uncertainty:.4f}")
print(f"  → Reduction: {(1 - avg_uncertainty_uncertainty/avg_uncertainty_random)*100:.1f}%")

print("\nDifferential Entropy (nats):")
print(f"  Random Sampling:      {entropy_random:.4f}")
print(f"  Uncertainty Sampling: {entropy_uncertainty:.4f}")
print(f"  → Lower entropy = less remaining uncertainty!")

# Visualize entropy reduction
plt.figure(figsize=(8, 6))
strategies = ['Random\nSampling', 'Uncertainty\nSampling']
entropies = [entropy_random, entropy_uncertainty]
colors = ['#ff7f0e', '#2ca02c']

bars = plt.bar(strategies, entropies, color=colors, alpha=0.7, edgecolor='black', linewidth=2)
plt.ylabel('Differential Entropy (nats)', fontsize=12)
plt.title(f'Remaining Uncertainty After {n_cycles} Cycles', fontsize=14)
plt.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
# Use a distinct loop variable so the entropy() function defined in Part 1 is not overwritten
for bar, value in zip(bars, entropies):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{value:.3f}',
             ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.show()
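
The approximation introduced at the start of this part, initial entropy minus final entropy, can also be evaluated point by point. For a Gaussian, the difference simplifies to $\log(\sigma_\text{before} / \sigma_\text{after})$. The sketch below uses made-up standard deviations to show the arithmetic; the helper names are illustrative assumptions.

```python
import numpy as np

def gaussian_entropy(std):
    """Differential entropy (nats) of a Gaussian with standard deviation `std`."""
    std = np.asarray(std, dtype=float)
    return 0.5 * np.log(2 * np.pi * np.e * std ** 2)

def entropy_reduction(std_before, std_after):
    """Pointwise H_before - H_after, which equals log(std_before / std_after)."""
    return gaussian_entropy(std_before) - gaussian_entropy(std_after)

# Hypothetical predictive stds at three test points, before and after new data
std_before = np.array([1.0, 1.0, 1.0])
std_after = np.array([0.5, 0.25, 1.0])

gain = entropy_reduction(std_before, std_after)
mean_gain = float(gain.mean())
print("Entropy reduction per point (nats):", np.round(gain, 3))
print(f"Average reduction: {mean_gain:.3f} nats")
```

Halving the standard deviation at a point reduces its entropy by $\log 2$ nats, and a point whose uncertainty did not change contributes nothing.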

## Summary & Key Takeaways

You've learned:

1. ✅ **Information Theory Recap**: Entropy measures uncertainty, mutual information measures learning
2. ✅ **Gaussian Processes**: Models that provide natural uncertainty estimates via $\sigma(x)$
3. ✅ **Uncertainty Sampling**: Query where the model is most uncertain
4. ✅ **AutoRA Implementation**: Using `uncertainty_sample` experimentalist with GP models
5. ✅ **Performance Gains**: Uncertainty sampling can learn faster than random sampling, as measured by MSE over cycles
6. ✅ **Information-Theoretic Analysis**: Measuring entropy reduction and information gain

### Key Insight

**Not all samples are equally valuable!**

By intelligently selecting samples where we're most uncertain, we:
- Reduce prediction error faster
- Cover the design space more efficiently
- Maximize information gain per observation

### What's Next?

In the next tutorial (**autora_advanced.ipynb**), you'll learn:
- **Model Disagreement**: Using ensembles for even smarter sampling
- **Query-by-Committee**: Selecting samples where models disagree most
- **Combining Strategies**: Hybrid approaches for robust active learning

## Exercises

1. **Different Kernels**: Try different GP kernels (Matern, RationalQuadratic). How does this affect uncertainty estimates?

2. **Sample Budget**: Run experiments with different `samples_per_cycle` (1, 3, 10). When does uncertainty sampling show the biggest advantage?

3. **Noise Levels**: Add noise to observations (`added_noise=0.1, 0.5`). How robust is uncertainty sampling to noise?

4. **Pool Size**: Change the pool size in `grid_pool(num_samples=...)`. Does a larger pool improve performance?

5. **Multi-Participant**: Analyze uncertainty patterns across different participants. Are some participants harder to model?

6. **Mutual Information**: Implement a more sophisticated MI estimator using differential entropy of the GP posterior. Track MI over cycles.

7. **Explore-Exploit**: Modify the experimentalist to balance uncertainty sampling (explore) with sampling near optimal points (exploit).

## Congratulations!

You've mastered uncertainty-based active learning! You now understand how to use model uncertainty to intelligently guide experimental design, and you can connect these methods to information-theoretic principles.

Next up: **Model Disagreement** for even more sophisticated active learning strategies!