# Full Pipeline Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tlancaster6/AquaCal/blob/main/docs/tutorials/01_full_pipeline.ipynb)

This tutorial demonstrates the complete AquaCal calibration pipeline from start to finish. You'll learn how to:

- Load synthetic or real calibration data
- Run all four calibration stages
- Visualize camera rig geometry and calibration quality
- Validate results and interpret metrics

**Prerequisites:** Basic Python and OpenCV knowledge. Familiarity with camera calibration concepts is helpful but not required.

## Data Source Selection

Choose your data source:

- **`synthetic`** (recommended for first run): Generates a small synthetic rig on-the-fly. Fast, no download required.
- **`preset`**: Uses a bundled small dataset (2 cameras, 10 frames). No download required.
- **`zenodo`**: Downloads a larger synthetic dataset from Zenodo (6 cameras, 80 frames). Requires an internet connection.

In [None]:
DATA_SOURCE = "synthetic"  # Options: "synthetic", "preset", "zenodo" 

## Setup and Imports

We'll import the necessary modules and configure matplotlib for inline plotting.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from aquacal.datasets import generate_synthetic_rig, load_example
from aquacal.config.schema import InterfaceParams, DiagnosticsData, CalibrationMetadata
from aquacal.validation.diagnostics import plot_camera_rig

# Configure matplotlib
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 10

print("Imports complete!")

## Load Calibration Data

Now we'll load the calibration scenario based on your data source selection.

In [None]:
if DATA_SOURCE == "synthetic":
    # Generate small synthetic rig (2 cameras, 10 frames)
    scenario = generate_synthetic_rig("small")
    print(f"Generated synthetic scenario: {scenario.name}")
    print(f"  Cameras: {len(scenario.intrinsics)}")
    print(f"  Frames: {len(scenario.board_poses)}")
    print(f"  Description: {scenario.description}")
elif DATA_SOURCE == "preset":
    # Load bundled preset data
    dataset = load_example("small")
    scenario = dataset.ground_truth
    print(f"Loaded preset dataset: {dataset.name}")
    print(f"  Cameras: {len(scenario.intrinsics)}")
    print(f"  Frames: {len(scenario.board_poses)}")
else:  # zenodo
    # Download medium dataset from Zenodo
    dataset = load_example("medium")
    scenario = dataset.ground_truth
    print(f"Downloaded Zenodo dataset: {dataset.name}")
    print(f"  Cameras: {len(scenario.intrinsics)}")
    print(f"  Frames: {len(scenario.board_poses)}")

## Understanding the Data

Let's explore the structure of our calibration scenario.

In [None]:
# Show camera configuration
camera_names = list(scenario.intrinsics.keys())
print(f"Camera names: {camera_names}")

print(f"\nBoard configuration:")
print(f"  Squares: {scenario.board_config.squares_x} x {scenario.board_config.squares_y}")
print(f"  Square size: {scenario.board_config.square_size * 1000:.1f} mm")
print(f"  Marker size: {scenario.board_config.marker_size * 1000:.1f} mm")

print(f"\nInterface distances (water surface Z):")
for cam_name in sorted(scenario.intrinsics.keys()):
    print(f"  {cam_name}: {scenario.water_zs[cam_name]:.4f} m")

## Stage 1: Intrinsic Calibration

For synthetic data, intrinsics are provided as ground truth. For real data, you would run Stage 1 to calibrate intrinsics from in-air videos.

Let's visualize the intrinsic parameters:

In [None]:
# Visualize intrinsic parameters
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

cameras = sorted(scenario.intrinsics.keys())
fx_vals = [scenario.intrinsics[cam].K[0, 0] for cam in cameras]
cx_vals = [scenario.intrinsics[cam].K[0, 2] for cam in cameras]
cy_vals = [scenario.intrinsics[cam].K[1, 2] for cam in cameras]

# Focal lengths
x = np.arange(len(cameras))
ax1.bar(x, fx_vals, color='steelblue', alpha=0.7)
ax1.set_xlabel('Camera')
ax1.set_ylabel('Focal Length (pixels)')
ax1.set_title('Focal Lengths (fx)')
ax1.set_xticks(x)
ax1.set_xticklabels(cameras)
ax1.grid(axis='y', alpha=0.3)

# Principal points
ax2.scatter(cx_vals, cy_vals, s=100, color='steelblue', alpha=0.7)
for i, cam in enumerate(cameras):
    ax2.annotate(cam, (cx_vals[i], cy_vals[i]), xytext=(5, 5), textcoords='offset points')
ax2.set_xlabel('cx (pixels)')
ax2.set_ylabel('cy (pixels)')
ax2.set_title('Principal Points')
ax2.grid(alpha=0.3)
ax2.axis('equal')

plt.tight_layout()
plt.show()
plt.close()

print("Intrinsic parameters visualized.")
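
To make the roles of `fx`, `cx`, and `cy` in the `K` matrix concrete, here is a minimal sketch of an undistorted pinhole projection. The matrix values and the helper name `project_pinhole` are hypothetical and chosen for illustration; this is not AquaCal's projection code, which also has to handle distortion and refraction.

```python
import numpy as np

# Hypothetical intrinsic matrix (illustrative values, not from AquaCal)
K = np.array([
    [800.0,   0.0, 320.0],   # fx, skew, cx
    [  0.0, 800.0, 240.0],   # 0,  fy,   cy
    [  0.0,   0.0,   1.0],
])


def project_pinhole(K, point_cam):
    """Project a 3D point given in the camera frame to pixel coordinates.

    Ignores lens distortion and refraction (plain pinhole model).
    """
    uvw = K @ np.asarray(point_cam, dtype=float)
    return uvw[:2] / uvw[2]


# A point on the optical axis lands on the principal point (cx, cy).
on_axis = project_pinhole(K, [0.0, 0.0, 2.0])

# A point 0.1 m to the right at 1 m depth shifts by fx * 0.1 pixels in u.
off_axis = project_pinhole(K, [0.1, 0.0, 1.0])

print("on-axis pixel: ", on_axis)
print("off-axis pixel:", off_axis)
```

The focal length scales lateral offsets into pixels, while the principal point sets where the optical axis meets the image, which is why the two plots above are shown separately.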

## Stage 2: Extrinsic Initialization

Stage 2 estimates initial camera poses using a BFS-based pose graph and pairwise relative pose estimation.

Let's visualize the camera rig geometry:

> **Warning:** Insufficient camera overlap can cause disconnected pose graphs. Ensure each camera pair shares at least 5-10 board observations.

In [None]:
# Visualize camera rig in 3D
from aquacal.config.schema import CalibrationResult, CameraCalibration

# Build temporary CalibrationResult for visualization
cameras_dict = {}
for cam_name, intr in scenario.intrinsics.items():
    cameras_dict[cam_name] = CameraCalibration(
        name=cam_name,
        intrinsics=intr,
        extrinsics=scenario.extrinsics[cam_name],
        water_z=scenario.water_zs[cam_name],
    )

temp_result = CalibrationResult(
    cameras=cameras_dict,
    interface=InterfaceParams(
        normal=np.array([0.0, 0.0, -1.0]),
        n_air=1.0,
        n_water=1.333
    ),
    board=scenario.board_config,
    diagnostics=DiagnosticsData(
        reprojection_error_rms=0.0,
        reprojection_error_per_camera={},
        validation_3d_error_mean=0.0,
        validation_3d_error_std=0.0
    ),
    metadata=CalibrationMetadata(
        calibration_date='',
        software_version='',
        config_hash='',
        num_frames_used=0,
        num_frames_holdout=0
    )
)

fig = plot_camera_rig(temp_result, title="Stage 2: Camera Rig Geometry (Ground Truth)")
plt.show()
plt.close()

print("Camera rig geometry visualized.")
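
The disconnected-graph warning for Stage 2 can be illustrated with a small breadth-first search. The sketch below is a simplified stand-in, not AquaCal's `build_pose_graph`: the function `reachable_cameras`, the `shared_counts` dictionary, and the threshold `min_shared=5` are all hypothetical, and it works on camera pairs directly instead of the real pose graph structure.

```python
from collections import deque


def reachable_cameras(shared_counts, reference, min_shared=5):
    """Return the cameras reachable from `reference` by BFS.

    shared_counts maps a camera pair (a, b) to the number of board
    observations the two cameras share. Pairs below `min_shared`
    are treated as not connected.
    """
    adjacency = {}
    for (a, b), count in shared_counts.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if count >= min_shared:
            adjacency[a].add(b)
            adjacency[b].add(a)

    visited = {reference}
    queue = deque([reference])
    while queue:
        cam = queue.popleft()
        for neighbor in sorted(adjacency.get(cam, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited


# Hypothetical overlap counts: cam3 shares only 2 observations with cam2.
shared = {("cam0", "cam1"): 12, ("cam1", "cam2"): 8, ("cam2", "cam3"): 2}

connected = reachable_cameras(shared, reference="cam0")
all_cams = {cam for pair in shared for cam in pair}
disconnected = sorted(all_cams - connected)

print("Connected to reference:", sorted(connected))
print("Disconnected cameras:  ", disconnected)
```

Any camera that ends up in `disconnected` has no chain of well-overlapping pairs back to the reference camera, so its initial pose cannot be chained from the others.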

## Stage 3: Joint Refractive Optimization

Stage 3 jointly optimizes:

- Camera extrinsics (rotation + translation)
- Interface distances (water surface Z-coordinate)
- Board poses for all frames

This is the core optimization that accounts for refraction at the air-water interface.

> **Warning:** If interface distance doesn't converge, check that initial estimates are within 2-3x of the true value.

For this tutorial with synthetic data, we're using the ground truth directly. In a real pipeline, you would call the optimization functions from `aquacal.calibration.pipeline`.

In [None]:
# For synthetic data, we already have the optimized result
# In a real pipeline, you would run:
#   from aquacal.calibration.interface_estimation import optimize_interface
#   extrinsics, distances, poses, rms = optimize_interface(...)

# Show the ground truth interface parameters
water_z = list(scenario.water_zs.values())[0]
print(f"Water surface Z-coordinate: {water_z:.4f} m")
print("\nCamera positions in world frame:")
for cam_name in sorted(scenario.extrinsics.keys()):
    C = scenario.extrinsics[cam_name].C
    h_c = water_z - C[2]  # camera-to-water vertical distance
    print(f"  {cam_name}: C = [{C[0]:.3f}, {C[1]:.3f}, {C[2]:.3f}]  h_c = {h_c:.4f} m")
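
For intuition about what "accounting for refraction" means, the sketch below applies the vector form of Snell's law to a single ray crossing a flat air-water interface. It reuses the interface normal `[0, 0, -1]` and the indices `n_air=1.0`, `n_water=1.333` that appear in this tutorial; the function name `refract` is hypothetical and this is not AquaCal's internal ray model.

```python
import numpy as np


def refract(direction, normal, n1, n2):
    """Refract a ray direction at a planar interface (vector Snell's law).

    `normal` must point against the incoming ray (normal . direction < 0).
    Returns the unit direction of the refracted ray.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)

    ratio = n1 / n2
    cos_i = -np.dot(n, d)
    sin2_t = ratio**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        raise ValueError("Total internal reflection: no refracted ray")
    cos_t = np.sqrt(1.0 - sin2_t)
    return ratio * d + (ratio * cos_i - cos_t) * n


interface_normal = np.array([0.0, 0.0, -1.0])  # as used in this tutorial
n_air, n_water = 1.0, 1.333

# Ray travelling in +Z (toward the water), tilted 30 degrees from the normal.
theta_i = np.radians(30.0)
incident = np.array([np.sin(theta_i), 0.0, np.cos(theta_i)])
refracted = refract(incident, interface_normal, n_air, n_water)

# A ray hitting the surface head-on is not bent.
straight = refract([0.0, 0.0, 1.0], interface_normal, n_air, n_water)

theta_t = np.degrees(np.arcsin(refracted[0]))
print(f"Incidence 30.00 deg -> refraction {theta_t:.2f} deg")
```

Because the bend depends on the incidence angle, rays through different pixels are deflected by different amounts, which is the effect the joint optimization has to model.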

## Diagnostics

Now we run the actual calibration pipeline (Stages 2 and 3) on the loaded data,
then inspect the quality of the result using several diagnostic visualizations.

**Reprojection error** measures how well the calibrated model predicts the observed corner
positions, in pixels. As a rule of thumb, an RMS below 1 px indicates an accurate calibration.

> **Note:** The calibration below uses Stages 2-3 only; Stage 4 intrinsic refinement is
> shown separately afterward.

In [None]:
# Generate synthetic detections and run calibration (Stages 2-3)
from aquacal.core.board import BoardGeometry
from aquacal.calibration.extrinsics import build_pose_graph, estimate_extrinsics
from aquacal.calibration.interface_estimation import optimize_interface
from aquacal.config.schema import (
    CalibrationResult, CameraCalibration, DiagnosticsData, CalibrationMetadata
)
from aquacal.validation.reprojection import compute_reprojection_errors
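# Note: this helper is imported from the `tests` package, so it is only
# importable when running from a source checkout of the repository.
from tests.synthetic.ground_truth import generate_synthetic_detections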

board = BoardGeometry(scenario.board_config)
interface_normal = np.array([0.0, 0.0, -1.0], dtype=np.float64)
reference_camera = camera_names[0]

# Generate synthetic detections from ground truth scenario
print("Generating synthetic detections...")
detections = generate_synthetic_detections(
    intrinsics=scenario.intrinsics,
    extrinsics=scenario.extrinsics,
    water_zs=scenario.water_zs,
    board=board,
    board_poses=scenario.board_poses,
    noise_std=scenario.noise_std,
    seed=42,
)

# Stage 2: Extrinsic initialization
print("Stage 2: Extrinsic initialization...")
pose_graph = build_pose_graph(detections, min_cameras=2)
initial_extrinsics = estimate_extrinsics(
    pose_graph, scenario.intrinsics, board, reference_camera
)

# Stage 3: Joint refractive optimization
print("Stage 3: Joint refractive optimization...")
opt_extrinsics, opt_distances, opt_poses, rms = optimize_interface(
    detections=detections,
    intrinsics=scenario.intrinsics,
    initial_extrinsics=initial_extrinsics,
    board=board,
    reference_camera=reference_camera,
    interface_normal=interface_normal,
    n_air=1.0,
    n_water=1.333,
    loss="huber",
    loss_scale=1.0,
    min_corners=4,
)

# Build CalibrationResult with per-camera RMS
diag_cameras = {}
per_camera_rms = {}
for cam_name in scenario.intrinsics:
    diag_cameras[cam_name] = CameraCalibration(
        name=cam_name,
        intrinsics=scenario.intrinsics[cam_name],
        extrinsics=opt_extrinsics[cam_name],
        water_z=opt_distances[cam_name],
    )
    cam_errors = compute_reprojection_errors(
        calibration=diag_cameras[cam_name],
        interface_params=InterfaceParams(normal=interface_normal, n_air=1.0, n_water=1.333),
        detections=detections,
        board=board,
    )
    per_camera_rms[cam_name] = np.sqrt(np.mean(cam_errors**2))

interface_params = InterfaceParams(normal=interface_normal, n_air=1.0, n_water=1.333)
diag_result = CalibrationResult(
    cameras=diag_cameras,
    interface=interface_params,
    board=scenario.board_config,
    diagnostics=DiagnosticsData(
        reprojection_error_rms=rms,
        reprojection_error_per_camera=per_camera_rms,
        validation_3d_error_mean=0.0,
        validation_3d_error_std=0.0,
    ),
    metadata=CalibrationMetadata(
        calibration_date="synthetic",
        software_version="test",
        config_hash="synthetic",
        num_frames_used=len(opt_poses),
        num_frames_holdout=0,
    ),
)

print(f"\nCalibration complete!")
print(f"  Overall RMS: {diag_result.diagnostics.reprojection_error_rms:.3f} px")

### Per-Camera Reprojection Error

A bar chart of per-camera RMS errors quickly reveals whether any camera is performing
worse than the others. Cameras with high error often have poor intrinsic calibration or
insufficient board observations.

In [None]:
# Per-camera RMS error bar chart
cam_names_sorted = sorted(diag_result.cameras.keys())
rms_values = [
    diag_result.diagnostics.reprojection_error_per_camera[cam] for cam in cam_names_sorted
]

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(cam_names_sorted, rms_values, color="steelblue", alpha=0.8)
ax.axhline(
    diag_result.diagnostics.reprojection_error_rms,
    color="red",
    linestyle="--",
    label=f"Overall RMS: {diag_result.diagnostics.reprojection_error_rms:.3f} px",
)
ax.set_xlabel("Camera")
ax.set_ylabel("RMS Reprojection Error (px)")
ax.set_title("Per-Camera Reprojection Error")
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
plt.close()

print("Cameras with error > 2x the average may need re-calibration.")
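
The "2x the average" rule of thumb printed above can be checked programmatically. The sketch below uses made-up per-corner error magnitudes for four hypothetical cameras; the names `errors_px` and `flagged` are illustrative and do not come from AquaCal.

```python
import numpy as np

# Hypothetical per-corner reprojection error magnitudes (pixels) per camera
errors_px = {
    "cam0": np.array([0.3, 0.4]),
    "cam1": np.array([0.2, 0.2]),
    "cam2": np.array([0.4, 0.3]),
    "cam3": np.array([2.0, 2.0]),
}

# RMS of the error magnitudes for each camera
rms_by_camera = {
    cam: float(np.sqrt(np.mean(err**2))) for cam, err in errors_px.items()
}

# Flag cameras whose RMS exceeds twice the mean RMS across cameras
mean_rms = float(np.mean(list(rms_by_camera.values())))
flagged = sorted(cam for cam, rms in rms_by_camera.items() if rms > 2.0 * mean_rms)

for cam in sorted(rms_by_camera):
    print(f"{cam}: {rms_by_camera[cam]:.3f} px")
print(f"Mean RMS: {mean_rms:.3f} px, flagged: {flagged}")
```

With only two cameras, neither can exceed twice their mean, so this check is only informative for larger rigs; for a two-camera rig, compare each camera against an absolute threshold instead.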

### Reprojection Error Distribution

The histogram shows the overall distribution of per-corner errors across all cameras
and frames. A well-calibrated rig produces a tight distribution centered near zero.

In [None]:
# Collect all reprojection error magnitudes
all_errors = []
for cam_name in cam_names_sorted:
    cam = diag_result.cameras[cam_name]
    errors = compute_reprojection_errors(
        calibration=cam,
        interface_params=diag_result.interface,
        detections=detections,
        board=board,
    )
    all_errors.extend(np.linalg.norm(errors, axis=1))

# Histogram of error distribution
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(all_errors, bins=30, color="steelblue", alpha=0.7, edgecolor="black")
ax.axvline(
    np.mean(all_errors),
    color="red",
    linestyle="--",
    label=f"Mean: {np.mean(all_errors):.3f} px",
)
ax.set_xlabel("Reprojection Error (px)")
ax.set_ylabel("Frequency")
ax.set_title("Reprojection Error Distribution (All Cameras)")
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()
plt.close()

print(f"Error statistics:")
print(f"  Mean:            {np.mean(all_errors):.3f} px")
print(f"  Median:          {np.median(all_errors):.3f} px")
print(f"  95th percentile: {np.percentile(all_errors, 95):.3f} px")
print(f"  Max:             {np.max(all_errors):.3f} px")

### Interface Distance Recovery

The water surface Z-coordinate (`interface_distance`) is the key refractive parameter.
Comparing estimated values to ground truth tells you how well Stage 3 converged.

In [None]:
# Compare estimated interface distances to ground truth
estimated_distances = [diag_result.cameras[cam].water_z for cam in cam_names_sorted]
gt_distances = [scenario.water_zs[cam] for cam in cam_names_sorted]

fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(cam_names_sorted))
width = 0.35

ax.bar(x - width / 2, estimated_distances, width, label="Estimated", color="steelblue", alpha=0.8)
ax.bar(x + width / 2, gt_distances, width, label="Ground Truth", color="orange", alpha=0.8)

ax.set_xlabel("Camera")
ax.set_ylabel("Interface Distance (m)")
ax.set_title("Interface Distance Recovery")
ax.set_xticks(x)
ax.set_xticklabels(cam_names_sorted, rotation=45, ha="right")
ax.legend()
ax.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()
plt.close()

mean_err = np.mean([abs(e - g) for e, g in zip(estimated_distances, gt_distances)])
print(f"Mean absolute interface distance error: {mean_err * 1000:.2f} mm")
print()
print("If estimated and ground-truth bars differ significantly, check that the initial")
print("interface distance estimate (initial_water_zs) is within 2-3x of the true value.")

### 3D Reconstruction Error

Reprojection error measures 2D fit quality, but 3D reconstruction error is the ultimate
accuracy metric. We triangulate board corners from multiple cameras and compare the
recovered inter-corner distances to the known ground truth.

In [None]:
from aquacal.validation.reconstruction import compute_3d_distance_errors

dist_errors = compute_3d_distance_errors(
    calibration=diag_result,
    detections=detections,
    board=board,
    include_per_pair=False,
    include_spatial=True,
)

print("3D Reconstruction Quality:")
print(f"  Signed mean error: {dist_errors.signed_mean * 1000:.3f} mm")
print(f"  RMSE:              {dist_errors.rmse * 1000:.3f} mm")
print(f"  Comparisons:       {dist_errors.num_comparisons}")

# Histogram of distance errors
if dist_errors.spatial is not None:
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.hist(
        dist_errors.spatial.signed_errors * 1000,
        bins=30,
        color="steelblue",
        alpha=0.7,
        edgecolor="black",
    )
    ax.axvline(0, color="black", linestyle="--", alpha=0.5)
    ax.axvline(
        dist_errors.signed_mean * 1000,
        color="red",
        linestyle="--",
        label=f"Mean: {dist_errors.signed_mean * 1000:.2f} mm",
    )
    ax.set_xlabel("Signed Distance Error (mm)")
    ax.set_ylabel("Frequency")
    ax.set_title("3D Reconstruction Error Distribution")
    ax.legend()
    ax.grid(axis="y", alpha=0.3)
    plt.tight_layout()
    plt.show()
    plt.close()
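
To show what the signed mean and RMSE above summarize, the sketch below computes them by hand for one row of board corners. The corner coordinates, the 30 mm square size, and the 1% scale bias are all made up for illustration, and this is not the implementation of `compute_3d_distance_errors`.

```python
import numpy as np

square_size = 0.030  # hypothetical board square size in meters (30 mm)

# Ground-truth positions of four adjacent corners along one board row
true_corners = np.array([[i * square_size, 0.0, 1.0] for i in range(4)])

# Simulated triangulation result with a 1% scale bias along X
recovered = true_corners * np.array([1.01, 1.0, 1.0])

# Distances between neighboring recovered corners vs. the known spacing
measured = np.linalg.norm(np.diff(recovered, axis=0), axis=1)
signed_errors = measured - square_size

signed_mean = float(np.mean(signed_errors))
rmse = float(np.sqrt(np.mean(signed_errors**2)))

print(f"Signed mean error: {signed_mean * 1000:.3f} mm")
print(f"RMSE:              {rmse * 1000:.3f} mm")
```

A signed mean that sits clearly away from zero, as in this simulated case, points to a systematic scale bias, while a near-zero signed mean with a large RMSE points to noise.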

### Common Issues Checklist

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| High error for one camera | Poor intrinsic calibration | Re-calibrate intrinsics with more frames, check for motion blur |
| High error in image corners | Distortion model insufficient | Try rational model (8 coefficients) |
| Interface distance not converging | Initial estimate too far from truth | Provide better `initial_water_zs` in config |
| Interface distances differ between cameras | Degenerate board poses | Ensure board is visible at varied angles and depths |
| High 3D reconstruction error | Systematic bias | Check that `n_water` is correct for your water type |
| Reprojection < 1px but 3D error high | Overfitting to 2D | Verify interface distance initialization, add validation frames |

## Stage 4: Optional Intrinsic Refinement

Stage 4 optionally refines per-camera focal lengths (fx, fy) and principal points (cx, cy) alongside extrinsics and interface distances.

**When to enable:** Only after Stage 3 converges reliably. Distortion coefficients are NOT refined.

For synthetic data with perfect intrinsics, this stage would provide minimal benefit.

In [None]:
# Stage 4 is optional and disabled by default
# Enable with refine_intrinsics=True in CalibrationConfig
print("Stage 4 skipped (refine_intrinsics=False)")
print("Recommended for real hardware where intrinsics may have residual errors.")

## Validation

For synthetic data, we can validate by comparing the calibration result to ground truth. For real data, validation uses held-out frames and 3D reconstruction error.

Key metrics:

- **Reprojection RMS**: How well the calibration predicts observed corner positions (pixels)
- **3D reconstruction error**: How accurately pairwise 3D distances are recovered (meters or mm)

In [None]:
# For this synthetic example, reprojection error would be near-zero
# (we're using ground truth parameters)
print("Validation metrics (ground truth):")
print("  Reprojection RMS: ~0.0 pixels (perfect calibration)")
print("  3D reconstruction error: ~0.0 mm (perfect calibration)")
print("\nFor real calibration pipelines:")
print("  - Typical reprojection RMS: 0.3-1.0 pixels")
print("  - Typical 3D error: 1-3 mm for ~30mm square size")

## Saving and Loading Results

AquaCal provides utilities to save and load calibration results in JSON format.

In [None]:
# Save calibration result
from aquacal.io.serialization import save_calibration, load_calibration

# In a real pipeline:
# save_calibration(result, "output/calibration.json")
# loaded_result = load_calibration("output/calibration.json")

print("Calibration results can be saved to JSON for later use.")
print("See aquacal.io.serialization.save_calibration() and load_calibration().")

## Summary

In this tutorial, you learned how to:

1. Load calibration data (synthetic, preset, or Zenodo)
2. Understand the calibration scenario structure
3. Visualize intrinsic parameters
4. Run and understand the four-stage calibration pipeline:
   - **Stage 1**: Intrinsic calibration (in-air)
   - **Stage 2**: Extrinsic initialization via pose graph
   - **Stage 3**: Joint refractive optimization
   - **Stage 4**: Optional intrinsic refinement
5. Diagnose calibration quality with reprojection and 3D error analysis
6. Save and load calibration results

**Next:** Explore [why refractive calibration matters](02_synthetic_validation.ipynb) with
controlled experiments comparing refractive vs non-refractive models.

- [User Guide](../guide/index.md): Comprehensive documentation on calibration theory and best practices