<a href="https://colab.research.google.com/github/tlancaster6/AquaMVS/blob/main/docs/tutorial/notebook.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# End-to-End Reconstruction Tutorial

This tutorial walks through a complete multi-view stereo reconstruction workflow using the AquaMVS Python API. We start from synchronized multi-camera images, run the reconstruction pipeline, and produce a 3D surface mesh.

By the end of this tutorial, you will have:
- Loaded and inspected a pipeline configuration
- Executed the reconstruction pipeline
- Examined intermediate outputs (depth maps, consistency maps)
- Visualized the fused point cloud
- Exported the final mesh to various formats

In [1]:
# Install AquaMVS (run this cell in Colab; skip locally if already installed)
import importlib.util
import subprocess
import sys

if importlib.util.find_spec("aquamvs") is None:
    subprocess.run(
        [
            sys.executable,
            "-m",
            "pip",
            "install",
            "torch",
            "torchvision",
            "--index-url",
            "https://download.pytorch.org/whl/cpu",
            "-q",
        ],
        check=True,
    )
    subprocess.run(
        [
            sys.executable,
            "-m",
            "pip",
            "install",
            "git+https://github.com/cvg/LightGlue.git@edb2b83",
            "git+https://github.com/tlancaster6/RoMaV2.git",
            "aquamvs",
            "-q",
        ],
        check=True,
    )

In [2]:
import os
import urllib.request
import zipfile
from pathlib import Path

DATASET_URL = "https://github.com/tlancaster6/AquaMVS/releases/download/v0.1.0-example-data/aquamvs-example-dataset.zip"
DATASET_DIR = Path("aquamvs-example-dataset")

if not DATASET_DIR.exists():
    print("Downloading example dataset...")
    urllib.request.urlretrieve(DATASET_URL, "aquamvs-example-dataset.zip")
    with zipfile.ZipFile("aquamvs-example-dataset.zip") as zf:
        zf.extractall(DATASET_DIR)
    os.remove("aquamvs-example-dataset.zip")
    print("Done.")
else:
    print(f"Dataset already present at {DATASET_DIR}")

Dataset already present at aquamvs-example-dataset


## Setup and Imports

In [3]:
import logging
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

from aquamvs import Pipeline, PipelineConfig

# Enable logging so pipeline stages print progress
logging.basicConfig(level=logging.INFO, format="%(name)s - %(message)s")

CONFIG_PATH = Path("aquamvs-example-dataset") / "config.yaml"

Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.


## 1. Load and Inspect Configuration

The pipeline configuration defines all parameters for reconstruction: camera paths, calibration, feature matching settings, depth estimation parameters, and output options.

In [4]:
# Load configuration from YAML
config = PipelineConfig.from_yaml(CONFIG_PATH)

# Inspect key parameters
print(f"Cameras: {list(config.camera_input_map.keys())}")
print(f"Output directory: {config.output_dir}")
print(f"Extractor type: {config.sparse_matching.extractor_type}")
print(f"Pipeline mode: {config.pipeline_mode}")
print(f"Device: {config.runtime.device}")

Cameras: ['e3v8250', 'e3v829d', 'e3v82e0', 'e3v831e', 'e3v832e', 'e3v8334', 'e3v83e9', 'e3v83eb', 'e3v83ee', 'e3v83ef', 'e3v83f0', 'e3v83f1']
Output directory: aquamvs-example-dataset/output
Extractor type: superpoint
Pipeline mode: full
Device: cuda


**Expected output:** List of camera names (e.g., `['e3v8250', 'e3v829d', ...]`), output directory path, extractor type (`'superpoint'`), pipeline mode (`'full'` or `'sparse'`), and device (`'cpu'` or `'cuda'`).
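
The configuration shown above requests `cuda`, while the install cell at the top of this notebook pulls CPU-only PyTorch wheels, so it may be worth checking the device before running the pipeline. The helper below is a minimal sketch of that check; `resolve_device` is a hypothetical name and is not part of AquaMVS.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return the device to use, falling back to CPU when CUDA is unavailable."""
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


# The three cases of interest: CUDA requested but missing, CUDA present, CPU requested.
fallback = resolve_device("cuda", cuda_available=False)
gpu = resolve_device("cuda", cuda_available=True)
cpu = resolve_device("cpu", cuda_available=True)
print(fallback, gpu, cpu)
```

In the notebook, `cuda_available` would come from `torch.cuda.is_available()`. Writing the result back with `config.runtime.device = ...` assumes the config object allows attribute assignment, which this tutorial does not confirm; editing `runtime.device` in `config.yaml` is the route the tutorial itself describes.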

## 2. Run the Pipeline

The `Pipeline` class provides the primary programmatic interface. Calling `.run()` executes the full reconstruction workflow:

1. **Undistortion**: Apply camera calibration to remove lens distortion
2. **Feature Matching**: Extract and match features across camera pairs (LightGlue or RoMa)
3. **Triangulation** (sparse mode): Compute 3D points from feature correspondences
4. **Plane Sweep Stereo** (full mode): Estimate dense depth via a photometric cost volume
5. **Depth Fusion**: Merge multi-view depth maps into a single point cloud
6. **Surface Reconstruction**: Generate a triangle mesh from the point cloud

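This step can take a long time, depending on image resolution, number of cameras, and hardware. In the run logged below (one frame, `cuda` device), dense matching took about 12 minutes, fusion about 2 minutes, and surface reconstruction about 75 seconds.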

In [5]:
import torch

torch.cuda.empty_cache()
# Initialize pipeline
pipeline = Pipeline(config)

# Run reconstruction
# This will process all frames according to config.preprocessing settings
pipeline.run()

aquamvs.pipeline.builder - Loading calibration from aquamvs-example-dataset/calibration.json
aquamvs.pipeline.builder - Ring cameras in calibration but missing input (skipped): ['e3v82f9']
aquamvs.pipeline.builder - Found 11 ring cameras, 1 auxiliary cameras (of 12/1 in calibration)
aquamvs.pipeline.builder - Computing undistortion maps
aquamvs.pipeline.builder - Creating projection models
aquamvs.pipeline.builder - Selecting camera pairs
aquamvs.pipeline.builder - Config saved to aquamvs-example-dataset\output\config.yaml
aquamvs.pipeline.runner - Detected image directory input
aquamvs.io - Detected 1 frames across 12 cameras (image directory input)
aquamvs.pipeline.runner - Processing frames 0 to end (step 1)
aquamvs.pipeline.stages.undistortion - Frame 0: undistorting images
aquamvs.pipeline.stages.undistortion - undistortion: 48.3 ms
aquamvs.pipeline.stages.dense_matching - Frame 0: running RoMa v2 dense matching (full mode)
Using cache found in C:\Users\tucke/.cache\torch\hub\face

aquamvs.pipeline.stages.dense_matching - Frame 0: converting RoMa warps to depth maps
aquamvs.pipeline.stages.dense_matching - dense_matching: 716815.0 ms
aquamvs.pipeline.stages.fusion - Frame 0: skipping geometric consistency filter (RoMa path)
aquamvs.pipeline.stages.fusion - Frame 0: fusing depth maps
aquamvs.pipeline.stages.fusion - Frame 0: removed 548933 outliers (4.4%) from fused cloud
aquamvs.pipeline.stages.fusion - fusion: 130389.7 ms
aquamvs.pipeline.stages.surface - Frame 0: reconstructing surface
aquamvs.pipeline.stages.surface - surface_reconstruction: 74898.2 ms
aquamvs.pipeline.runner - Frame 0: complete
aquamvs.pipeline.runner - Pipeline complete


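**Expected output:** Per-stage log messages followed by `"Pipeline complete"`. Which stages appear depends on the configured matching path: the run above used RoMa v2 dense matching, so it logs undistortion, dense matching, fusion, and surface reconstruction rather than separate feature matching, triangulation, and plane sweep stages. You may also see a benign warning about ring cameras missing input if your dataset does not include all cameras from the calibration file.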
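
The per-stage timings in the log above are reported in milliseconds, which is hard to read for multi-minute stages. As a rough sketch, the lines can be parsed with the standard library and converted to seconds. The regular expression assumes the `stage: N ms` line format seen in this particular log, and `stage_timings` is a hypothetical helper, not an AquaMVS function.

```python
import re

# Timing lines copied from the log above, plus one non-timing line to show it is ignored.
LOG = """\
aquamvs.pipeline.stages.undistortion - undistortion: 48.3 ms
aquamvs.pipeline.stages.dense_matching - dense_matching: 716815.0 ms
aquamvs.pipeline.stages.fusion - Frame 0: fusing depth maps
aquamvs.pipeline.stages.fusion - fusion: 130389.7 ms
aquamvs.pipeline.stages.surface - surface_reconstruction: 74898.2 ms
"""

# Matches "- <stage>: <number> ms" at the end of a line.
TIMING_RE = re.compile(r"- (\w+): ([\d.]+) ms$")


def stage_timings(log_text):
    """Map stage name to elapsed seconds for every timing line in the log."""
    timings = {}
    for line in log_text.splitlines():
        match = TIMING_RE.search(line.strip())
        if match:
            timings[match.group(1)] = float(match.group(2)) / 1000.0
    return timings


timings = stage_timings(LOG)
total_minutes = sum(timings.values()) / 60
for stage, seconds in timings.items():
    print(f"{stage:>24}: {seconds:8.1f} s")
print(f"{'total':>24}: {total_minutes:8.1f} min")
```

For this run, dense matching accounts for most of the total, so it is the first place to look when trading quality for speed.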

## 3. Examine Intermediate Results

The pipeline saves intermediate outputs to the output directory, organized by frame. Let's load and visualize depth maps and consistency maps for frame 0.

In [None]:
from aquamvs import load_calibration_data

# Path to frame 0 output
output = Path(config.output_dir) / "frame_000000"

# Load calibration to identify ring cameras (auxiliary cameras don't produce depth maps)
calibration = load_calibration_data(config.calibration_path)
ring_cameras = [c for c in calibration.ring_cameras if c in config.camera_input_map]
cam = ring_cameras[0]

print(f"Ring cameras with input: {ring_cameras}")
print(f"Visualizing outputs for camera: {cam}")

### Depth Map

Depth maps represent the distance along each ray from the camera to the water surface. Values are in meters (ray depth, not world Z).

In [None]:
# Load depth map (saved as NPZ with 'depth' array)
depth_data = np.load(output / "depth_maps" / f"{cam}.npz")
depth = depth_data["depth"]

# Visualize
plt.figure(figsize=(12, 8))
plt.imshow(depth, cmap="viridis")
plt.colorbar(label="Depth (m)", shrink=0.8)
plt.title(f"Depth Map - {cam}")
plt.axis("off")
plt.tight_layout()
plt.show()

# Print statistics
valid_mask = ~np.isnan(depth)
print(f"Depth range: {np.nanmin(depth):.3f} to {np.nanmax(depth):.3f} m")
print(
    f"Valid pixels: {valid_mask.sum()} / {depth.size} ({100 * valid_mask.sum() / depth.size:.1f}%)"
)
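
Because the stored values are ray depths rather than world Z, turning a pixel's depth into a 3D point requires that pixel's ray. The sketch below illustrates the relationship `point = origin + depth * unit_direction` with made-up numbers. In practice the ray origin and direction would come from the camera's projection model, which is not shown here, and `point_on_ray` is a hypothetical helper.

```python
import math


def point_on_ray(origin, direction, depth):
    """Return the 3D point at the given distance along a ray."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]  # normalize so depth is a true distance
    return [o + depth * u for o, u in zip(origin, unit)]


# Hypothetical ray: starts at the world origin and tilts away from the Z axis.
origin = [0.0, 0.0, 0.0]
direction = [0.0, 0.6, 0.8]
ray_depth = 0.5  # meters along the ray

point = point_on_ray(origin, direction, ray_depth)
print(f"Ray depth: {ray_depth:.3f} m, world Z of the point: {point[2]:.3f} m")
```

For this tilted ray, the world Z coordinate is smaller than the ray depth, which is why the two quantities should not be compared directly.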

### Consistency Map

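Consistency maps indicate how many source cameras agree with the reference camera's depth estimate at each pixel. Higher values (warmer colors) indicate more reliable depth. Note that the run logged in Section 2 skipped the geometric consistency filter (RoMa path), so consistency map files may not exist for that configuration; if the cell below raises `FileNotFoundError`, that is the likely cause.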

In [None]:
# Load consistency map
consistency_data = np.load(output / "consistency_maps" / f"{cam}.npz")
consistency = consistency_data["consistency"]

# Visualize
plt.figure(figsize=(12, 8))
plt.imshow(consistency, cmap="viridis")
plt.colorbar(label="Consistent views", shrink=0.8)
plt.title(f"Consistency Map - {cam}")
plt.axis("off")
plt.tight_layout()
plt.show()

# Print statistics
print(f"Consistency range: {consistency.min():.0f} to {consistency.max():.0f} views")
print(f"Mean consistency: {consistency[valid_mask].mean():.1f} views")
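
A consistency map can also be used to mask out weakly supported depth values. The following sketch uses small synthetic arrays in place of the loaded `depth` and `consistency` maps. The `MIN_VIEWS` threshold is a hypothetical value chosen for illustration, not an AquaMVS default.

```python
import numpy as np

# Synthetic stand-ins for the arrays loaded in the cells above.
depth = np.array([[0.50, 0.52, np.nan],
                  [0.51, 0.90, 0.53]])
consistency = np.array([[3, 2, 0],
                        [4, 1, 3]])

MIN_VIEWS = 2  # hypothetical threshold: keep pixels confirmed by at least 2 views

# A pixel is reliable if it has a depth value and enough agreeing views.
reliable = ~np.isnan(depth) & (consistency >= MIN_VIEWS)
filtered = np.where(reliable, depth, np.nan)

print(f"Kept {reliable.sum()} of {depth.size} pixels")
```

Here the outlying `0.90` value is dropped because only one view supports it.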

## 4. Visualize the Fused Point Cloud

The fusion stage merges all camera depth maps into a single 3D point cloud, saved as `fused.ply`.

In [None]:
import open3d as o3d

torch.cuda.empty_cache()
# Load fused point cloud
pcd_path = output / "point_cloud" / "fused.ply"
pcd = o3d.io.read_point_cloud(str(pcd_path))

print(f"Point cloud: {len(pcd.points)} points")
print(f"Has colors: {pcd.has_colors()}")
print(f"Has normals: {pcd.has_normals()}")

# Compute bounds
bbox = pcd.get_axis_aligned_bounding_box()
print(f"Bounding box: {bbox.get_extent()} m")

# Render an oblique view of the point cloud
vis = o3d.visualization.Visualizer()
vis.create_window(visible=False, width=1280, height=960)
vis.add_geometry(pcd)

# Initial render pass to initialize geometry bounds
vis.poll_events()
vis.update_renderer()

# Set oblique viewpoint (looking from above-front-right)
ctr = vis.get_view_control()
ctr.set_front([-0.3, -0.5, -0.8])  # oblique: slightly from front-right, mostly above
ctr.set_up([0, 0, -1])  # Z-down world: -Z is "up" on screen
ctr.set_lookat(np.asarray(bbox.get_center()))
ctr.set_zoom(0.5)

# Second render pass with the updated view
vis.poll_events()
vis.update_renderer()
img = np.asarray(vis.capture_screen_float_buffer(do_render=True))
vis.destroy_window()

plt.figure(figsize=(12, 9))
plt.imshow(img)
plt.title("Fused Point Cloud (oblique view)")
plt.axis("off")
plt.tight_layout()
plt.show()

**Note:** The above rendering goes through a hidden Open3D `Visualizer` window, which requires a display (or a virtual framebuffer such as Xvfb on headless systems). If this cell fails, you can still inspect the point cloud by opening `fused.ply` directly in MeshLab, CloudCompare, or any PLY viewer.
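
If rendering is unavailable, the PLY header alone is enough to confirm that the file was written and how many points it holds. The reader below is a minimal standard-library sketch based on the PLY header layout (`ply`, `format`, `element`, `end_header`). `read_ply_header` is a hypothetical helper, demonstrated on an in-memory sample rather than the real `fused.ply`.

```python
import io


def read_ply_header(stream):
    """Return (format, {element_name: count}) from a binary file-like object."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt = None
    elements = {}
    for raw in iter(stream.readline, b""):
        tokens = raw.decode("ascii", errors="replace").split()
        if not tokens:
            continue
        if tokens[0] == "format":
            fmt = tokens[1]
        elif tokens[0] == "element":
            elements[tokens[1]] = int(tokens[2])
        elif tokens[0] == "end_header":
            break  # stop before the (possibly binary) payload
    return fmt, elements


# Tiny in-memory PLY file with three vertices.
sample = io.BytesIO(
    b"ply\nformat ascii 1.0\nelement vertex 3\n"
    b"property float x\nproperty float y\nproperty float z\n"
    b"end_header\n0 0 0\n1 0 0\n0 1 0\n"
)
fmt, elements = read_ply_header(sample)
print(f"Format: {fmt}, elements: {elements}")
```

For the real output, the same function could be applied with `open(pcd_path, "rb")` as the stream.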

## 5. Surface Reconstruction and Export

The surface reconstruction stage converts the point cloud into a triangle mesh. The default method is Poisson reconstruction, which produces a watertight mesh.

In [None]:
torch.cuda.empty_cache()
# Load reconstructed mesh
mesh_path = output / "mesh" / "surface.ply"
mesh = o3d.io.read_triangle_mesh(str(mesh_path))
mesh.compute_vertex_normals()

print(f"Mesh: {len(mesh.vertices)} vertices, {len(mesh.triangles)} triangles")
print(f"Has vertex colors: {mesh.has_vertex_colors()}")
print(f"Has vertex normals: {mesh.has_vertex_normals()}")

# Render an oblique view of the mesh
vis = o3d.visualization.Visualizer()
vis.create_window(visible=False, width=1280, height=960)
vis.add_geometry(mesh)

# Initial render pass to initialize geometry bounds
vis.poll_events()
vis.update_renderer()

# Set oblique viewpoint (must be set after the first poll/update)
ctr = vis.get_view_control()
mesh_bbox = mesh.get_axis_aligned_bounding_box()
ctr.set_front([-0.3, -0.5, -0.8])
ctr.set_up([0, 0, -1])
ctr.set_lookat(np.asarray(mesh_bbox.get_center()))
ctr.set_zoom(0.5)

# Second render pass with the updated view
vis.poll_events()
vis.update_renderer()
img = np.asarray(vis.capture_screen_float_buffer(do_render=True))
vis.destroy_window()

plt.figure(figsize=(12, 9))
plt.imshow(img)
plt.title("Reconstructed Surface Mesh (oblique view)")
plt.axis("off")
plt.tight_layout()
plt.show()

### Export to Other Formats

AquaMVS provides an `export_mesh` function to convert meshes to OBJ, STL, GLTF, or GLB formats with optional simplification.

In [None]:
from aquamvs import export_mesh

# Export to OBJ (widely supported, preserves colors)
obj_path = output / "surface.obj"
export_mesh(mesh_path, obj_path)
print(f"Exported to OBJ: {obj_path}")

# Export to STL with simplification (for 3D printing)
stl_path = output / "surface_simplified.stl"
export_mesh(mesh_path, stl_path, simplify=10000)
print(f"Exported simplified mesh to STL: {stl_path}")

# Export to GLB (compact, web-ready)
glb_path = output / "surface.glb"
export_mesh(mesh_path, glb_path)
print(f"Exported to GLB: {glb_path}")

## Next Steps

Now that you have completed a basic reconstruction, explore:

- **[CLI Guide](../cli_guide.md)**: Command-line workflow for batch processing
- **[Benchmarking Tutorial](benchmark)**: Compare LightGlue and RoMa reconstruction pathways with timing and quality metrics
- **[Troubleshooting Guide](../troubleshooting)**: Where to look if you encounter issues
- **[Theory](../theory/index.rst)**: Understand the refractive geometry and algorithms
- **[API Reference](../api/index.rst)**: Detailed documentation of all modules and functions

### Configuration Tips

- **Switch matchers**: Set `matcher_type: "roma"` for dense matching (slower, more accurate)
- **Adjust depth range**: Modify `reconstruction.depth_min` and `depth_max` to focus on your region of interest
- **GPU acceleration**: Set `runtime.device: "cuda"` if you have a CUDA-capable GPU
- **Quality vs. speed**: Use `aquamvs init --preset fast` when initializing your configuration for a faster but lower-quality reconstruction
- **Increase quality**: Increase `reconstruction.num_depths` (default: 64) for higher quality at the cost of longer runtime
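
To get a feel for how the depth range and `num_depths` interact, the sketch below computes the spacing between candidate depths. It assumes the candidates are spaced uniformly between `depth_min` and `depth_max`, which this tutorial does not state. The range values are hypothetical, and `plane_spacing` is not an AquaMVS function.

```python
def plane_spacing(depth_min, depth_max, num_depths):
    """Spacing in meters between uniformly spaced depth candidates."""
    if num_depths < 2:
        raise ValueError("num_depths must be at least 2")
    return (depth_max - depth_min) / (num_depths - 1)


# Hypothetical depth range in meters; 64 matches the default quoted above.
DEPTH_MIN, DEPTH_MAX = 0.30, 0.93

for num_depths in (64, 128, 256):
    spacing_mm = 1000 * plane_spacing(DEPTH_MIN, DEPTH_MAX, num_depths)
    print(f"num_depths={num_depths:>3}: {spacing_mm:.2f} mm between candidates")
```

Under this assumption, narrowing the depth range improves depth resolution without adding candidates, which is why it is worth tightening the range before increasing `num_depths`.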

### Multi-Frame Reconstruction

To process multiple frames, adjust `preprocessing` settings in the config:

```yaml
preprocessing:
  frame_start: 0
  frame_stop: 100  # Process frames 0-99
  frame_step: 10   # Every 10th frame
```

Each frame's outputs will be saved to `output/frame_XXXXXX/`.
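
As a quick check of which frame directories to expect, the selection can be mirrored with Python's `range`. This sketch assumes `frame_stop` is exclusive, as the "frames 0-99" comment above suggests, and uses the six-digit naming seen in `frame_000000`. `frame_dirs` is a hypothetical helper.

```python
from pathlib import Path


def frame_dirs(output_dir, frame_start, frame_stop, frame_step):
    """List the per-frame output directories for a start/stop/step selection."""
    return [
        Path(output_dir) / f"frame_{index:06d}"
        for index in range(frame_start, frame_stop, frame_step)
    ]


# Same values as the YAML example above.
dirs = frame_dirs("output", frame_start=0, frame_stop=100, frame_step=10)
print(f"{len(dirs)} frames: {dirs[0].name} ... {dirs[-1].name}")
```

With the settings above, this gives ten frames, from `frame_000000` to `frame_000090`.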