# Object Segmentation on 360 Images — Hands-On Demo

In this notebook, we'll use **[PanoSAM](https://github.com/yz3440/panosam)** to detect and segment objects in equirectangular panoramic images using **text prompts**.

PanoSAM wraps **[Meta SAM 3](https://ai.meta.com/sam3/)** (Segment Anything Model 3) and handles the full panorama pipeline — perspective splitting, per-view segmentation, coordinate conversion back to spherical, and mask deduplication — so you don't have to do any of that manually.

```mermaid
graph LR
    A["Equirectangular<br/>panorama"] --> B["Perspective Views<br/>(auto-generated)"] --> C["SAM 3 (per view)<br/>text prompt"] --> D["Deduplicated Masks<br/>in spherical coords"]
```

**Note:** SAM 3 is a large, gated model. Downloading its weights requires a Hugging Face account that has accepted the SAM 3 license, plus an access token. A GPU is recommended for reasonable speed.

## 1. Install Dependencies

PanoSAM has install extras for different use cases:
- `panosam[sam3]` — includes SAM 3 engine dependencies (Hugging Face Transformers)
- `panosam[viz]` — visualization utilities
- `panosam[full]` — everything

In [None]:
!pip install "panosam[full]"

## 2. Load a Panorama

Place an equirectangular image at `assets/sample_panorama.jpg`. If that file is missing, the cell below downloads a panorama from Google Street View using the `streetlevel` package.
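
A full equirectangular panorama spans 360° horizontally and 180° vertically, so its width is twice its height. The check below is a minimal sketch of a sanity test you could run on your own image before segmenting it. `looks_equirectangular` and its tolerance are hypothetical helpers for illustration, not part of PanoSAM.

```python
def looks_equirectangular(width: int, height: int, tolerance: float = 0.01) -> bool:
    """Return True if the image has the 2:1 aspect ratio of a full 360° x 180° panorama."""
    if width <= 0 or height <= 0:
        return False
    return abs(width / height - 2.0) <= tolerance


print(looks_equirectangular(8192, 4096))  # True
print(looks_equirectangular(1920, 1080))  # False
```

With a loaded image, the call would be `looks_equirectangular(panorama.width, panorama.height)`.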

In [None]:
import os
from PIL import Image
import matplotlib.pyplot as plt

SAMPLE_IMAGE = "assets/sample_panorama.jpg"

if not os.path.exists(SAMPLE_IMAGE):
    print("No sample image found. Downloading from Google Street View...")
    from streetlevel import streetview
    pano = streetview.find_panorama(42.3625, -71.0862)  # MIT Media Lab area
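    if pano is None:
        # Fail early with a clear message instead of crashing later in Image.open
        raise FileNotFoundError("No Street View panorama found near the requested location.")
    os.makedirs("assets", exist_ok=True)
    streetview.download_panorama(pano, SAMPLE_IMAGE, zoom=4)
    print(f"Downloaded panorama {pano.id}")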

panorama = Image.open(SAMPLE_IMAGE)
print(f"Panorama size: {panorama.width} x {panorama.height}")

plt.figure(figsize=(16, 8))
plt.imshow(panorama)
plt.axis("off")
plt.title("Equirectangular Panorama")
plt.show()

## 3. Perspective Splitting (Handled by PanoSAM)

SAM 3 expects standard perspective images. PanoSAM handles the equirectangular-to-perspective projection automatically using configurable **perspective presets**:

| Preset | FOV | Resolution | Views |
|---|---|---|---|
| `DEFAULT` | 45° | 2048×2048 | 16 |
| `ZOOMED_IN` | 22.5° | 1024×1024 | 32 |
| `ZOOMED_OUT` | 60° | 2500×2500 | 12 |
| `WIDEANGLE` | 90° | 2500×2500 | 8 |

You can also create custom perspectives with `panosam.generate_perspectives()` for fine-grained control over FOV, resolution, overlap, and pitch angles.
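
The view counts in the table match what a single horizontal ring of views with 50% overlap would give (for example, 360° / (45° × 0.5) = 16). That is an inference from the numbers, not a description of how PanoSAM builds its presets. The sketch below computes such a ring; `ring_yaw_offsets` is a hypothetical helper and does not mirror the signature of `panosam.generate_perspectives()`.

```python
import math
from typing import List


def ring_yaw_offsets(horizontal_fov: float, overlap: float = 0.5) -> List[float]:
    """Yaw centers (degrees) for one ring of views that covers the full 360°.

    overlap is the fraction of each view shared with its neighbor (assumed 0.5 here).
    """
    if not 0 < horizontal_fov <= 360:
        raise ValueError("horizontal_fov must be in (0, 360]")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = horizontal_fov * (1.0 - overlap)      # new angle covered per view
    count = math.ceil(360.0 / step - 1e-9)       # views needed to close the ring
    return [i * 360.0 / count for i in range(count)]


for fov in (45.0, 22.5, 60.0, 90.0):
    print(fov, len(ring_yaw_offsets(fov)))
```

Lower overlap means fewer views and faster runs, but objects that straddle a view boundary are more likely to be cut in every view that sees them.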

Let's peek at what the perspective views look like:

In [None]:
import panosam as ps

# PanoSAM uses PanoramaImage to handle perspective generation
pano_image = ps.PanoramaImage(panorama_id="sample", image=panorama)

# Look at the DEFAULT preset perspectives
perspectives = ps.DEFAULT_IMAGE_PERSPECTIVES
print(f"DEFAULT preset: {len(perspectives)} perspectives")
for p in perspectives[:4]:
    print(f"  yaw={p.yaw_offset:6.1f}°, pitch={p.pitch_offset:5.1f}°, fov={p.horizontal_fov}°, res={p.width}×{p.height}")
if len(perspectives) > 4:
    print(f"  ... and {len(perspectives) - 4} more")

In [None]:
# Generate and display a few perspective views from the panorama
sample_perspectives = perspectives[:6]  # Show first 6

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
for idx, p in enumerate(sample_perspectives):
    persp_image = pano_image.generate_perspective_image(p)
    pil_view = persp_image.get_perspective_image()

    ax = axes[idx // 3][idx % 3]
    ax.imshow(pil_view)
    ax.set_title(f"Yaw: {p.yaw_offset:.0f}°, Pitch: {p.pitch_offset:.0f}°")
    ax.axis("off")

plt.suptitle("Perspective Views (auto-generated by PanoSAM)", fontsize=14)
plt.tight_layout()
plt.show()

## 4. Run PanoSAM Segmentation

PanoSAM handles the entire pipeline in one call:
1. Splits the panorama into perspective views (based on the chosen preset)
2. Runs SAM 3 on each view with your text prompt
3. Converts per-view masks to spherical coordinates
4. Deduplicates overlapping detections across views
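
The coordinate conversion in step 3 can be sketched with a pinhole camera model: each pixel of a perspective view defines a ray, which is rotated by the view's pitch and yaw and then expressed as spherical angles. This is an illustration under assumed conventions (pixel origin at the top-left, yaw increasing to the right, pitch positive upward), and `view_pixel_to_sphere` is a hypothetical helper. PanoSAM's internal implementation and conventions may differ.

```python
import math
from typing import Tuple


def view_pixel_to_sphere(
    u: float, v: float,
    width: int, height: int,
    horizontal_fov: float,
    view_yaw: float, view_pitch: float,
) -> Tuple[float, float]:
    """Map a pixel in a perspective view to (yaw, pitch) in degrees on the sphere."""
    # Focal length in pixels from the horizontal field of view.
    focal = (width / 2) / math.tan(math.radians(horizontal_fov) / 2)

    # Ray in camera space: x right, y up, z forward.
    x = u - width / 2
    y = height / 2 - v
    z = focal

    # Tilt the ray up by the view's pitch (rotation about the x axis).
    pitch = math.radians(view_pitch)
    y, z = (y * math.cos(pitch) + z * math.sin(pitch),
            -y * math.sin(pitch) + z * math.cos(pitch))

    # Turn the ray by the view's yaw (rotation about the vertical axis).
    yaw = math.radians(view_yaw)
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))

    return math.degrees(math.atan2(x, z)), math.degrees(math.atan2(y, math.hypot(x, z)))


# The view center maps to the view's own yaw and pitch: approximately (90.0, 10.0)
print(view_pixel_to_sphere(500, 500, 1000, 1000, 90.0, 90.0, 10.0))
```

Applying this mapping to every vertex of a mask polygon yields the polygon in spherical coordinates, which is what makes detections from different views comparable.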

**Prerequisites:**
- Hugging Face authentication: accept the [SAM 3 license](https://huggingface.co/facebook/sam3) on the model page, then run `huggingface-cli login`
- GPU recommended for reasonable speed

In [None]:
TEXT_PROMPT = "car"  # Try: "sign", "tree", "window", "person", "building"

print(f"Will segment '{TEXT_PROMPT}' across the full panorama.")

In [None]:
from panosam.engines.sam3 import SAM3Engine

try:
    # Initialize the SAM3 engine (downloads model on first run)
    engine = SAM3Engine()

    # Create the PanoSAM client with the WIDEANGLE preset (8 views, fast)
    # Use ps.PerspectivePreset.DEFAULT (16 views, narrower FOV) to resolve smaller objects
    client = ps.PanoSAM(engine=engine, views=ps.PerspectivePreset.WIDEANGLE)

    # Segment — this splits, runs SAM3, converts to spherical, and deduplicates
    result = client.segment(panorama, prompt=TEXT_PROMPT)

    print(f"Found {len(result.masks)} '{TEXT_PROMPT}' instance(s) across the panorama\n")
    for i, mask in enumerate(result.masks):
        print(f"  [{i}] score={mask.score:.2f}, "
              f"center=({mask.center_yaw:.1f}°, {mask.center_pitch:.1f}°), "
              f"polygons={len(mask.polygons)}")

except ImportError as e:
    print(f"Missing dependency: {e}")
    print("Install with: pip install 'panosam[sam3]'")
    result = None
except Exception as e:
    print(f"Error running PanoSAM: {e}")
    print("This may be due to missing HuggingFace auth or insufficient GPU memory.")
    print("Run: huggingface-cli login")
    result = None

## 5. Visualize Segmentation Results

PanoSAM returns masks in **spherical coordinates** (yaw/pitch in degrees), so we can visualize them directly on the equirectangular panorama.

PanoSAM includes `visualize_sphere_masks()` for overlaying spherical polygon masks onto panoramas.
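
To see why spherical coordinates map directly onto the panorama, note that an equirectangular image is linear in both angles. The sketch below converts a (yaw, pitch) pair to pixel coordinates. It assumes yaw 0° sits at the horizontal center of the image and pitch +90° at the top row, which may not match PanoSAM's convention, and `sphere_to_equirect_pixel` is a hypothetical helper.

```python
from typing import Tuple


def sphere_to_equirect_pixel(
    yaw_deg: float, pitch_deg: float, width: int, height: int
) -> Tuple[float, float]:
    """Convert spherical angles in degrees to (x, y) pixel coordinates."""
    # Wrap yaw into [-180, 180) so values such as 190° land on the left side.
    wrapped_yaw = (yaw_deg + 180.0) % 360.0 - 180.0
    x = (wrapped_yaw + 180.0) / 360.0 * width
    y = (90.0 - pitch_deg) / 180.0 * height
    return x, y


print(sphere_to_equirect_pixel(0.0, 0.0, 4096, 2048))      # (2048.0, 1024.0)
print(sphere_to_equirect_pixel(-180.0, 90.0, 4096, 2048))  # (0.0, 0.0)
```

A mask center reported as `center_yaw` and `center_pitch` could be marked on the image this way, for example with `plt.scatter`.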

In [None]:
if result and len(result.masks) > 0:
    # Use PanoSAM's built-in visualization
    viz = ps.visualize_sphere_masks(panorama, result.masks, alpha=0.5)

    plt.figure(figsize=(20, 10))
    plt.imshow(viz)
    plt.axis("off")
    plt.title(f"PanoSAM: '{TEXT_PROMPT}' — {len(result.masks)} instance(s) found", fontsize=14)
    plt.tight_layout()
    plt.show()
else:
    print("No segmentation results to visualize.")

## 6. Save Results as JSON

PanoSAM can export results in a JSON format compatible with the [PanoSAM Preview Tool](https://yz3440.github.io/panosam/) — an interactive 3D sphere viewer for exploring segmentation results.

In [None]:
if result:
    output_path = "assets/segmentation_result.panosam.json"
    os.makedirs("assets", exist_ok=True)
    result.save_json(output_path)
    print(f"Saved results to {output_path}")
    print("Open the PanoSAM preview tool and drag in this JSON + the panorama image:")
    print("  https://yz3440.github.io/panosam/")

## 7. Try Different Prompts

Change `TEXT_PROMPT` in Section 4 above, then re-run the cells in Sections 4 and 5 to try different objects.

Some ideas to try:
- `"car"` — vehicles on the street
- `"sign"` — street signs, shop signs
- `"tree"` — vegetation
- `"person"` — pedestrians
- `"window"` — building windows
- `"bicycle"` — bikes
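
To compare several prompts in one pass instead of editing `TEXT_PROMPT` by hand, a small loop is enough. The sketch below takes the segmentation function as a parameter so it can run without a model. `count_instances` and `fake_segment` are hypothetical names for illustration; the only PanoSAM behavior assumed is the `segment(image, prompt=...)` call and the `.masks` attribute used in Section 4.

```python
from types import SimpleNamespace


def count_instances(segment, image, prompts):
    """Run one segmentation per prompt and return {prompt: number of masks}."""
    counts = {}
    for prompt in prompts:
        result = segment(image, prompt=prompt)
        counts[prompt] = len(result.masks)
    return counts


def fake_segment(image, prompt):
    """Stand-in for client.segment so the sketch runs without SAM 3."""
    fake_masks = {"car": ["m1", "m2"], "sign": ["m3"]}
    return SimpleNamespace(masks=fake_masks.get(prompt, []))


print(count_instances(fake_segment, None, ["car", "sign", "tree"]))
# {'car': 2, 'sign': 1, 'tree': 0}
```

With the real client from Section 4, the call would be `count_instances(client.segment, panorama, ["car", "sign", "tree"])`. Each prompt triggers a full pass over all views, so runtime grows linearly with the number of prompts.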

## 8. Multi-Scale Segmentation

For objects of varying sizes, you can combine multiple presets. PanoSAM merges and deduplicates masks across all scales automatically.
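
To build intuition for what deduplication across scales involves, here is a deliberately simplified sketch: it keeps the highest-scoring detection whenever two mask centers fall within a small angle of each other on the sphere. This is not PanoSAM's algorithm, which presumably compares the masks themselves rather than only their centers. `Detection`, `deduplicate`, and the 2° threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical stand-in carrying the fields printed in Section 4."""
    score: float
    center_yaw: float    # degrees
    center_pitch: float  # degrees


def angular_distance(a: Detection, b: Detection) -> float:
    """Great-circle angle in degrees between two detection centers."""
    yaw_a, pitch_a = math.radians(a.center_yaw), math.radians(a.center_pitch)
    yaw_b, pitch_b = math.radians(b.center_yaw), math.radians(b.center_pitch)
    cos_angle = (math.sin(pitch_a) * math.sin(pitch_b)
                 + math.cos(pitch_a) * math.cos(pitch_b) * math.cos(yaw_a - yaw_b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def deduplicate(detections: List[Detection], min_separation: float = 2.0) -> List[Detection]:
    """Greedily keep the best-scoring detection among centers closer than min_separation."""
    kept: List[Detection] = []
    for candidate in sorted(detections, key=lambda d: d.score, reverse=True):
        if all(angular_distance(candidate, k) >= min_separation for k in kept):
            kept.append(candidate)
    return kept


merged = deduplicate([
    Detection(0.90, 10.0, 0.0),
    Detection(0.80, 11.0, 0.5),    # same object seen at another scale
    Detection(0.70, 179.0, 0.0),
    Detection(0.60, -179.5, 0.0),  # same object across the ±180° seam
])
print([d.score for d in merged])  # [0.9, 0.7]
```

Measuring distance on the sphere rather than in pixels is what lets the last two detections be recognized as one object even though they sit at opposite edges of the equirectangular image.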

In [None]:
# Optional: Multi-scale segmentation (combines ZOOMED_OUT + WIDEANGLE presets)
# Uncomment to run (requires `engine` from Section 4). This processes more views and takes longer.

# multi_client = ps.PanoSAM(
#     engine=engine,
#     views=[ps.PerspectivePreset.ZOOMED_OUT, ps.PerspectivePreset.WIDEANGLE],
# )
# multi_result = multi_client.segment(panorama, prompt=TEXT_PROMPT)
# print(f"Multi-scale: found {len(multi_result.masks)} '{TEXT_PROMPT}' instance(s)")
# multi_result.save_json("assets/multi_scale_result.panosam.json")