# OCR on 360 Images — Hands-On Demo

In this notebook, we'll run text recognition on a panoramic image using **[PanoOCR](https://github.com/yz3440/panoocr)** and explore the results.

PanoOCR handles the hard part automatically:
1. Splits the equirectangular panorama into overlapping perspective views
2. Runs an OCR engine on each view
3. Converts results to spherical coordinates (yaw/pitch)
4. Deduplicates overlapping detections
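To make step 1 concrete, here is a minimal sketch of how the yaw centers of overlapping views could be spaced around the horizon. The function `plan_view_yaws` and its `fov_deg`/`overlap` parameters are hypothetical illustrations, not PanoOCR's API, and PanoOCR's actual view layout may differ.

```python
import math

def plan_view_yaws(fov_deg: float = 90.0, overlap: float = 0.5) -> list[float]:
    """Hypothetical helper: return evenly spaced yaw centers (degrees, in
    [-180, 180)) so that adjacent views with horizontal field of view
    `fov_deg` overlap by at least the fraction `overlap` of their width."""
    if not 0 < fov_deg <= 360:
        raise ValueError("fov_deg must be in (0, 360]")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    max_step = fov_deg * (1 - overlap)  # largest allowed spacing between view centers
    # Enough views to close the 360-degree loop; round to absorb float noise
    n_views = math.ceil(round(360 / max_step, 9))
    step = 360 / n_views  # even spacing, never larger than max_step
    return [-180 + i * step for i in range(n_views)]

# 90-degree views with 50% overlap need 8 views spaced 45 degrees apart
print(plan_view_yaws(fov_deg=90, overlap=0.5))
```

Spacing the views evenly (rather than stepping by exactly `max_step`) keeps the overlap uniform, including across the -180/180 seam.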

**Documentation:** [yz3440.github.io/panoocr](https://yz3440.github.io/panoocr/)

## 1. Install PanoOCR

Pick the right engine for your platform:
- **macOS**: `panoocr[macocr]`, which uses Apple's Vision Framework (fast, accurate)
- **Windows/Linux**: `panoocr[paddleocr]`, which uses PaddleOCR

The macOS line is active by default. On Windows or Linux, comment it out and uncomment the PaddleOCR line instead.

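```python
# macOS (Apple Vision Framework)
%pip install "panoocr[macocr]"

# Windows / Linux (PaddleOCR)
# %pip install "panoocr[paddleocr]"

# Used later in this notebook for plotting and for downloading a sample panorama
%pip install matplotlib streetlevel
```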

## 2. Load a Sample Panorama

Place an equirectangular panorama at `assets/sample_panorama.jpg`. If no file is there, the cell below downloads one from Google Street View using the `streetlevel` library.

```python
import os
from PIL import Image
import matplotlib.pyplot as plt

SAMPLE_IMAGE = "assets/sample_panorama.jpg"

# If no sample image exists, download one from Google Street View
if not os.path.exists(SAMPLE_IMAGE):
    print("No sample image found. Downloading from Google Street View...")
    from streetlevel import streetview

    # Near MIT Media Lab: a spot with plenty of visible text
    pano = streetview.find_panorama(42.3625, -71.0862)
    if pano is None:
        # Stop here instead of failing later on Image.open
        raise FileNotFoundError(
            f"Could not find a panorama. Please place an image at {SAMPLE_IMAGE}"
        )
    os.makedirs("assets", exist_ok=True)
    streetview.download_panorama(pano, SAMPLE_IMAGE, zoom=4)
    print(f"Downloaded panorama {pano.id} ({pano.date})")

# Display the panorama
img = Image.open(SAMPLE_IMAGE)
print(f"Image size: {img.size[0]} x {img.size[1]}")

plt.figure(figsize=(16, 8))
plt.imshow(img)
plt.axis("off")
plt.title("Sample Panorama (equirectangular)")
plt.tight_layout()
plt.show()
```
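A full 360° × 180° equirectangular image has a 2:1 width-to-height ratio, so a quick sanity check before running OCR can catch a cropped or non-panoramic file. `looks_equirectangular` is a small hypothetical helper, not part of PanoOCR; how PanoOCR itself handles other aspect ratios is not covered here.

```python
def looks_equirectangular(width: int, height: int, tolerance: float = 0.01) -> bool:
    """Hypothetical helper: True if the image has the 2:1 aspect ratio
    expected of a full 360x180-degree equirectangular panorama."""
    if width <= 0 or height <= 0:
        return False
    return abs(width / height - 2.0) <= tolerance

print(looks_equirectangular(8192, 4096))  # True
print(looks_equirectangular(4000, 3000))  # False
```

In the notebook you could call it as `looks_equirectangular(*img.size)` on the image loaded above.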

## 3. Set Up the OCR Engine

The cell below picks the engine automatically from your operating system: MacOCR (Apple Vision Framework) on macOS and PaddleOCR on Windows and Linux. Make sure you installed the matching extra in Section 1.

```python
import platform

if platform.system() == "Darwin":
    # macOS: Apple Vision Framework
    from panoocr.engines.macocr import MacOCREngine
    engine = MacOCREngine()
    print("Using MacOCR (Apple Vision Framework)")
else:
    # Windows / Linux: PaddleOCR
    from panoocr.engines.paddleocr import PaddleOCREngine
    engine = PaddleOCREngine()
    print("Using PaddleOCR")
```

## 4. Run OCR on the Panorama

This is the main step. PanoOCR will:
- Generate perspective views from the equirectangular image
- Run the OCR engine on each view
- Convert and deduplicate results

This may take a minute depending on image size and your machine.

```python
from panoocr import PanoOCR

pano_ocr = PanoOCR(engine)
result = pano_ocr.recognize(SAMPLE_IMAGE)

print(f"Found {len(result.results)} text detections")
```
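The "convert" step maps a pixel inside one perspective view back to a direction on the sphere. Below is a minimal sketch, assuming a pinhole (gnomonic) view with horizontal field of view `fov_deg`, yaw increasing to the right and pitch increasing upward. The function name and these conventions are assumptions for illustration and may not match PanoOCR's internals.

```python
import math

def view_pixel_to_yaw_pitch(u, v, width, height, fov_deg, view_yaw, view_pitch):
    """Hypothetical helper: map pixel (u, v) in a perspective view centered at
    (view_yaw, view_pitch) to (yaw, pitch) in degrees.

    Assumed conventions: u grows rightward, v grows downward,
    yaw grows to the right, pitch grows upward.
    """
    # Focal length in pixels, derived from the horizontal field of view
    f = (width / 2) / math.tan(math.radians(fov_deg) / 2)
    # Ray through the pixel in camera coordinates: x right, y up, z forward
    x = (u - width / 2) / f
    y = (height / 2 - v) / f
    z = 1.0
    # Tilt the ray by the view's pitch (rotation about the x-axis)
    p = math.radians(view_pitch)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    # Turn the ray by the view's yaw (rotation about the y-axis)
    w = math.radians(view_yaw)
    x, z = x * math.cos(w) + z * math.sin(w), -x * math.sin(w) + z * math.cos(w)
    # Back to spherical angles
    yaw = math.degrees(math.atan2(x, z))
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))
    return yaw, pitch

# The center pixel of a view maps to the view's own center direction
print(view_pixel_to_yaw_pitch(512, 512, 1024, 1024, 90, 30, 10))  # approximately (30.0, 10.0)
# The middle of the right edge of a 90-degree view sits 45 degrees to the right
print(view_pixel_to_yaw_pitch(1024, 512, 1024, 1024, 90, 0, 0))   # approximately (45.0, 0.0)
```

Because the rotation is applied to a full 3D ray, detections near the top or bottom of a tilted view come out with correct yaw/pitch, which a simple linear pixel-to-degree scaling would get wrong.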

## 5. Explore the Results

Each result has:
- `text` — the recognized text
- `yaw` — horizontal position in degrees (-180 to 180)
- `pitch` — vertical position in degrees (-90 to 90)
- `confidence` — OCR confidence score
- `width`, `height` — angular size in degrees

```python
# Print the top 20 results sorted by confidence
sorted_results = sorted(result.results, key=lambda r: r.confidence, reverse=True)

print(f"{'Text':<30} {'Yaw':>6} {'Pitch':>6} {'Conf':>5}")
print("-" * 50)  # matches the 30 + 1 + 6 + 1 + 6 + 1 + 5 column layout
for r in sorted_results[:20]:
    # Truncate long text so it fits the 30-character column
    text_display = r.text[:28] + ".." if len(r.text) > 30 else r.text
    print(f"{text_display:<30} {r.yaw:>6.1f} {r.pitch:>6.1f} {r.confidence:>5.2f}")
```
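The `text`, `yaw`, `pitch` and `confidence` fields are enough to reason about the deduplication step from Section 4. PanoOCR's exact rule isn't described in this notebook, so the following is only a sketch of one plausible approach: keep the highest-confidence detection and drop any other detection with the same text within a small angular radius. The `Detection` class, the `dedupe` function and the 2° default radius are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    """Hypothetical stand-in with the same fields as a PanoOCR result."""
    text: str
    yaw: float         # degrees
    pitch: float       # degrees
    confidence: float

def angular_distance(a, b) -> float:
    """Great-circle distance in degrees between two detection centers.
    Handles the -180/180 yaw seam automatically."""
    y1, p1, y2, p2 = map(math.radians, (a.yaw, a.pitch, b.yaw, b.pitch))
    cos_d = (math.sin(p1) * math.sin(p2)
             + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))  # clamp for float safety

def dedupe(detections, radius_deg: float = 2.0):
    """Greedy dedup: visit detections from most to least confident and skip
    any whose text matches an already-kept detection within radius_deg."""
    kept = []
    for d in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if not any(d.text == k.text and angular_distance(d, k) <= radius_deg
                   for k in kept):
            kept.append(d)
    return kept

dets = [
    Detection("EXIT", 10.0, 5.0, 0.9),
    Detection("EXIT", 11.0, 5.0, 0.8),   # same sign seen from a neighboring view
    Detection("EXIT", 50.0, 5.0, 0.7),   # a different EXIT sign elsewhere
    Detection("CAFE", 10.5, 5.0, 0.6),   # different text, kept
]
print([(k.text, k.yaw) for k in dedupe(dets)])
# [('EXIT', 10.0), ('EXIT', 50.0), ('CAFE', 10.5)]
```

Since `dedupe` only reads those four attributes, it should also accept the objects in `result.results`. Measuring great-circle distance rather than raw yaw difference avoids treating detections at yaw 179° and -179° as far apart.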

## 6. Save Results as JSON

Save the results for use with the interactive 3D preview tool.

```python
output_path = "assets/ocr_results.json"
result.save_json(output_path)
print(f"Results saved to {output_path}")
```

## 7. Visualize: Result Positions on the Panorama

Let's plot where each text detection sits on the panorama using yaw/pitch coordinates.

```python
import matplotlib.pyplot as plt

yaws = [r.yaw for r in result.results]
pitches = [r.pitch for r in result.results]
confs = [r.confidence for r in result.results]

fig, ax = plt.subplots(figsize=(16, 8))
# Stretch the image so plot coordinates equal yaw/pitch
# (assumes yaw 0 at the image center and pitch +90 at the top edge)
ax.imshow(img, extent=[-180, 180, -90, 90], aspect="auto", alpha=0.5)
scatter = ax.scatter(yaws, pitches, c=confs, cmap="viridis", s=20, alpha=0.8)
plt.colorbar(scatter, label="Confidence")
ax.set_xlabel("Yaw (degrees)")
ax.set_ylabel("Pitch (degrees)")
ax.set_title("OCR Detections on Panorama")
plt.tight_layout()
plt.show()
```
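The `extent=[-180, 180, -90, 90]` argument in the plot cell stretches the image so that plot coordinates equal yaw/pitch. If you would rather draw on the image in pixel space (for example with PIL's `ImageDraw`), the same linear mapping can be written explicitly. This sketch assumes yaw 0° sits at the horizontal center and positive pitch is up; `yaw_pitch_to_pixel` is a hypothetical helper, so check the conventions against the PanoOCR documentation.

```python
def yaw_pitch_to_pixel(yaw: float, pitch: float, width: int, height: int):
    """Hypothetical helper: map (yaw, pitch) in degrees to (x, y) pixels in an
    equirectangular image, assuming yaw -180..180 runs left to right and
    pitch +90..-90 runs top to bottom."""
    x = (yaw + 180.0) / 360.0 * width
    y = (90.0 - pitch) / 180.0 * height
    return x, y

print(yaw_pitch_to_pixel(0, 0, 8192, 4096))      # (4096.0, 2048.0), the image center
print(yaw_pitch_to_pixel(-180, 90, 8192, 4096))  # (0.0, 0.0), the top-left corner
```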

## 8. Interactive 3D Preview

For an interactive view of the results, use PanoOCR's 3D preview tool:

```bash
# Clone panoocr and start a local server
git clone https://github.com/yz3440/panoocr.git
cd panoocr/preview
python -m http.server 8000
```

Then open `http://localhost:8000` in your browser and drag and drop these files from this notebook's folder:
1. The panorama image (`assets/sample_panorama.jpg`)
2. The JSON results file (`assets/ocr_results.json`)

You'll see the OCR results positioned on an interactive 3D sphere that you can rotate and zoom.
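Placing a detection on a sphere amounts to turning its (yaw, pitch) into a 3D direction. A minimal sketch, assuming x points right, y up and z forward with yaw 0° straight ahead; the preview tool's own axis convention is not documented here and may differ (three.js cameras, for instance, look down -z by default). `yaw_pitch_to_unit_vector` is a hypothetical helper.

```python
import math

def yaw_pitch_to_unit_vector(yaw_deg: float, pitch_deg: float):
    """Hypothetical helper: return (x, y, z) on the unit sphere for a
    direction given in degrees (assumed axes: x right, y up, z forward)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),   # right/left component
            math.sin(pitch),                   # up/down component
            math.cos(pitch) * math.cos(yaw))   # forward/back component

print(yaw_pitch_to_unit_vector(0, 0))   # (0.0, 0.0, 1.0), straight ahead
print(yaw_pitch_to_unit_vector(90, 0))  # approximately (1.0, 0.0, 0.0), to the right
```

Scaling this vector by the sphere's radius gives the point where a label would be drawn.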