# 🎯 Glimpse3D - Master Pipeline

## Complete 2D Image → 3D Gaussian Splat Pipeline

This notebook runs the **entire Glimpse3D pipeline** end-to-end:

```
📷 Input Image
    ↓
🔷 TripoSR (0.5s) → Initial 3D Mesh → Gaussian Points
    ↓
🎨 SyncDreamer (2min) → 16 Consistent Multi-View Images
    ↓
✨ SDXL Lightning + ControlNet → Enhanced Views
    ↓
🔮 gsplat Optimization → Refined Gaussians
    ↓
🔄 MVCRM → Multi-View Consistent Refinement
    ↓
🏆 Final 3D Gaussian Splat Output
```

## Requirements
- Google Colab with **T4 GPU** (free tier) or **A100** (faster)
- ~12GB VRAM peak usage
- ~30 minutes total runtime

---

## 🚀 Quick Start

1. Run all cells in order (Runtime → Run all)
2. Upload your image when prompted
3. Wait ~30 minutes for the full pipeline to finish
4. Download the final results

# Stage 0: Environment Setup

This stage:
1. Checks GPU availability and VRAM
2. Clones the Glimpse3D repository for `ai_modules` utilities
3. Installs all required dependencies
4. Creates the output directory structure
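
The GPU check in this stage also picks per-stage settings from the detected VRAM. A minimal sketch of that tier lookup as a standalone function is shown here. The names `StageSettings`, `TIERS` and `pick_settings` are hypothetical, and the 14 GiB middle cutoff is an assumption chosen so that a 16 GB card, which reports somewhat less than 16 GiB to PyTorch, still lands in the middle tier.

```python
from typing import NamedTuple


class StageSettings(NamedTuple):
    batch_view_num: int  # views denoised together by SyncDreamer
    mc_resolution: int   # marching-cubes grid resolution for TripoSR
    num_samples: int     # points sampled from the mesh


# (minimum VRAM in GiB, settings), ordered from largest to smallest tier.
# The 14.0 cutoff is an assumption, not a value taken from a spec sheet.
TIERS = [
    (40.0, StageSettings(8, 384, 200_000)),
    (14.0, StageSettings(4, 256, 100_000)),
    (0.0, StageSettings(2, 192, 50_000)),
]


def pick_settings(vram_gib: float) -> StageSettings:
    """Return the settings of the first tier whose minimum VRAM is satisfied."""
    for min_vram, settings in TIERS:
        if vram_gib >= min_vram:
            return settings
    raise ValueError("VRAM must be non-negative")


print(pick_settings(14.7))
```

Keeping the tiers in a table makes the cutoffs easy to adjust after checking what `torch.cuda.get_device_properties(0).total_memory` reports on a given runtime.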

In [None]:
# Check environment
import sys
import os

IN_COLAB = 'google.colab' in sys.modules
print(f"🖥️ Running in Colab: {IN_COLAB}")

if not IN_COLAB:
    print("⚠️ This notebook is designed for Google Colab!")
    print("   Some features may not work locally.")

# Check GPU
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

import torch
print(f"\n📦 PyTorch: {torch.__version__}")
print(f"🔥 CUDA: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"🔧 CUDA Version: {torch.version.cuda}")
    GPU_NAME = torch.cuda.get_device_name(0)
    GPU_VRAM = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"🎮 GPU: {GPU_NAME}")
    print(f"💾 VRAM: {GPU_VRAM:.1f} GB")
    
    # Set batch sizes based on GPU.
    # A 16 GB card reports slightly less than 16 GiB here (a Colab T4 typically
    # shows about 14.7), so the middle tier starts at 14 rather than 15.
    if GPU_VRAM >= 40:  # A100
        BATCH_VIEW_NUM = 8
        MC_RESOLUTION = 384
        NUM_SAMPLES = 200000
    elif GPU_VRAM >= 14:  # T4/V100
        BATCH_VIEW_NUM = 4
        MC_RESOLUTION = 256
        NUM_SAMPLES = 100000
    else:  # Lower VRAM
        BATCH_VIEW_NUM = 2
        MC_RESOLUTION = 192
        NUM_SAMPLES = 50000
    
    print(f"\n⚙️ Settings: batch_view={BATCH_VIEW_NUM}, resolution={MC_RESOLUTION}, samples={NUM_SAMPLES}")
else:
    raise RuntimeError("❌ No GPU available! Enable GPU in Runtime → Change runtime type")

In [None]:
# Clone Glimpse3D repository (for ai_modules utilities)
GLIMPSE3D_REPO = "/content/Glimpse-3D"

if not os.path.exists(GLIMPSE3D_REPO):
    print("📥 Cloning Glimpse3D repository...")
    !git clone https://github.com/varunaditya27/Glimpse3D.git {GLIMPSE3D_REPO}
else:
    print(f"✅ Glimpse3D repo already exists at {GLIMPSE3D_REPO}")

# Add to Python path for ai_modules
sys.path.insert(0, GLIMPSE3D_REPO)
print(f"✅ Added {GLIMPSE3D_REPO} to Python path")

## 📦 Installing Dependencies

**Strategy:**
1. Install compatible NumPy + scipy first (avoids version conflicts)
2. Install ML frameworks (transformers, diffusers, etc.)
3. Install 3D processing packages (trimesh, rembg, gsplat)
4. Install model-specific packages (CLIP, etc.)

**Expected warnings:** Pip may show version conflicts for opencv/tensorflow/gradio - these are **harmless** because our pipeline doesn't use those pre-installed Colab packages.
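
Because pip warnings are expected here, it can help to confirm the pinned ranges directly instead of reading the pip log. The following is a minimal sketch that compares installed versions against a half-open range using only the standard library. The helper names `version_tuple`, `in_range` and `check_installed` are hypothetical, and the parser only looks at leading numeric components, which is an assumption that holds for simple release versions such as `2.1.3`.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Leading numeric components of a version string, e.g. '2.1.3' -> (2, 1, 3)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_range(version: str, lower: tuple, upper: tuple) -> bool:
    """True when lower <= version < upper, compared component by component."""
    return lower <= version_tuple(version) < upper


def check_installed(package: str, lower: tuple, upper: tuple) -> bool:
    """Check the installed version of a package; False if missing or out of range."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return in_range(installed, lower, upper)


# Mirrors the "numpy>=2.1,<2.2" pin used in the install cell.
print("numpy in pinned range:", check_installed("numpy", (2, 1), (2, 2)))
```

`importlib.metadata` reads the installed package metadata rather than an already imported module, so it should reflect the freshly installed version even when the old module is still loaded in the session.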

In [None]:
%%capture install_output
# Install all dependencies - carefully pinned for Google Colab compatibility
# Note: Colab (Jan 2026) has NumPy 2.x, PyTorch 2.9+, CUDA 12.6
# We work WITH these defaults, not against them.

print("📦 Installing dependencies (this takes ~5 minutes)...")

# ===============================================
# STRATEGY: Pin NumPy + scipy first to ensure compatibility
# Then install everything else WITHOUT allowing numpy downgrades
# Ignore pip warnings about opencv/tensorflow/gradio - we don't use them!
# ===============================================

# ⚠️ CRITICAL FIX: Install compatible NumPy + scipy FIRST
# NumPy 2.1.x is well-supported and stable
# scipy 1.14+ has good NumPy 2.x support
print("📌 Step 1: Installing NumPy + scipy with compatible versions...")
!pip install "numpy>=2.1,<2.2" "scipy>=1.14" --quiet

# Verify installation succeeded
# Note: if NumPy was already imported earlier in this session, the version
# printed here is the old one until the runtime is restarted.
import numpy as np
import scipy
print(f"✅ NumPy: {np.__version__}, scipy: {scipy.__version__}")

# Core ML packages - pin to stable versions that work with NumPy 2.x
# Version specifiers are quoted so the shell does not treat ">" as a redirect
print("📌 Step 2: Installing ML frameworks...")
!pip install "transformers>=4.44.0" "diffusers>=0.30.0" accelerate huggingface_hub safetensors --quiet
!pip install omegaconf einops "pytorch-lightning>=2.0.0" kornia --quiet

# TripoSR dependencies
# CRITICAL: trimesh>=4.4.0 is required for NumPy 2.0 compatibility (ptp fix)
print("📌 Step 3: Installing 3D processing packages...")
!pip install "trimesh>=4.4.0" "rembg[gpu]" xatlas plyfile --quiet

# torchmcubes for marching cubes
!pip install git+https://github.com/tatsy/torchmcubes.git --quiet

# gsplat - builds JIT, compatible with Colab's PyTorch
print("📌 Step 4: Installing gsplat...")
!pip install gsplat --quiet

# SyncDreamer dependencies
# Pin CLIP to specific commit for stability
print("📌 Step 5: Installing SyncDreamer dependencies...")
!pip install git+https://github.com/openai/CLIP.git@a1d071733d7111c9c014f024669f959182114e33 --quiet
!pip install taming-transformers-rom1504 --quiet

# Image processing - use Colab's scikit-image, just ensure imageio
print("📌 Step 6: Installing image processing...")
!pip install imageio imageio-ffmpeg --quiet

# Depth estimation (MiDaS)
!pip install timm --quiet

print("\n✅ All dependencies installed!")
print("⚠️ Pip warnings about opencv/tensorflow/gradio can be ignored - we don't use those packages.")

In [None]:
# Verify key dependencies are correctly installed
print("🔍 Verifying critical dependencies...\n")

import numpy as np
print(f"✅ NumPy: {np.__version__}")

import torch
print(f"✅ PyTorch: {torch.__version__}")
print(f"   CUDA available: {torch.cuda.is_available()}")

# Verify scipy is compatible with numpy (this was causing the _center import error)
import scipy
print(f"✅ scipy: {scipy.__version__}")
# Quick test to ensure scipy.sparse works
try:
    import scipy.sparse
    print("   scipy.sparse: ✅ OK")
except ImportError as e:
    raise RuntimeError(f"❌ scipy is NOT compatible with NumPy! Error: {e}")

import trimesh
print(f"✅ trimesh: {trimesh.__version__}")
# Verify trimesh works with NumPy 2.x (ptp fix check)
try:
    test_mesh = trimesh.creation.box()
    _ = test_mesh.extents  # Older trimesh called ndarray.ptp() here, which NumPy 2.0 removed
    print("   NumPy 2.x compatibility: ✅ OK")
except AttributeError as e:
    if 'ptp' in str(e):
        raise RuntimeError("❌ trimesh is NOT compatible with NumPy 2.x! Please install trimesh>=4.4.0")
    raise

import transformers
print(f"✅ transformers: {transformers.__version__}")

import diffusers
print(f"✅ diffusers: {diffusers.__version__}")

# Test rembg (background removal) - this depends on scipy working
try:
    import rembg
    print(f"✅ rembg: installed")
except ImportError as e:
    print(f"❌ rembg import failed: {e}")
    raise

# Test plyfile (required later for reading and writing Gaussian PLY files)
try:
    from plyfile import PlyData
    print(f"✅ plyfile: installed")
except ImportError:
    print("❌ plyfile not found!")
    raise

print("\n🎉 All critical dependencies verified!")

In [None]:
# Create directory structure
from pathlib import Path
import gc

WORK_DIR = Path("/content/glimpse3d_pipeline")
WORK_DIR.mkdir(exist_ok=True)

DIRS = {
    'input': WORK_DIR / 'input',
    'triposr': WORK_DIR / 'stage1_triposr',
    'syncdreamer': WORK_DIR / 'stage2_syncdreamer',
    'enhanced': WORK_DIR / 'stage3_enhanced',
    'gsplat': WORK_DIR / 'stage4_gsplat',
    'mvcrm': WORK_DIR / 'stage5_mvcrm',
    'output': WORK_DIR / 'final_output',
}

for name, path in DIRS.items():
    path.mkdir(exist_ok=True)
    print(f"📁 {name}: {path}")

def clear_gpu():
    """Aggressively clear GPU memory between stages."""
    # Multiple gc passes for thorough cleanup
    for _ in range(3):
        gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"🧹 GPU memory cleared. Allocated: {allocated:.2f} GB, Reserved: {reserved:.2f} GB")

def safe_del(obj_name, globals_dict):
    """Safely delete an object if it exists."""
    if obj_name in globals_dict and globals_dict[obj_name] is not None:
        del globals_dict[obj_name]
        
print("\n✅ Directory structure created!")

# Stage 1: Upload Input Image

In [None]:
from google.colab import files
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

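print("📤 Upload your image (JPG/PNG):")
uploaded = files.upload()
if not uploaded:
    raise RuntimeError("No file uploaded. Re-run this cell and select an image.")

# Save uploaded file (only the first uploaded file is used)
INPUT_FILENAME, INPUT_BYTES = next(iter(uploaded.items()))
INPUT_PATH = DIRS['input'] / INPUT_FILENAME

with open(INPUT_PATH, 'wb') as f:
    f.write(INPUT_BYTES)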

# Display
input_image = Image.open(INPUT_PATH)
plt.figure(figsize=(8, 8))
plt.imshow(input_image)
plt.title(f"Input: {INPUT_FILENAME} ({input_image.size[0]}x{input_image.size[1]})")
plt.axis('off')
plt.show()

print(f"\n✅ Saved to: {INPUT_PATH}")

# Stage 2: TripoSR - Initial 3D Reconstruction

**Input:** Single image  
**Output:** 3D mesh + Gaussian point cloud  
**Time:** ~30 seconds
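
The Gaussian point cloud written in this stage stores color, opacity and scale in encoded form: color as a zeroth-order spherical-harmonic (SH) coefficient, opacity as a pre-sigmoid logit, and scale as a logarithm. The sketch below illustrates those three encodings and their inverses with plain Python. The function names are hypothetical and only the constant and formulas come from the conversion code in this notebook.

```python
import math

SH_C0 = 0.28209479177387814  # zeroth-order spherical-harmonic basis constant


def rgb_to_sh_dc(channel: float) -> float:
    """Encode one color channel in [0, 1] as an SH DC coefficient."""
    return (channel - 0.5) / SH_C0


def sh_dc_to_rgb(dc: float) -> float:
    """Decode an SH DC coefficient back to a color channel."""
    return dc * SH_C0 + 0.5


def sigmoid(x: float) -> float:
    """Map a stored opacity logit to an opacity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of sigmoid: the raw value to store for a desired opacity."""
    return math.log(p / (1.0 - p))


# Round trips for the three encodings.
print(sh_dc_to_rgb(rgb_to_sh_dc(0.8)))  # color channel
print(sigmoid(2.0))                      # opacity for a stored value of 2.0
print(math.exp(math.log(0.01)))          # scale stored as a log
```

Storing opacity and scale in unconstrained form means an optimizer can update them freely while the decoded values stay in their valid ranges.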

In [None]:
# Clone TripoSR
import sys
import os
from pathlib import Path

TRIPOSR_PATH = Path("/content/TripoSR")

if not TRIPOSR_PATH.exists():
    print("📥 Cloning TripoSR...")
    !git clone https://github.com/VAST-AI-Research/TripoSR.git {TRIPOSR_PATH}

sys.path.insert(0, str(TRIPOSR_PATH))
os.chdir(TRIPOSR_PATH)
print(f"✅ TripoSR ready at {TRIPOSR_PATH}")

In [None]:
import time
import torch
import numpy as np
from PIL import Image
from tsr.system import TSR
from tsr.utils import remove_background, resize_foreground
import rembg

print("\n" + "="*60)
print("🔷 STAGE 2: TripoSR 3D Reconstruction")
print("="*60)

device = "cuda:0"

# Load model
print("\n📥 Loading TripoSR model...")
triposr_model = TSR.from_pretrained(
    "stabilityai/TripoSR",
    config_name="config.yaml",
    weight_name="model.ckpt",
)
triposr_model.renderer.set_chunk_size(8192)
triposr_model.to(device)
print("✅ Model loaded!")

# Preprocess image
print("\n🔧 Preprocessing image...")
input_img = Image.open(INPUT_PATH)
rembg_session = rembg.new_session()
processed_img = remove_background(input_img, rembg_session)
processed_img = resize_foreground(processed_img, 0.85)

# ✅ FIXED: Save RGBA for SyncDreamer, create RGB for TripoSR
# SyncDreamer's prepare_inputs needs RGBA with alpha channel
processed_img.save(DIRS['triposr'] / "processed_input.png")

# Create RGB version for TripoSR (it expects 3 channels):
# alpha-composite the foreground onto a mid-gray (0.5) background
img_np = np.array(processed_img).astype(np.float32) / 255.0
img_np_rgb = img_np[:, :, :3] * img_np[:, :, 3:4] + (1 - img_np[:, :, 3:4]) * 0.5
processed_img_rgb = Image.fromarray((img_np_rgb * 255.0).astype(np.uint8))
processed_img_rgb.save(DIRS['triposr'] / "processed_input_rgb.png")

# Run inference
print("\n🚀 Running TripoSR...")
start_time = time.time()

# ⚠️ CRITICAL: Use RGB version for TripoSR (it expects 3 channels, not 4)
with torch.no_grad():
    scene_codes = triposr_model([processed_img_rgb], device=device)
    meshes = triposr_model.extract_mesh(scene_codes, has_vertex_color=True, resolution=MC_RESOLUTION)

mesh = meshes[0]
elapsed = time.time() - start_time

print(f"\n✅ Mesh generated in {elapsed:.2f}s")
print(f"   Vertices: {len(mesh.vertices):,}")
print(f"   Faces: {len(mesh.faces):,}")

# Save mesh - OBJ always works, GLB may fail with NumPy compatibility issues
mesh.export(str(DIRS['triposr'] / "mesh.obj"))
print(f"✅ Saved OBJ: {DIRS['triposr'] / 'mesh.obj'}")

try:
    mesh.export(str(DIRS['triposr'] / "mesh.glb"))
    print(f"✅ Saved GLB: {DIRS['triposr'] / 'mesh.glb'}")
except Exception as e:
    print(f"⚠️ GLB export failed (NumPy compatibility): {e}")
    print(f"   Continuing with OBJ format only")

print(f"\n📁 Mesh saved to {DIRS['triposr']}")

In [None]:
# Convert mesh to Gaussian PLY
import gc
import numpy as np
from plyfile import PlyData, PlyElement

def mesh_to_gaussian_ply(mesh, output_path, num_samples=100000):
    """Convert mesh to Gaussian Splat format with better initialization."""
    print(f"\n🔄 Sampling {num_samples:,} points...")
    
    points, face_indices = mesh.sample(num_samples, return_index=True)
    
    if mesh.visual.vertex_colors is not None:
        face_vertices = mesh.faces[face_indices]
        vertex_colors = mesh.visual.vertex_colors[:, :3] / 255.0
        colors = vertex_colors[face_vertices].mean(axis=1)
    else:
        colors = np.ones((num_samples, 3)) * 0.5
    
    num_points = len(points)
    xyz = points.astype(np.float32)
    
    # ⚠️ IMPORTANT: Estimate appropriate scale based on point density
    # Build the tree on ALL points and query only a subset for speed.
    # Measuring neighbors within the subset alone would overestimate the
    # spacing, because the subset is sparser than the full point set.
    from scipy.spatial import cKDTree
    tree = cKDTree(xyz)
    query_points = xyz[:min(10000, len(xyz))]
    distances, _ = tree.query(query_points, k=2)  # k=2: column 0 is the point itself
    avg_nn_distance = distances[:, 1].mean()  # Second column is nearest neighbor
    
    print(f"   Average point spacing: {avg_nn_distance:.4f}")
    
    # Color encoding: Convert RGB to SH DC coefficient
    C0 = 0.28209479177387814
    features_dc = ((colors - 0.5) / C0).astype(np.float32)
    features_rest = np.zeros((num_points, 45), dtype=np.float32)
    
    # ⚠️ CRITICAL FIX: Better initial parameters
    # Opacity: sigmoid(2.0) ≈ 0.88, good starting point (visible but not saturated)
    opacities = np.ones((num_points, 1), dtype=np.float32) * 2.0
    
    # ⚠️ CRITICAL FIX: Scale based on actual point spacing
    # exp(scale_raw) = actual_scale, so scale_raw = log(actual_scale)
    # We want Gaussians to slightly overlap, so use 1.5x the average spacing
    target_scale = avg_nn_distance * 1.5
    scale_raw = np.log(max(target_scale, 0.001))  # Prevent log(0)
    print(f"   Initial Gaussian scale: {target_scale:.4f} (raw={scale_raw:.2f})")
    
    scales = np.ones((num_points, 3), dtype=np.float32) * scale_raw
    
    # ✅ gsplat uses wxyz quaternion convention: rot_0=w, rot_1=x, rot_2=y, rot_3=z
    rotations = np.zeros((num_points, 4), dtype=np.float32)
    rotations[:, 0] = 1.0  # w=1 (identity rotation in wxyz format)
    
    dtype_full = [
        ('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
        ('f_dc_0', 'f4'), ('f_dc_1', 'f4'), ('f_dc_2', 'f4'),
    ]
    for i in range(45):
        dtype_full.append((f'f_rest_{i}', 'f4'))
    dtype_full.extend([
        ('opacity', 'f4'),
        ('scale_0', 'f4'), ('scale_1', 'f4'), ('scale_2', 'f4'),
        ('rot_0', 'f4'), ('rot_1', 'f4'), ('rot_2', 'f4'), ('rot_3', 'f4'),
    ])
    
    elements = np.zeros(num_points, dtype=dtype_full)
    elements['x'] = xyz[:, 0]
    elements['y'] = xyz[:, 1]
    elements['z'] = xyz[:, 2]
    elements['f_dc_0'] = features_dc[:, 0]
    elements['f_dc_1'] = features_dc[:, 1]
    elements['f_dc_2'] = features_dc[:, 2]
    for i in range(45):
        elements[f'f_rest_{i}'] = features_rest[:, i]
    elements['opacity'] = opacities[:, 0]
    elements['scale_0'] = scales[:, 0]
    elements['scale_1'] = scales[:, 1]
    elements['scale_2'] = scales[:, 2]
    elements['rot_0'] = rotations[:, 0]
    elements['rot_1'] = rotations[:, 1]
    elements['rot_2'] = rotations[:, 2]
    elements['rot_3'] = rotations[:, 3]
    
    el = PlyElement.describe(elements, 'vertex')
    PlyData([el]).write(output_path)
    print(f"✅ Saved: {output_path}")
    
    # Print summary statistics
    print(f"\n📊 Gaussian Initialization Summary:")
    print(f"   Points: {num_points:,}")
    print(f"   Position range: X[{xyz[:,0].min():.3f}, {xyz[:,0].max():.3f}]")
    print(f"   Scale (exp): {np.exp(scale_raw):.4f}")
    print(f"   Opacity (sigmoid): {1/(1+np.exp(-2.0)):.3f}")


# Stage 3: SyncDreamer - Multi-View Generation

**Input:** Processed image  
**Output:** 16 consistent multi-view images  
**Time:** ~2-3 minutes
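
The 16 views sit on a ring around the object: a fixed 30° elevation, azimuths 22.5° apart, and a camera radius of 1.5, as configured later in this stage. A minimal sketch of the corresponding camera positions follows. The function name `camera_position` is hypothetical, and the axis convention (z up, azimuth measured in the x-y plane from +x) is an assumption that may differ from the one SyncDreamer uses internally.

```python
import math


def camera_position(azimuth_deg: float, elevation_deg: float, radius: float) -> tuple:
    """Camera center on a sphere around the origin.

    Assumed convention: z is up, azimuth is measured in the x-y plane from +x.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return (x, y, z)


azimuths = [i * 22.5 for i in range(16)]  # 0, 22.5, ..., 337.5 degrees
positions = [camera_position(a, 30.0, 1.5) for a in azimuths]

for azimuth, (x, y, z) in zip(azimuths[:4], positions[:4]):
    print(f"azimuth {azimuth:6.1f}: ({x:+.3f}, {y:+.3f}, {z:+.3f})")
```

Every position lies at the same height and the same distance from the origin, which is what keeps the generated views on a single orbit.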

In [None]:
print("\n" + "="*60)
print("🎨 STAGE 3: SyncDreamer Multi-View Generation")
print("="*60)

# Clone SyncDreamer
SYNCDREAMER_PATH = Path("/content/SyncDreamer")

if not SYNCDREAMER_PATH.exists():
    print("📥 Cloning SyncDreamer...")
    !git clone https://github.com/liuyuan-pal/SyncDreamer.git {SYNCDREAMER_PATH}

# Download checkpoints
CKPT_DIR = SYNCDREAMER_PATH / "ckpt"
CKPT_DIR.mkdir(exist_ok=True)

!apt -y install -qq aria2

CHECKPOINTS = {
    "syncdreamer-pretrain.ckpt": "https://huggingface.co/camenduru/SyncDreamer/resolve/main/syncdreamer-pretrain.ckpt",
    "ViT-L-14.pt": "https://huggingface.co/camenduru/SyncDreamer/resolve/main/ViT-L-14.pt"
}

for fname, url in CHECKPOINTS.items():
    fpath = CKPT_DIR / fname
    if not fpath.exists():
        print(f"📥 Downloading {fname}...")
        !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M "{url}" -d "{CKPT_DIR}" -o "{fname}"
    else:
        print(f"✅ {fname} exists")

sys.path.insert(0, str(SYNCDREAMER_PATH))
os.chdir(SYNCDREAMER_PATH)

In [None]:
import gc
import torch
from pathlib import Path
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# ⚠️ CRITICAL: Aggressive memory cleanup before loading SyncDreamer
# Colab free-tier T4 runtimes have limited system RAM (~12GB)
print("🧹 Clearing memory before loading SyncDreamer...")
gc.collect()
gc.collect()
gc.collect()
torch.cuda.empty_cache()
torch.cuda.synchronize()

# Check available memory
import psutil
ram_available = psutil.virtual_memory().available / 1024**3
print(f"   Available RAM: {ram_available:.1f} GB")
if ram_available < 5:
    print("   ⚠️ Low RAM warning! Consider restarting runtime.")

# Load SyncDreamer model
print("\n📥 Loading SyncDreamer model...")

config_path = SYNCDREAMER_PATH / "configs" / "syncdreamer.yaml"
config = OmegaConf.load(config_path)

# Instantiate model from config
syncdreamer_model = instantiate_from_config(config.model)

# Load pretrained weights with MEMORY-MAPPED approach
ckpt_path = CKPT_DIR / "syncdreamer-pretrain.ckpt"
print(f"   Loading checkpoint: {ckpt_path}")

# ⚠️ MEMORY FIX: Use mmap=True to avoid loading entire file into RAM
# This memory-maps the file instead of loading it all at once
try:
    # Try mmap first (PyTorch 2.1+)
    print("   Using memory-mapped loading (mmap=True)...")
    checkpoint = torch.load(ckpt_path, map_location="cpu", mmap=True, weights_only=False)
except (TypeError, RuntimeError):
    # TypeError: older PyTorch without the mmap argument.
    # RuntimeError: checkpoint not saved in the zipfile format that mmap requires.
    print("   Fallback: Standard loading...")
    checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Extract state dict
if "state_dict" in checkpoint:
    state_dict = checkpoint["state_dict"]
else:
    state_dict = checkpoint

# Load weights
print("   Loading state dict into model...")
missing, unexpected = syncdreamer_model.load_state_dict(state_dict, strict=False)

# ⚠️ CRITICAL: Delete checkpoint from RAM immediately
print("   Cleaning up checkpoint from RAM...")
del checkpoint
del state_dict
gc.collect()
gc.collect()

if missing:
    print(f"   ⚠️ Missing keys: {len(missing)}")
if unexpected:
    print(f"   ⚠️ Unexpected keys: {len(unexpected)}")

# Check RAM after loading
ram_after = psutil.virtual_memory().available / 1024**3
print(f"   RAM after loading: {ram_after:.1f} GB")

# Move to GPU
print("   Moving model to GPU...")
syncdreamer_model = syncdreamer_model.cuda().eval()

# Final cleanup
gc.collect()
torch.cuda.empty_cache()

# Verify model is ready
print(f"\n✅ SyncDreamer loaded!")
print(f"   Model type: {type(syncdreamer_model).__name__}")
print(f"   GPU memory: {torch.cuda.memory_allocated()/1024**3:.1f} GB allocated")

In [None]:
from ldm.models.diffusion.sync_dreamer import SyncDDIMSampler
from ldm.util import prepare_inputs  # CRITICAL: Use official data preparation

# ✅ FIXED: Camera configuration MUST match SyncDreamer training data
# SyncDreamer generates 16 views at FIXED 30° elevation, azimuths spaced 22.5° apart
ELEVATIONS = [30.0] * 16  # All 16 views at 30° elevation
AZIMUTHS = [i * 22.5 for i in range(16)]  # 0°, 22.5°, 45°, ..., 337.5°
RADIUS = 1.5

# ✅ FIXED: Use official prepare_inputs function for proper data preparation
# This handles alpha channel, CLIP embedding, and proper normalization
processed_path = DIRS['triposr'] / "processed_input.png"

INPUT_ELEVATION = 30.0  # Assume front view at 30 degrees
CROP_SIZE = 200         # Crop foreground to this size

print(f"📸 Preparing input: {processed_path}")
print(f"   Input elevation: {INPUT_ELEVATION}°")
print(f"   Crop size: {CROP_SIZE}")

# Verify image has alpha channel (required by prepare_inputs)
img_check = Image.open(str(processed_path))
print(f"   Image format: {img_check.mode} (channels: {len(img_check.getbands())})")
if img_check.mode != 'RGBA':
    print("   ⚠️ Converting to RGBA...")
    img_check = img_check.convert('RGBA')
    img_check.save(str(processed_path))
img_check.close()

# Use official SyncDreamer data preparation
data = prepare_inputs(str(processed_path), INPUT_ELEVATION, CROP_SIZE)

# Move to GPU and add batch dimension
for k, v in data.items():
    data[k] = v.unsqueeze(0).cuda()
    print(f"   {k}: {data[k].shape}")

print(f"\n✅ Input prepared using official prepare_inputs()")

In [None]:
# Run SyncDreamer inference
print("\n🚀 Running SyncDreamer (this takes ~2-3 minutes)...")
start_time = time.time()

# Settings
SAMPLE_STEPS = 50
CFG_SCALE = 2.0

sampler = SyncDDIMSampler(syncdreamer_model, SAMPLE_STEPS)

try:
    with torch.no_grad():
        # ✅ FIXED: Data already prepared correctly by prepare_inputs()
        # Run synchronized multi-view generation
        x_sample = syncdreamer_model.sample(
            sampler, 
            data, 
            CFG_SCALE, 
            BATCH_VIEW_NUM
        )
        # x_sample shape: (B, N, C, H, W) where N=16 views
        
except RuntimeError as e:
    if "out of memory" in str(e).lower():
        print("⚠️ OOM Error! Reducing batch size and retrying...")
        clear_gpu()
        BATCH_VIEW_NUM = max(1, BATCH_VIEW_NUM // 2)
        print(f"   New BATCH_VIEW_NUM: {BATCH_VIEW_NUM}")
        
        sampler = SyncDDIMSampler(syncdreamer_model, SAMPLE_STEPS)
        with torch.no_grad():
            x_sample = syncdreamer_model.sample(sampler, data, CFG_SCALE, BATCH_VIEW_NUM)
    else:
        raise

elapsed = time.time() - start_time
print(f"\n✅ SyncDreamer completed in {elapsed/60:.1f} minutes")
print(f"   Output shape: {x_sample.shape}")

In [None]:
# Save multi-view images
print("\n💾 Saving multi-view images...")

# Convert samples to images: [-1,1] -> [0,1]
samples = (x_sample.clamp(-1, 1) + 1) / 2

syncdreamer_views = []

B, N, C, H, W = samples.shape
print(f"   Processing {N} views at {H}x{W}")

for i in range(N):
    img_tensor = samples[0, i]  # (C, H, W)
    img_np = (img_tensor.permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)
    img_pil = Image.fromarray(img_np)
    
    # Save individual view
    elev = int(ELEVATIONS[i])
    azim = int(AZIMUTHS[i])
    save_path = DIRS['syncdreamer'] / f"view_{i:02d}_e{elev}_a{azim}.png"
    img_pil.save(save_path)
    syncdreamer_views.append(img_pil)

print(f"✅ Saved {len(syncdreamer_views)} views to {DIRS['syncdreamer']}")

# Display grid
fig, axes = plt.subplots(4, 4, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(syncdreamer_views[i])
    ax.set_title(f"E={int(ELEVATIONS[i])}° A={int(AZIMUTHS[i])}°", fontsize=8)
    ax.axis('off')
plt.suptitle("SyncDreamer: 16 Multi-View Images", fontsize=14)
plt.tight_layout()
plt.savefig(DIRS['syncdreamer'] / "grid.png", dpi=150)
plt.show()

# Cleanup SyncDreamer to free VRAM
del syncdreamer_model, sampler, x_sample, samples
clear_gpu()

# Stage 4: SDXL Enhancement (Optional)

**Input:** Multi-view images  
**Output:** Enhanced multi-view images  
**Time:** ~1 minute per image

Skip this stage if you want faster results.
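
One detail worth checking before tuning this stage is how `strength` interacts with the step count in image-to-image mode. The sketch below assumes the usual diffusers img2img behavior, where the first `(1 - strength)` fraction of the schedule is skipped. The helper name `effective_denoising_steps` is hypothetical, and the formula is a simplification that should be confirmed against the installed diffusers version.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an img2img call actually runs.

    Assumption: the pipeline skips the first (1 - strength) fraction of the
    schedule, which is believed to match diffusers' img2img pipelines.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


# With a 4-step schedule, low strength values leave very few steps to run.
for strength in (0.35, 0.5, 0.75, 1.0):
    print(strength, effective_denoising_steps(4, strength))
```

Under this assumption, a 4-step schedule at `strength=0.35` runs a single denoising step, so raising either value changes how much the enhanced view can depart from the SyncDreamer input.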

In [None]:
SKIP_ENHANCEMENT = False  # Set to True to skip this stage

# ⚠️ CRITICAL: Check RAM BEFORE attempting to load SDXL
# SDXL UNet alone requires ~5-6GB RAM to load, plus existing Python overhead
import gc
import psutil

# Aggressive cleanup first
for _ in range(10):
    gc.collect()
torch.cuda.empty_cache()
torch.cuda.synchronize()

ram_available = psutil.virtual_memory().available / 1024**3
ram_total = psutil.virtual_memory().total / 1024**3

print(f"\n📊 Memory Status:")
print(f"   Available RAM: {ram_available:.1f} GB / {ram_total:.1f} GB")
print(f"   SDXL UNet requires: ~6 GB RAM to load")

# Auto-skip if RAM is insufficient (need ~7GB free to be safe)
if ram_available < 7:
    print(f"\n⚠️ AUTO-SKIP: Insufficient RAM for SDXL ({ram_available:.1f} GB < 7 GB)")
    print("   T4 GPUs on Colab free tier often have limited RAM.")
    print("   Skipping SDXL enhancement to prevent crash.")
    SKIP_ENHANCEMENT = True

if not SKIP_ENHANCEMENT:
    print("\n" + "="*60)
    print("✨ STAGE 4: SDXL Lightning Enhancement")
    print("="*60)
    
    # Save view list to disk temporarily, then delete from RAM
    import pickle
    views_cache_path = DIRS['syncdreamer'] / "_views_cache.pkl"
    with open(views_cache_path, 'wb') as f:
        pickle.dump(syncdreamer_views, f)
    del syncdreamer_views
    
    # Delete any lingering objects (names that do not exist are ignored)
    for var_name in ['data', 'x_sample', 'samples', 'sampler']:
        globals().pop(var_name, None)
    
    # Aggressive garbage collection
    for _ in range(10):
        gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
    
    from diffusers import StableDiffusionXLImg2ImgPipeline, AutoencoderKL, EulerDiscreteScheduler
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file
    
    print("\n📥 Loading SDXL Lightning...")
    print("   Using single-file LoRA method (lower RAM usage)")
    
    base_model = "stabilityai/stable-diffusion-xl-base-1.0"
    repo = "ByteDance/SDXL-Lightning"
    
    try:
        # ✅ MEMORY-OPTIMIZED: Load pipeline with low_cpu_mem_usage
        print("   Loading base pipeline (this may take a minute)...")
        pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
            base_model,
            torch_dtype=torch.float16,
            variant="fp16",
            use_safetensors=True,
            low_cpu_mem_usage=True,  # ⚠️ CRITICAL: Reduces RAM during loading
        )
        
        # Apply fp16-fixed VAE
        print("   Loading fp16-fixed VAE...")
        pipe.vae = AutoencoderKL.from_pretrained(
            "madebyollin/sdxl-vae-fp16-fix",
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
        )
        
        # Download and load Lightning LoRA (much smaller than full UNet swap)
        print("   Applying Lightning LoRA weights...")
        lora_path = hf_hub_download(repo, "sdxl_lightning_4step_lora.safetensors")
        pipe.load_lora_weights(lora_path)
        pipe.fuse_lora()  # Fuse for faster inference
        
        # Use correct scheduler for Lightning
        pipe.scheduler = EulerDiscreteScheduler.from_config(
            pipe.scheduler.config, 
            timestep_spacing="trailing"
        )
        
        # Move to GPU
        pipe = pipe.to("cuda")
        
        # Cleanup
        gc.collect()
        torch.cuda.empty_cache()
        
        print("✅ SDXL Lightning loaded (LoRA method)!")
        print("   ⚠️ Remember: guidance_scale MUST be 0 for Lightning")
        
        # Reload syncdreamer_views from cache
        print("   Loading cached views...")
        with open(views_cache_path, 'rb') as f:
            syncdreamer_views = pickle.load(f)
        views_cache_path.unlink()  # Delete cache file
        
    except Exception as e:
        print(f"❌ Failed to load SDXL Lightning: {e}")
        print("   Falling back to skip enhancement")
        SKIP_ENHANCEMENT = True
        
        # Reload syncdreamer_views if enhancement failed
        try:
            with open(views_cache_path, 'rb') as f:
                syncdreamer_views = pickle.load(f)
            views_cache_path.unlink()
        except Exception:
            # Cache missing or unreadable: reload the saved PNGs instead
            syncdreamer_views = []
            for i in range(16):
                img_path = DIRS['syncdreamer'] / f"view_{i:02d}_e30_a{int(i*22.5)}.png"
                syncdreamer_views.append(Image.open(img_path))
else:
    print("\n⏭️ Skipping SDXL enhancement stage")
    print("   Using original SyncDreamer views (still high quality!)")
    
    # Make sure syncdreamer_views is available for next stage
    if 'syncdreamer_views' not in dir() or syncdreamer_views is None:
        print("   Reloading views from disk...")
        syncdreamer_views = []
        for i in range(16):
            img_path = DIRS['syncdreamer'] / f"view_{i:02d}_e30_a{int(i*22.5)}.png"
            syncdreamer_views.append(Image.open(img_path))

In [None]:
if not SKIP_ENHANCEMENT:
    # Enhance select views (not all 16 to save time)
    VIEWS_TO_ENHANCE = [0, 4, 8, 12]  # Only 4 views to reduce memory pressure
    
    print(f"\n🚀 Enhancing {len(VIEWS_TO_ENHANCE)} views...")
    
    enhanced_views = syncdreamer_views.copy()  # Start with original
    
    prompt = "highly detailed 3D render, professional studio lighting, sharp textures, photorealistic, 8k quality"
    negative_prompt = "blurry, low quality, artifacts, noise, watermark, text"
    
    for i, view_idx in enumerate(VIEWS_TO_ENHANCE):
        print(f"  Enhancing view {view_idx} ({i+1}/{len(VIEWS_TO_ENHANCE)})...")
        
        # Resize input for SDXL (works best at 512-1024)
        input_img = syncdreamer_views[view_idx].resize((512, 512), Image.LANCZOS)
        
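        with torch.no_grad():
            result = pipe(
                prompt=prompt,
                negative_prompt=negative_prompt,  # Has no effect while guidance_scale=0 (CFG disabled)
                image=input_img,
                strength=0.35,  # Lower = preserve more original structure
                # img2img runs about int(num_inference_steps * strength) denoising
                # steps, so 4 * 0.35 gives a single step here
                num_inference_steps=4,  # The Lightning LoRA loaded above is the 4-step variant
                guidance_scale=0,  # Lightning uses CFG=0
            ).images[0]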
        
        # Resize back to match SyncDreamer output size
        result_resized = result.resize((256, 256), Image.LANCZOS)
        enhanced_views[view_idx] = result_resized
        result.save(DIRS['enhanced'] / f"enhanced_{view_idx:02d}.png")
        
        # Clear VRAM between images
        torch.cuda.empty_cache()
    
    print(f"\n✅ Enhanced views saved to {DIRS['enhanced']}")
    
    # Cleanup SDXL to free VRAM for gsplat
    del pipe
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
    print("🧹 SDXL cleaned up")
else:
    # Use original SyncDreamer views (already high quality)
    enhanced_views = syncdreamer_views
    print("✅ Using original SyncDreamer views (no enhancement)")
    print("   Note: SyncDreamer views are already high quality for gsplat optimization")

# Stage 5: gsplat Optimization

**Input:** Initial Gaussian PLY + Multi-view images  
**Output:** Optimized Gaussian Splats  
**Time:** ~5 minutes
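
Rendering the Gaussians against the multi-view images needs a camera intrinsics matrix for each view. A minimal sketch of how a pinhole intrinsics matrix relates to field of view and image size is given below. The function names are hypothetical, and the 90° field of view is purely illustrative, since the camera intrinsics SyncDreamer assumes are not stated in this section.

```python
import math


def pinhole_intrinsics(width: int, height: int, fov_x_deg: float) -> list:
    """3x3 intrinsics K for a pinhole camera with the principal point at the image center."""
    focal = 0.5 * width / math.tan(math.radians(fov_x_deg) / 2.0)
    return [
        [focal, 0.0, width / 2.0],
        [0.0, focal, height / 2.0],
        [0.0, 0.0, 1.0],
    ]


def fov_x_from_focal(focal: float, width: int) -> float:
    """Horizontal field of view in degrees implied by a focal length in pixels."""
    return math.degrees(2.0 * math.atan(width / (2.0 * focal)))


def project(K: list, point_cam: tuple) -> tuple:
    """Project a camera-space point (x, y, z with z > 0) to pixel coordinates."""
    x, y, z = point_cam
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return (u, v)


K = pinhole_intrinsics(256, 256, fov_x_deg=90.0)  # illustrative FOV only
print(project(K, (0.0, 0.0, 2.0)))    # a point on the optical axis maps to the image center
print(fov_x_from_focal(128.0, 128))   # FOV implied by a focal length of 128 px at width 128
```

The last line shows that a focal length of 128 pixels on a 128-pixel-wide image, as used for the warm-up render in this stage, corresponds to a horizontal field of view of roughly 53°.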

In [None]:
print("\n" + "="*60)
print("🔮 STAGE 5: gsplat Optimization")
print("="*60)

import torch.nn as nn
from gsplat import rasterization
import math

device = torch.device("cuda:0")

# ⚠️ PRE-COMPILE GSPLAT CUDA KERNELS
# gsplat uses JIT compilation - first call takes 5-10 minutes on T4
# We do a dummy render here so the compilation happens with a clear progress message
print("\n⏳ Pre-compiling gsplat CUDA kernels...")
print("   This takes 5-10 minutes on first run (one-time per session)")
print("   You'll see 'Setting up CUDA...' - this is normal!")

# Dummy tensors for compilation trigger
_dummy_means = torch.zeros(100, 3, device=device)
_dummy_quats = torch.tensor([[1, 0, 0, 0]] * 100, dtype=torch.float32, device=device)
_dummy_scales = torch.ones(100, 3, device=device) * 0.01
_dummy_opacities = torch.ones(100, device=device)
_dummy_colors = torch.ones(100, 3, device=device)
_dummy_viewmat = torch.eye(4, device=device).unsqueeze(0)
_dummy_K = torch.tensor([[128, 0, 64], [0, 128, 64], [0, 0, 1]], dtype=torch.float32, device=device).unsqueeze(0)

try:
    _ = rasterization(
        means=_dummy_means,
        quats=_dummy_quats,
        scales=_dummy_scales,
        opacities=_dummy_opacities,
        colors=_dummy_colors,
        viewmats=_dummy_viewmat,
        Ks=_dummy_K,
        width=128,
        height=128,
        packed=False,
        render_mode="RGB",
    )
    print("✅ gsplat CUDA kernels compiled!")
except Exception as e:
    print(f"⚠️ Pre-compilation note: {e}")

# Cleanup dummy tensors
del _dummy_means, _dummy_quats, _dummy_scales, _dummy_opacities, _dummy_colors, _dummy_viewmat, _dummy_K
torch.cuda.empty_cache()

# Image size for rendering (matches SyncDreamer output)
IMAGE_SIZE = 256

# Load initial Gaussians from PLY
def load_gaussian_ply(path):
    """Load Gaussian parameters from PLY file."""
    plydata = PlyData.read(path)
    vertex = plydata['vertex']
    
    xyz = np.stack([vertex['x'], vertex['y'], vertex['z']], axis=-1)
    f_dc = np.stack([vertex['f_dc_0'], vertex['f_dc_1'], vertex['f_dc_2']], axis=-1)
    
    # Load f_rest if present
    f_rest_names = [f'f_rest_{i}' for i in range(45)]
    available_f_rest = [name for name in f_rest_names if name in vertex.data.dtype.names]
    if available_f_rest:
        f_rest = np.stack([vertex[name] for name in available_f_rest], axis=-1)
    else:
        f_rest = np.zeros((len(xyz), 45), dtype=np.float32)
    
    # ✅ FIXED: Ensure opacity is 1D (N,) - gsplat requires this shape
    opacity = np.asarray(vertex['opacity'], dtype=np.float32).flatten()
    scales = np.stack([vertex['scale_0'], vertex['scale_1'], vertex['scale_2']], axis=-1)
    rotations = np.stack([vertex['rot_0'], vertex['rot_1'], vertex['rot_2'], vertex['rot_3']], axis=-1)
    
    return {
        'xyz': torch.tensor(xyz, dtype=torch.float32),
        'f_dc': torch.tensor(f_dc, dtype=torch.float32),
        'f_rest': torch.tensor(f_rest, dtype=torch.float32),
        'opacity': torch.tensor(opacity, dtype=torch.float32),  # Shape: (N,)
        'scales': torch.tensor(scales, dtype=torch.float32),
        'rotations': torch.tensor(rotations, dtype=torch.float32),
    }

gaussians = load_gaussian_ply(str(INITIAL_PLY_PATH))
print(f"\n✅ Loaded {len(gaussians['xyz']):,} Gaussians")
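
The loader looks for 45 `f_rest_*` fields. That number matches spherical harmonics up to degree 3 with three colour channels, once the DC term (stored separately as `f_dc_*`) is excluded. A minimal sketch of the count, with hypothetical helper names:

```python
def num_sh_coeffs(degree: int) -> int:
    """Number of spherical-harmonic basis functions up to `degree` (inclusive)."""
    return (degree + 1) ** 2


def num_f_rest(degree: int, channels: int = 3) -> int:
    """Higher-order SH values per Gaussian: every basis function except DC, per channel."""
    return channels * (num_sh_coeffs(degree) - 1)


# Degree 0 has only the DC term; degree 3 gives the 45 fields read above
for degree in range(4):
    print(degree, num_f_rest(degree))
```

A PLY written with a lower SH degree will therefore carry fewer `f_rest_*` fields than 45, which is the case the `available_f_rest` filter handles.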

In [None]:
class GaussianModel(nn.Module):
    def __init__(self, gaussians):
        super().__init__()
        self.xyz = nn.Parameter(gaussians['xyz'].clone())
        self.f_dc = nn.Parameter(gaussians['f_dc'].clone())
        self.f_rest = nn.Parameter(gaussians['f_rest'].clone())
        # ✅ FIXED: Ensure opacity is 1D (N,) - gsplat requires this shape
        opacity_tensor = gaussians['opacity'].clone()
        if opacity_tensor.dim() > 1:
            opacity_tensor = opacity_tensor.squeeze(-1)
        self.opacity_raw = nn.Parameter(opacity_tensor)
        self.scales_raw = nn.Parameter(gaussians['scales'].clone())
        self.rotations = nn.Parameter(gaussians['rotations'].clone())
        
    @property
    def opacity(self):
        # ✅ FIXED: Returns (N,) shape - required by gsplat.rasterization()
        return torch.sigmoid(self.opacity_raw)
    
    @property
    def scales(self):
        return torch.exp(self.scales_raw)
    
    def get_colors(self):
        C0 = 0.28209479177387814
        return 0.5 + C0 * self.f_dc
    
    def forward(self):
        return {
            'xyz': self.xyz,
            'colors': self.get_colors(),
            'opacity': self.opacity,
            'scales': self.scales,
            'rotations': self.rotations / (self.rotations.norm(dim=-1, keepdim=True) + 1e-8),
        }

model = GaussianModel(gaussians).to(device)
print(f"✅ Model: {sum(p.numel() for p in model.parameters()):,} parameters")
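
`GaussianModel` keeps opacity and scale as unconstrained raw values and applies `sigmoid` and `exp` on access, while colour is derived from the DC spherical-harmonic coefficient. The scalar sketch below mirrors those mappings and their inverses in plain Python; the function names are illustrative and not part of the notebook.

```python
import math

C0 = 0.28209479177387814  # zeroth-order SH constant, same value as in get_colors()


def sigmoid(x: float) -> float:
    """Raw opacity (logit) -> opacity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Opacity in (0, 1) -> raw value; inverse of sigmoid."""
    return math.log(p / (1.0 - p))


def dc_to_rgb(f_dc: float) -> float:
    """DC SH coefficient -> colour channel (0.5 is mid-grey)."""
    return 0.5 + C0 * f_dc


def rgb_to_dc(rgb: float) -> float:
    """Colour channel -> DC SH coefficient; inverse of dc_to_rgb."""
    return (rgb - 0.5) / C0


print(sigmoid(0.0))             # raw opacity 0 means half-transparent
print(math.exp(math.log(0.01)))  # raw scale is the log of the world-space scale
print(dc_to_rgb(0.0))           # f_dc = 0 renders as mid-grey
print(rgb_to_dc(0.0))           # f_dc needed for a black channel
```

Because the raw values are unbounded, adding a constant to `opacity_raw` or `scales_raw` (as the pre-optimization fallback does) shifts a logit or multiplies a scale rather than adding to the activated value.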

In [None]:
# Camera system matching SyncDreamer conventions
# SyncDreamer uses: Y-up, camera looks at origin, radius ~1.5

# ⚠️ CRITICAL: First check where the Gaussians actually are
with torch.no_grad():
    xyz = model.xyz.cpu().numpy()
    gaussian_center = xyz.mean(axis=0)
    gaussian_extent = (xyz.max(axis=0) - xyz.min(axis=0)).max()
    
print(f"📊 Gaussian cloud analysis:")
print(f"   Center: ({gaussian_center[0]:.3f}, {gaussian_center[1]:.3f}, {gaussian_center[2]:.3f})")
print(f"   Extent: {gaussian_extent:.3f}")

# ⚠️ FIX: If Gaussians are not at origin, either:
# 1. Recenter the Gaussians (recommended)
# 2. Or adjust camera look_at point

# Option 1: Recenter Gaussians to origin (better approach)
if np.linalg.norm(gaussian_center) > 0.1:  # If center is more than 0.1 from origin
    print(f"\n🔧 Recentering Gaussians to origin...")
    with torch.no_grad():
        model.xyz.data -= torch.tensor(gaussian_center, device=device, dtype=torch.float32)
    gaussian_center = np.array([0.0, 0.0, 0.0])
    print(f"   New center: (0, 0, 0)")

# Adjust radius based on actual model size
# Camera should be ~2.5x the model extent away for good framing
RADIUS = max(gaussian_extent * 2.5, 1.5)
print(f"   Camera radius: {RADIUS:.3f}")

def create_camera_pose(elevation_deg, azimuth_deg, radius=1.5, look_at=None):
    """Create world-to-camera matrix for given elevation and azimuth."""
    if look_at is None:
        look_at = np.array([0, 0, 0])
    
    elev = math.radians(elevation_deg)
    azim = math.radians(azimuth_deg)
    
    # Camera position in spherical coordinates (Y-up convention)
    x = radius * math.cos(elev) * math.sin(azim)
    y = radius * math.sin(elev)
    z = radius * math.cos(elev) * math.cos(azim)
    
    cam_pos = np.array([x, y, z]) + look_at  # Offset by look_at point
    up = np.array([0, 1, 0])  # Y-up
    
    # Construct camera basis
    forward = look_at - cam_pos
    forward = forward / (np.linalg.norm(forward) + 1e-8)
    right = np.cross(forward, up)
    right = right / (np.linalg.norm(right) + 1e-8)
    up_new = np.cross(right, forward)
    
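    # World-to-camera transformation in OpenCV-style axes
    # (+X right, +Y down, +Z forward), which is what gsplat's rasterization()
    # expects; with -forward in the Z row the object lands behind the camera.
    w2c = np.eye(4, dtype=np.float32)
    w2c[0, :3] = right
    w2c[1, :3] = -up_new  # Image Y axis points down
    w2c[2, :3] = forward  # Camera looks along +Z
    w2c[:3, 3] = -w2c[:3, :3] @ cam_pos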
    
    return w2c

def get_intrinsics(fov_deg=49.1, image_size=256):
    """Get camera intrinsics matrix. FOV ~49.1 matches SyncDreamer."""
    fov_rad = math.radians(fov_deg)
    focal = image_size / (2 * math.tan(fov_rad / 2))
    
    K = np.array([
        [focal, 0, image_size / 2],
        [0, focal, image_size / 2],
        [0, 0, 1]
    ], dtype=np.float32)
    return K

# Pre-compute all camera poses (using SyncDreamer camera parameters)
# Now looking at actual Gaussian center
camera_poses = [create_camera_pose(e, a, radius=RADIUS, look_at=gaussian_center) 
                for e, a in zip(ELEVATIONS, AZIMUTHS)]
intrinsics = get_intrinsics(fov_deg=49.1, image_size=IMAGE_SIZE)

print(f"\n✅ Created {len(camera_poses)} camera poses")
print(f"   Image size: {IMAGE_SIZE}x{IMAGE_SIZE}")
print(f"   Intrinsics: focal={intrinsics[0,0]:.1f}, center=({intrinsics[0,2]:.0f}, {intrinsics[1,2]:.0f})")

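# ⚠️ DEBUG: Verify camera can see the Gaussians.
# render_gaussians() is defined in a later cell, so on a fresh "Run all" this
# check is skipped here; the pre-optimization test render repeats it.
if 'render_gaussians' in globals():
    with torch.no_grad():
        test_render, test_alpha = render_gaussians(model, camera_poses[0], intrinsics, IMAGE_SIZE)
        test_mean = test_render.mean().item()
    print(f"\n🔍 Camera verification render:")
    print(f"   Mean brightness: {test_mean:.4f}")
    if test_mean < 0.01:
        print("   ⚠️ Very dark - may need scale/opacity adjustment")
    else:
        print("   ✅ Gaussians are visible to camera")
else:
    print("\n🔍 Camera verification deferred until render_gaussians() is defined")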

In [None]:
def render_gaussians(model, w2c, K, image_size):
    """Render Gaussian splats from a camera viewpoint."""
    params = model()
    
    viewmat = torch.tensor(w2c, dtype=torch.float32, device=device)
    K_tensor = torch.tensor(K, dtype=torch.float32, device=device)
    
    try:
        render_colors, render_alphas, info = rasterization(
            means=params['xyz'],
            quats=params['rotations'],
            scales=params['scales'],
            opacities=params['opacity'],
            colors=params['colors'],
            viewmats=viewmat.unsqueeze(0),
            Ks=K_tensor.unsqueeze(0),
            width=image_size,
            height=image_size,
            packed=False,
            render_mode="RGB",
        )
        return render_colors[0], render_alphas[0]
    except Exception as e:
        print(f"Render error: {e}")
        # Return empty image on error
        return torch.zeros(image_size, image_size, 3, device=device), torch.zeros(image_size, image_size, 1, device=device)
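
The `viewmats` argument is a world-to-camera matrix, and its axis convention decides whether the object ends up in front of or behind the camera. Two conventions are common: OpenGL-style (camera looks along -Z, +Y up) and OpenCV-style (camera looks along +Z, +Y down). gsplat, as far as I know, follows the OpenCV-style convention, so a visible point needs positive camera-space Z. The pure-Python sketch below uses hypothetical helper names and is independent of the notebook's functions; it shows the sign difference for a camera at (0, 0, 2) looking at the origin.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negate(v):
    return [-c for c in v]


def look_at_axes(cam_pos, target, up=(0.0, 1.0, 0.0)):
    """Return (right, up, forward) unit vectors for a camera at cam_pos facing target."""
    forward = normalize([t - c for t, c in zip(target, cam_pos)])
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)
    return right, true_up, forward


def to_camera(point, cam_pos, rows):
    """World -> camera: rotate (point - cam_pos) by a rotation given as three rows."""
    rel = [p - c for p, c in zip(point, cam_pos)]
    return [dot(r, rel) for r in rows]


cam_pos, target = [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]
right, up, forward = look_at_axes(cam_pos, target)

opengl_rows = [right, up, negate(forward)]   # camera looks along -Z, +Y up
opencv_rows = [right, negate(up), forward]   # camera looks along +Z, +Y down

z_gl = to_camera(target, cam_pos, opengl_rows)[2]
z_cv = to_camera(target, cam_pos, opencv_rows)[2]
print("camera-space Z of the target:", z_gl, "(OpenGL)", z_cv, "(OpenCV)")
```

A quick way to apply this to a real pose is to transform the look-at point with the matrix and check the sign of its Z coordinate before rendering; a negative value under an OpenCV-style renderer would explain an all-black frame.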

In [None]:
from tqdm import tqdm
import torch.nn.functional as F

# ============================================================
# ⚠️ CRITICAL: Verify setup before optimization
# ============================================================

# Check target images
print("🔍 Pre-optimization Diagnostics:")
print("\n📊 Target Images:")
sample_view = enhanced_views[0]
print(f"   Type: {type(sample_view)}")
print(f"   Mode: {sample_view.mode}")
print(f"   Size: {sample_view.size}")
img_array = np.array(sample_view)
print(f"   Array shape: {img_array.shape}")
print(f"   Value range: [{img_array.min()}, {img_array.max()}]")
print(f"   Mean value: {img_array.mean():.1f}")

# Check Gaussian initialization
with torch.no_grad():
    xyz = model.xyz.cpu().numpy()
    colors = model.get_colors().cpu().numpy()
    opacity = model.opacity.cpu().numpy()
    scales = model.scales.cpu().numpy()
    
print(f"\n📊 Initial Gaussians:")
print(f"   Points: {len(xyz):,}")
print(f"   XYZ center: ({xyz.mean(0)[0]:.3f}, {xyz.mean(0)[1]:.3f}, {xyz.mean(0)[2]:.3f})")
# Largest per-axis extent, same definition as gaussian_extent in the camera cell
print(f"   XYZ extent: {(xyz.max(axis=0) - xyz.min(axis=0)).max():.3f}")
print(f"   Opacity: [{opacity.min():.4f}, {opacity.max():.4f}] (mean={opacity.mean():.4f})")
print(f"   Scales: [{scales.min():.6f}, {scales.max():.6f}] (mean={scales.mean():.6f})")
print(f"   Colors: [{colors.min():.4f}, {colors.max():.4f}] (mean={colors.mean():.4f})")

print(f"\n📊 Camera Setup:")
print(f"   Radius: {RADIUS}")
print(f"   Image size: {IMAGE_SIZE}")
print(f"   Num views: {len(camera_poses)}")

# ============================================================
# Prepare target images as tensors
# ============================================================
target_tensors = []
for img in enhanced_views:
    img_resized = img.resize((IMAGE_SIZE, IMAGE_SIZE), Image.LANCZOS)
    # ⚠️ FIX: Ensure RGB (3 channels); covers RGBA, grayscale and palette images
    if img_resized.mode != 'RGB':
        img_resized = img_resized.convert('RGB')
    img_tensor = torch.tensor(np.array(img_resized) / 255.0, dtype=torch.float32, device=device)
    # Ensure shape is [H, W, 3]
    if img_tensor.dim() == 3 and img_tensor.shape[-1] == 3:
        target_tensors.append(img_tensor)
    else:
        print(f"⚠️ Warning: Unexpected tensor shape {img_tensor.shape}")
        # Force to 3 channels
        target_tensors.append(img_tensor[..., :3])

print(f"\n✅ Prepared {len(target_tensors)} target images at {IMAGE_SIZE}x{IMAGE_SIZE}")
print(f"   Target tensor shape: {target_tensors[0].shape}")
print(f"   Target value range: [{target_tensors[0].min():.3f}, {target_tensors[0].max():.3f}]")

# ============================================================
# ⚠️ TEST RENDER: Check if Gaussians are visible BEFORE training
# ============================================================
print("\n🔍 Test render BEFORE optimization...")
with torch.no_grad():
    test_render, test_alpha = render_gaussians(model, camera_poses[0], intrinsics, IMAGE_SIZE)
    test_np = test_render.detach().cpu().numpy()
    print(f"   Rendered shape: {test_np.shape}")
    print(f"   Rendered range: [{test_np.min():.4f}, {test_np.max():.4f}]")
    print(f"   Rendered mean: {test_np.mean():.4f}")
    
    if test_np.max() < 0.01:
        print("\n   ⚠️ WARNING: Initial render is nearly BLACK!")
        print("   This means cameras are not seeing the Gaussians.")
        print("   Possible causes:")
        print("     1. Gaussians are too small (scales)")
        print("     2. Gaussians are invisible (opacity)")
        print("     3. Camera is not pointing at Gaussians")
        
        # Try to fix by adjusting scale
        print("\n   🔧 Attempting to fix: Increasing initial scales...")
        model.scales_raw.data += 1.0  # Increase scale by e^1 ≈ 2.7x
        
        # Re-test
        test_render2, _ = render_gaussians(model, camera_poses[0], intrinsics, IMAGE_SIZE)
        test_np2 = test_render2.detach().cpu().numpy()
        print(f"   After scale fix - render mean: {test_np2.mean():.4f}")
        
        if test_np2.max() < 0.01:
            print("   ⚠️ Still black - increasing opacity...")
            model.opacity_raw.data += 2.0  # Higher starting opacity
            
            test_render3, _ = render_gaussians(model, camera_poses[0], intrinsics, IMAGE_SIZE)
            test_np3 = test_render3.detach().cpu().numpy()
            print(f"   After opacity fix - render mean: {test_np3.mean():.4f}")
    else:
        print("   ✅ Initial render looks good!")

# Show test render vs target
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(test_render.detach().cpu().numpy())
axes[0].set_title(f'Initial Render (mean={test_render.mean():.3f})')
axes[0].axis('off')
axes[1].imshow(target_tensors[0].cpu().numpy())
axes[1].set_title(f'Target (mean={target_tensors[0].mean():.3f})')
axes[1].axis('off')
plt.tight_layout()
plt.savefig(DIRS['gsplat'] / "render_vs_target.png")
plt.show()

# ============================================================
# Optimizer with CONSERVATIVE learning rates
# ============================================================
# ⚠️ Key insight: Start with smaller LRs to prevent divergence
optimizer = torch.optim.Adam([
    {'params': model.xyz, 'lr': 1e-5, 'name': 'xyz'},           # Very small - positions are already good
    {'params': model.f_dc, 'lr': 1e-3, 'name': 'f_dc'},         # Color is important
    {'params': model.f_rest, 'lr': 1e-3 / 20, 'name': 'f_rest'},
    {'params': model.opacity_raw, 'lr': 1e-2, 'name': 'opacity'},  # Moderate
    {'params': model.scales_raw, 'lr': 1e-3, 'name': 'scales'},    # Small - scales are sensitive
    {'params': model.rotations, 'lr': 1e-4, 'name': 'rotations'},
])

# Gentler decay
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.998)

# ============================================================
# Training loop with better monitoring
# ============================================================
NUM_ITERATIONS = 1000
losses = []
best_loss = float('inf')
best_state = None

print(f"\n🚀 Starting optimization for {NUM_ITERATIONS} iterations...")
pbar = tqdm(range(NUM_ITERATIONS))

for iteration in pbar:
    optimizer.zero_grad()
    
    # Sample random view
    view_idx = np.random.randint(0, len(camera_poses))
    w2c = camera_poses[view_idx]
    target = target_tensors[view_idx]
    
    # Render
    rendered, alpha = render_gaussians(model, w2c, intrinsics, IMAGE_SIZE)
    
    # ⚠️ Check for NaN/Inf early
    if torch.isnan(rendered).any() or torch.isinf(rendered).any():
        print(f"\n⚠️ NaN/Inf detected at iteration {iteration}! Stopping...")
        break
    
    # L1 loss (more stable than MSE)
    l1_loss = F.l1_loss(rendered, target)
    loss = l1_loss
    
    loss.backward()
    
    # ⚠️ Check gradients
    grad_norm = 0
    for p in model.parameters():
        if p.grad is not None:
            grad_norm += p.grad.norm().item() ** 2
    grad_norm = grad_norm ** 0.5
    
    # Skip update if gradients are crazy
    if grad_norm > 100:
        print(f"\n⚠️ Large gradient ({grad_norm:.1f}) at iter {iteration}, skipping...")
        optimizer.zero_grad()
        continue
    
    # Gradient clipping for stability
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    
    optimizer.step()
    scheduler.step()
    
    losses.append(loss.item())
    
    # Save best model
    if loss.item() < best_loss:
        best_loss = loss.item()
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    
    if iteration % 100 == 0:
        with torch.no_grad():
            render_mean = rendered.mean().item()
        pbar.set_postfix({
            'loss': f'{loss.item():.4f}', 
            'view': view_idx,
            'render': f'{render_mean:.3f}',
            'grad': f'{grad_norm:.2f}'
        })

# Restore best model if training went bad
# (losses is empty if the first iteration hit NaN/Inf or every step was skipped)
if not losses:
    print("\n⚠️ No optimizer step completed; keeping the initial Gaussians.")
else:
    if losses[-1] > losses[0] * 1.1:  # Final loss > 110% of initial
        print(f"\n⚠️ Training may have diverged (loss went up). Restoring best model...")
        if best_state is not None:
            model.load_state_dict(best_state)
            print(f"   Restored to best loss: {best_loss:.4f}")

    print(f"\n✅ Optimization complete!")
    print(f"   Initial loss: {losses[0]:.4f}")
    print(f"   Best loss: {best_loss:.4f}")
    print(f"   Final loss: {losses[-1]:.4f}")

# Test render after optimization
with torch.no_grad():
    final_render, _ = render_gaussians(model, camera_poses[0], intrinsics, IMAGE_SIZE)
    print(f"   Final render mean: {final_render.mean():.4f}")

# Plot loss curve
plt.figure(figsize=(10, 4))
plt.plot(losses)
plt.axhline(y=best_loss, color='g', linestyle='--', label=f'Best: {best_loss:.4f}')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('gsplat Optimization Loss')
plt.legend()
plt.grid(True)
plt.savefig(DIRS['gsplat'] / "loss_curve.png")
plt.show()
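
The optimizer uses one learning rate per parameter group, and `ExponentialLR` multiplies every group's rate by `gamma` each time `scheduler.step()` is called. A small sketch of what `gamma=0.998` does over 1000 steps, using the base rates from the cell above (the helper name is illustrative):

```python
def lr_at(base_lr: float, gamma: float, step: int) -> float:
    """Learning rate after `step` scheduler steps of exponential decay."""
    return base_lr * gamma ** step


base_lrs = {"xyz": 1e-5, "f_dc": 1e-3, "opacity": 1e-2, "scales": 1e-3, "rotations": 1e-4}
gamma, steps = 0.998, 1000

# Every group decays by the same factor, so the ratios between groups stay fixed
for name, lr in base_lrs.items():
    print(f"{name:10s} start={lr:.1e}  end={lr_at(lr, gamma, steps):.2e}")

print(f"decay factor after {steps} steps: {gamma ** steps:.3f}")
```

Iterations skipped with `continue` (large gradients) do not call `scheduler.step()`, so the decay reached at the end of a run depends on how many updates were actually applied. When raising `NUM_ITERATIONS`, `gamma` may need to move closer to 1 to keep the final rate comparable.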

In [None]:
# Save optimized model
def save_gaussian_ply(model, output_path):
    with torch.no_grad():
        params = model()
        xyz = params['xyz'].cpu().numpy()
        colors = model.f_dc.cpu().numpy()
        f_rest = model.f_rest.cpu().numpy()
        opacity = model.opacity_raw.cpu().numpy()
        scales = model.scales_raw.cpu().numpy()
        rotations = params['rotations'].cpu().numpy()
        
    num_points = len(xyz)
    dtype_full = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
                  ('f_dc_0', 'f4'), ('f_dc_1', 'f4'), ('f_dc_2', 'f4')]
    for i in range(f_rest.shape[1]):
        dtype_full.append((f'f_rest_{i}', 'f4'))
    dtype_full.extend([('opacity', 'f4'),
                       ('scale_0', 'f4'), ('scale_1', 'f4'), ('scale_2', 'f4'),
                       ('rot_0', 'f4'), ('rot_1', 'f4'), ('rot_2', 'f4'), ('rot_3', 'f4')])
    
    elements = np.zeros(num_points, dtype=dtype_full)
    elements['x'] = xyz[:, 0]
    elements['y'] = xyz[:, 1]
    elements['z'] = xyz[:, 2]
    elements['f_dc_0'] = colors[:, 0]
    elements['f_dc_1'] = colors[:, 1]
    elements['f_dc_2'] = colors[:, 2]
    for i in range(f_rest.shape[1]):
        elements[f'f_rest_{i}'] = f_rest[:, i]
    elements['opacity'] = opacity
    elements['scale_0'] = scales[:, 0]
    elements['scale_1'] = scales[:, 1]
    elements['scale_2'] = scales[:, 2]
    elements['rot_0'] = rotations[:, 0]
    elements['rot_1'] = rotations[:, 1]
    elements['rot_2'] = rotations[:, 2]
    elements['rot_3'] = rotations[:, 3]
    
    el = PlyElement.describe(elements, 'vertex')
    PlyData([el]).write(output_path)

OPTIMIZED_PLY_PATH = DIRS['gsplat'] / "optimized_gaussian.ply"
save_gaussian_ply(model, str(OPTIMIZED_PLY_PATH))
print(f"✅ Saved optimized Gaussians: {OPTIMIZED_PLY_PATH}")

# ⚠️ CRITICAL DIAGNOSTICS: Check if optimization went wrong
print("\n🔍 Model Parameter Diagnostics:")

with torch.no_grad():
    opacity_values = model.opacity.cpu().numpy()
    scale_values = model.scales.cpu().numpy()
    color_values = model.get_colors().cpu().numpy()
    
    print(f"\n📊 Opacity (sigmoid(opacity_raw)):")
    print(f"   Range: [{opacity_values.min():.6f}, {opacity_values.max():.6f}]")
    print(f"   Mean: {opacity_values.mean():.6f}")
    print(f"   Median: {np.median(opacity_values):.6f}")
    print(f"   % > 0.5: {(opacity_values > 0.5).sum() / len(opacity_values) * 100:.1f}%")
    
    print(f"\n📊 Scales (exp(scales_raw)):")
    print(f"   Range: [{scale_values.min():.6f}, {scale_values.max():.6f}]")
    print(f"   Mean: {scale_values.mean():.6f}")
    print(f"   Median: {np.median(scale_values):.6f}")
    
    print(f"\n📊 Colors (0.5 + C0 * f_dc):")
    print(f"   Range: [{color_values.min():.6f}, {color_values.max():.6f}]")
    print(f"   Mean: {color_values.mean():.6f}")
    print(f"   % near-black (< 0.1): {(color_values < 0.1).sum() / color_values.size * 100:.1f}%")

# ⚠️ RECOVERY: If model is completely degenerate, reset to initial
if opacity_values.max() < 0.01 or scale_values.max() < 0.001 or color_values.max() < 0.01:
    print("\n⚠️ CRITICAL: Model appears to have degenerated during optimization!")
    print("   Resetting to initial Gaussians for video rendering...")
    
    # Reload initial model
    gaussians_initial = load_gaussian_ply(str(INITIAL_PLY_PATH))
    model = GaussianModel(gaussians_initial).to(device)
    print("✅ Model reset to initial state")
    
    # Double-check new model
    with torch.no_grad():
        opacity_check = model.opacity.cpu().numpy()
        print(f"   Opacity check: [{opacity_check.min():.4f}, {opacity_check.max():.4f}]")
else:
    print("\n✅ Model parameters appear reasonable for rendering")
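
`save_gaussian_ply` writes one `float32` field per stored value, in a fixed order, and keeps opacity and scale in their raw (pre-activation) form. The sketch below rebuilds that field list in plain Python to make the layout and the per-Gaussian size explicit; `gaussian_ply_fields` is a hypothetical helper, not part of the notebook.

```python
def gaussian_ply_fields(num_f_rest: int = 45) -> list[str]:
    """Field names in the order the save function writes them."""
    fields = ["x", "y", "z", "f_dc_0", "f_dc_1", "f_dc_2"]
    fields += [f"f_rest_{i}" for i in range(num_f_rest)]
    fields += ["opacity", "scale_0", "scale_1", "scale_2"]
    fields += ["rot_0", "rot_1", "rot_2", "rot_3"]
    return fields


fields = gaussian_ply_fields()
bytes_per_gaussian = 4 * len(fields)  # every field is float32 ('f4')

print(len(fields), "fields,", bytes_per_gaussian, "bytes per Gaussian")
print(f"~{200_000 * bytes_per_gaussian / 1e6:.1f} MB for 200,000 Gaussians (header excluded)")
```

The 200,000 figure is only an example count; the actual size scales linearly with the number of Gaussians loaded from the initial PLY.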

# Stage 6: Generate Final Outputs

In [None]:
print("\n" + "="*60)
print("🏆 FINAL OUTPUT GENERATION")
print("="*60)

import imageio

# ⚠️ DEBUG: Check actual Gaussian positions to set correct camera
with torch.no_grad():
    xyz = model.xyz.cpu().numpy()
    
print(f"\n📊 Gaussian Point Cloud Statistics:")
print(f"   Points: {len(xyz):,}")
print(f"   X range: [{xyz[:, 0].min():.3f}, {xyz[:, 0].max():.3f}]")
print(f"   Y range: [{xyz[:, 1].min():.3f}, {xyz[:, 1].max():.3f}]")
print(f"   Z range: [{xyz[:, 2].min():.3f}, {xyz[:, 2].max():.3f}]")

# Calculate bounding box and center
bbox_min = xyz.min(axis=0)
bbox_max = xyz.max(axis=0)
center = (bbox_min + bbox_max) / 2
extent = (bbox_max - bbox_min).max()  # Largest dimension

print(f"   Center: ({center[0]:.3f}, {center[1]:.3f}, {center[2]:.3f})")
print(f"   Extent: {extent:.3f}")

# ✅ FIX: Auto-calculate camera radius based on actual model size
# Camera should be ~2-3x the model extent away
AUTO_RADIUS = max(extent * 2.5, 0.5)  # At least 0.5 to avoid being inside model
print(f"   Auto camera radius: {AUTO_RADIUS:.3f}")

# ✅ DEFINE HELPER FUNCTIONS FIRST (before using them)
def get_intrinsics_video(fov_deg=49.1, image_size=512):
    """Get camera intrinsics matrix."""
    fov_rad = math.radians(fov_deg)
    focal = image_size / (2 * math.tan(fov_rad / 2))
    K = np.array([
        [focal, 0, image_size / 2],
        [0, focal, image_size / 2],
        [0, 0, 1]
    ], dtype=np.float32)
    return K

def create_camera_pose_centered(elevation_deg, azimuth_deg, radius, center):
    """Create world-to-camera matrix looking at a specific center point."""
    elev = math.radians(elevation_deg)
    azim = math.radians(azimuth_deg)
    
    # Camera position in spherical coordinates around the center (Y-up convention)
    x = center[0] + radius * math.cos(elev) * math.sin(azim)
    y = center[1] + radius * math.sin(elev)
    z = center[2] + radius * math.cos(elev) * math.cos(azim)
    
    cam_pos = np.array([x, y, z])
    look_at = np.array(center)
    up = np.array([0, 1, 0])  # Y-up
    
    # Construct camera basis
    forward = look_at - cam_pos
    forward_norm = np.linalg.norm(forward)
    if forward_norm < 1e-6:
        forward = np.array([0, 0, -1])
    else:
        forward = forward / forward_norm
    
    right = np.cross(forward, up)
    right_norm = np.linalg.norm(right)
    if right_norm < 1e-6:
        # Camera looking straight up/down - use different up vector
        up = np.array([0, 0, 1])
        right = np.cross(forward, up)
        right_norm = np.linalg.norm(right)
    right = right / (right_norm + 1e-8)
    up_new = np.cross(right, forward)
    
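    # World-to-camera transformation in OpenCV-style axes
    # (+X right, +Y down, +Z forward), which is what gsplat's rasterization()
    # expects; with -forward in the Z row the object lands behind the camera.
    w2c = np.eye(4, dtype=np.float32)
    w2c[0, :3] = right
    w2c[1, :3] = -up_new  # Image Y axis points down
    w2c[2, :3] = forward  # Camera looks along +Z
    w2c[:3, 3] = -w2c[:3, :3] @ cam_pos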
    
    return w2c

VIDEO_SIZE = 512
video_intrinsics = get_intrinsics_video(fov_deg=49.1, image_size=VIDEO_SIZE)

# Test render a single frame first to verify it works
print("\n🔍 Testing single frame render...")
test_w2c = create_camera_pose_centered(20.0, 0.0, AUTO_RADIUS, center)
with torch.no_grad():
    test_rgb, test_alpha = render_gaussians(model, test_w2c, video_intrinsics, VIDEO_SIZE)
    test_rgb_np = test_rgb.cpu().numpy()
    
print(f"   Test frame - RGB range: [{test_rgb_np.min():.3f}, {test_rgb_np.max():.3f}]")
print(f"   Test frame - mean brightness: {test_rgb_np.mean():.3f}")

# ⚠️ ADAPTIVE RADIUS ADJUSTMENT
attempt_radius = AUTO_RADIUS
max_attempts = 3

for attempt in range(max_attempts):
    if test_rgb_np.max() < 0.01:
        print(f"   ⚠️ Frame is nearly black (attempt {attempt+1}/{max_attempts})! Trying larger radius...")
        attempt_radius = extent * (3.0 + attempt * 2)
        print(f"   New radius: {attempt_radius:.3f}")
        
        test_w2c = create_camera_pose_centered(20.0, 0.0, attempt_radius, center)
        with torch.no_grad():
            test_rgb, test_alpha = render_gaussians(model, test_w2c, video_intrinsics, VIDEO_SIZE)
            test_rgb_np = test_rgb.cpu().numpy()
        print(f"   Retested - RGB range: [{test_rgb_np.min():.3f}, {test_rgb_np.max():.3f}]")
    else:
        break

# If still black after all attempts, show diagnostic data
if test_rgb_np.max() < 0.01:
    print("\n🚨 CRITICAL: Still rendering all black after radius adjustment!")
    print("\n📋 Rendering Diagnostics:")
    
    with torch.no_grad():
        params = model()
        print(f"   XYZ range: [{params['xyz'].min():.3f}, {params['xyz'].max():.3f}]")
        print(f"   Colors range: [{params['colors'].min():.3f}, {params['colors'].max():.3f}]")
        print(f"   Opacity range: [{params['opacity'].min():.6f}, {params['opacity'].max():.6f}]")
        print(f"   Scales range: [{params['scales'].min():.3f}, {params['scales'].max():.3f}]")
        
        # Check if any Gaussians have non-zero parameters
        non_black_count = (params['colors'].abs().max(dim=1).values > 0.1).sum().item()
        print(f"   Non-black Gaussians: {non_black_count} / {len(params['xyz'])}")
    
    print("\n💡 Possible causes:")
    print("   - Optimization failed (opacities went to 0)")
    print("   - Gaussian scales became too small (exp of very negative values)")
    print("   - Color values collapsed toward 0 (strongly negative f_dc)")
    print("   - Model was reset to initial state with invalid parameters")
    print("\n   Consider: Re-run gsplat optimization with more iterations")
else:
    print("✅ Test frame looks good!")

AUTO_RADIUS = attempt_radius

print("\n🎬 Rendering 360° turntable video...")
video_frames = []

with torch.no_grad():
    for azim in tqdm(np.linspace(0, 360, 120, endpoint=False)):
        w2c = create_camera_pose_centered(20.0, azim, AUTO_RADIUS, center)
        rgb, _ = render_gaussians(model, w2c, video_intrinsics, VIDEO_SIZE)
        frame = (rgb.cpu().numpy().clip(0, 1) * 255).astype(np.uint8)
        video_frames.append(frame)

# Verify frames aren't all black
frame_brightness_samples = [np.mean(f) for f in video_frames[::10]]
sample_brightness = np.mean(frame_brightness_samples)
print(f"   Average frame brightness: {sample_brightness:.1f}/255")
print(f"   Min brightness: {min(frame_brightness_samples):.1f}/255")
print(f"   Max brightness: {max(frame_brightness_samples):.1f}/255")

if sample_brightness < 5:
    print("   🚨 WARNING: Frames appear very dark!")
    print("   This might indicate a camera/model mismatch or failed optimization.")
elif sample_brightness < 50:
    print("   ⚠️ Frames are quite dim - may want to adjust camera settings")
else:
    print("   ✅ Frames appear properly lit!")

video_path = DIRS['output'] / "glimpse3d_360.mp4"
imageio.mimsave(str(video_path), video_frames, fps=30)
print(f"✅ Video saved: {video_path}")

# Also save a debug frame
debug_frame_path = DIRS['output'] / "debug_frame_0.png"
Image.fromarray(video_frames[0]).save(debug_frame_path)
print(f"✅ Debug frame saved: {debug_frame_path}")

# Save diagnostic info
debug_info_path = DIRS['output'] / "render_diagnostics.txt"
with open(debug_info_path, 'w') as f:
    f.write("GLIMPSE3D Rendering Diagnostics\n")
    f.write("="*60 + "\n\n")
    f.write(f"Gaussian Points: {len(xyz):,}\n")
    f.write(f"Center: ({center[0]:.3f}, {center[1]:.3f}, {center[2]:.3f})\n")
    f.write(f"Extent: {extent:.3f}\n")
    f.write(f"Camera Radius: {AUTO_RADIUS:.3f}\n")
    f.write(f"Frame Size: {VIDEO_SIZE}x{VIDEO_SIZE}\n")
    f.write(f"Average Brightness: {sample_brightness:.1f}/255\n")
    f.write(f"\nIf video is black: Check that gsplat optimization completed successfully\n")
print(f"✅ Diagnostics saved: {debug_info_path}")
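
The turntable is a fixed-elevation orbit: 120 azimuths spread evenly over a full turn with the endpoint excluded, played back at 30 fps. The sketch below reproduces the camera-position formula from `create_camera_pose_centered` in plain Python and checks the spacing and clip length; the centre and radius are placeholder values, not the ones computed by the notebook.

```python
import math


def orbit_position(center, radius, elevation_deg, azimuth_deg):
    """Camera position on a Y-up orbit around `center`."""
    elev, azim = math.radians(elevation_deg), math.radians(azimuth_deg)
    return (
        center[0] + radius * math.cos(elev) * math.sin(azim),
        center[1] + radius * math.sin(elev),
        center[2] + radius * math.cos(elev) * math.cos(azim),
    )


num_frames, fps = 120, 30
center, radius = (0.0, 0.0, 0.0), 2.0  # placeholders for `center` and AUTO_RADIUS

# Endpoint excluded, so the last frame does not duplicate the first when looping
azimuths = [360.0 * i / num_frames for i in range(num_frames)]
positions = [orbit_position(center, radius, 20.0, a) for a in azimuths]

step_deg = azimuths[1] - azimuths[0]
duration_s = num_frames / fps
print(f"{step_deg} degrees per frame, {duration_s} s clip")
```

Excluding the endpoint matters for looping playback: with 360° included, the first and last frames would be identical and the loop would stutter for one frame.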

In [None]:
# Copy final files
import shutil

# Copy optimized PLY
final_ply = DIRS['output'] / "final_gaussian.ply"
shutil.copy(OPTIMIZED_PLY_PATH, final_ply)

# Copy mesh
shutil.copy(DIRS['triposr'] / "mesh.glb", DIRS['output'] / "initial_mesh.glb")
shutil.copy(DIRS['triposr'] / "mesh.obj", DIRS['output'] / "initial_mesh.obj")

# Copy best views
for i in [0, 4, 8, 12]:
    shutil.copy(
        DIRS['syncdreamer'] / f"view_{i:02d}_e{int(ELEVATIONS[i])}_a{int(AZIMUTHS[i])}.png",
        DIRS['output'] / f"view_{i:02d}.png"
    )

print("\n📁 Final output files:")
for f in sorted(DIRS['output'].iterdir()):
    size_mb = f.stat().st_size / 1024 / 1024
    print(f"  {f.name} ({size_mb:.1f} MB)")

In [None]:
# Display video
from IPython.display import HTML
from base64 import b64encode

# video_path is a pathlib.Path, so read_bytes() opens and closes the file for us
mp4 = video_path.read_bytes()
data_url = f"data:video/mp4;base64,{b64encode(mp4).decode()}"
HTML(f'''
<h3>🏆 Glimpse3D Result</h3>
<video width="600" controls autoplay loop>
    <source src="{data_url}" type="video/mp4">
</video>
''')
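
Embedding the video as a base64 data URL stores the whole file inside the cell output, and base64 inflates it by about a third. A minimal sketch of that overhead using only the standard library; the zero-filled payload is a stand-in for the MP4 bytes.

```python
import base64
import math


def base64_size(num_bytes: int) -> int:
    """Length of the standard padded base64 encoding of `num_bytes` bytes."""
    return 4 * math.ceil(num_bytes / 3)


payload = bytes(1_000_000)  # stand-in for the MP4 contents
encoded = base64.b64encode(payload)

print(len(payload), "raw bytes ->", len(encoded), "base64 characters")
print(f"overhead: {len(encoded) / len(payload) - 1:.1%}")
```

For long or high-resolution clips this can make the saved notebook noticeably larger, so linking to the file in `DIRS['output']` may be preferable in that case.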

# 📥 Download All Results

In [None]:
from google.colab import files

# Create final ZIP
output_zip = str(WORK_DIR / "glimpse3d_complete_output")
shutil.make_archive(output_zip, 'zip', DIRS['output'])

print("📥 Downloading Glimpse3D results...")
files.download(f"{output_zip}.zip")

print("\n" + "="*60)
print("✅ GLIMPSE3D PIPELINE COMPLETE!")
print("="*60)
print(f"\nDownloaded: glimpse3d_complete_output.zip")
print("\nContents:")
print("  - final_gaussian.ply   : Optimized Gaussian Splats")
print("  - initial_mesh.glb/obj : TripoSR mesh")
print("  - glimpse3d_360.mp4    : 360° turntable video")
print("  - view_*.png           : Multi-view images")

---

## 🎉 Pipeline Complete!

You now have:
1. **final_gaussian.ply** - View in any Gaussian Splat viewer
2. **initial_mesh.glb** - View in Blender or an online GLB viewer
3. **glimpse3d_360.mp4** - Share as video

### Recommended Viewers
- **Gaussian Splats**: [SuperSplat](https://playcanvas.com/supersplat/editor), [Luma AI Viewer](https://lumalabs.ai/)
- **GLB Mesh**: [glTF Viewer](https://gltf-viewer.donmccurdy.com/), Blender

### Tips for Better Results
1. Use high-quality input images with clean backgrounds
2. Objects should be centered and fill ~80% of the frame
3. Avoid reflective or transparent surfaces
4. Run more gsplat iterations (2000+) for higher quality
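
Tip 2 can be checked numerically before running the pipeline. The sketch below assumes you already have the object's bounding box in pixels (for example from an alpha mask after background removal); `framing_report` is a hypothetical helper, and the 80% target is taken from the tip above rather than from any of the models.

```python
def framing_report(bbox, image_size):
    """Describe how an object bounding box sits inside an image.

    bbox is (left, top, right, bottom) in pixels; image_size is (width, height).
    """
    left, top, right, bottom = bbox
    width, height = image_size
    # Fraction of the frame covered along the object's larger dimension
    fill = max((right - left) / width, (bottom - top) / height)
    # Offset of the box centre from the image centre, as a fraction of the frame
    offset_x = ((left + right) / 2 - width / 2) / width
    offset_y = ((top + bottom) / 2 - height / 2) / height
    return {"fill": fill, "offset_x": offset_x, "offset_y": offset_y}


report = framing_report(bbox=(56, 51, 456, 461), image_size=(512, 512))
print(report)

if report["fill"] < 0.7 or abs(report["offset_x"]) > 0.05 or abs(report["offset_y"]) > 0.05:
    print("Consider cropping so the object is centred and fills about 80% of the frame")
```

The 0.7 and 0.05 thresholds are arbitrary starting points; adjust them to taste.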