# 🎯 Glimpse3D - Master Pipeline

## Complete 2D Image → 3D Gaussian Splat Pipeline

This notebook runs the **entire Glimpse3D pipeline** end-to-end:

```
📷 Input Image
    ↓
🔷 TripoSR (0.5s) → Initial 3D Mesh → Gaussian Points
    ↓
🎨 SyncDreamer (2min) → 16 Consistent Multi-View Images
    ↓
✨ SDXL Lightning + ControlNet → Enhanced Views
    ↓
🔮 gsplat Optimization → Refined Gaussians
    ↓
🔄 MVCRM → Multi-View Consistent Refinement
    ↓
🏆 Final 3D Gaussian Splat Output
```

## Requirements
- Google Colab with **T4 GPU** (free tier) or **A100** (faster)
- ~12GB VRAM peak usage
- ~30 minutes total runtime

---

## 🚀 Quick Start

1. Run all cells in order (Runtime → Run all).
2. Upload your image when prompted.
3. Wait about 30 minutes for the full pipeline to finish.
4. Download the final results.

# Stage 0: Environment Setup

In [None]:
# Check environment
import sys
import os

IN_COLAB = 'google.colab' in sys.modules
print(f"🖥️ Running in Colab: {IN_COLAB}")

# Check GPU
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

import torch
print(f"\n📦 PyTorch: {torch.__version__}")
print(f"🔥 CUDA: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    GPU_NAME = torch.cuda.get_device_name(0)
    GPU_VRAM = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"🎮 GPU: {GPU_NAME}")
    print(f"💾 VRAM: {GPU_VRAM:.1f} GB")

    # Set batch sizes based on GPU.
    # Thresholds sit below the nominal sizes because the reported total is
    # slightly smaller (T4: ~14.7 GiB, A100 40 GB: ~39.6 GiB).
    if GPU_VRAM >= 38:  # A100
        BATCH_VIEW_NUM = 8
        MC_RESOLUTION = 384
        NUM_SAMPLES = 200000
    elif GPU_VRAM >= 14:  # T4
        BATCH_VIEW_NUM = 4
        MC_RESOLUTION = 256
        NUM_SAMPLES = 100000
    else:
        BATCH_VIEW_NUM = 2
        MC_RESOLUTION = 192
        NUM_SAMPLES = 50000

    print(f"\n⚙️ Settings: batch_view={BATCH_VIEW_NUM}, resolution={MC_RESOLUTION}, samples={NUM_SAMPLES}")
else:
    raise RuntimeError("❌ No GPU available! Enable GPU in Runtime → Change runtime type")

In [None]:
%%capture install_output
# Install all dependencies (takes ~5 minutes).
# %%capture hides this cell's output; run install_output.show() afterwards to inspect it.
print("📦 Installing dependencies (this takes ~5 minutes)...")

# Core packages
!pip install torch torchvision --quiet
!pip install transformers diffusers accelerate huggingface_hub --quiet
!pip install omegaconf einops pytorch-lightning==1.9.0 kornia --quiet

# TripoSR dependencies (the extra is quoted so the shell does not treat [gpu] as a glob)
!pip install trimesh "rembg[gpu]" xatlas plyfile --quiet
!pip install git+https://github.com/tatsy/torchmcubes.git --quiet

# gsplat
!pip install gsplat --quiet

# SyncDreamer dependencies
!pip install git+https://github.com/openai/CLIP.git --quiet
!pip install taming-transformers-rom1504 --quiet

# Image processing (imageio-ffmpeg is needed to write the MP4 in the final stage)
!pip install opencv-python-headless scikit-image imageio imageio-ffmpeg --quiet

# Depth estimation
!pip install timm --quiet

print("\n✅ All dependencies installed!")

In [None]:
# Create directory structure
from pathlib import Path
import gc

WORK_DIR = Path("/content/glimpse3d_pipeline")
WORK_DIR.mkdir(exist_ok=True)

DIRS = {
    'input': WORK_DIR / 'input',
    'triposr': WORK_DIR / 'stage1_triposr',
    'syncdreamer': WORK_DIR / 'stage2_syncdreamer',
    'enhanced': WORK_DIR / 'stage3_enhanced',
    'gsplat': WORK_DIR / 'stage4_gsplat',
    'mvcrm': WORK_DIR / 'stage5_mvcrm',
    'output': WORK_DIR / 'final_output',
}

for name, path in DIRS.items():
    path.mkdir(exist_ok=True)
    print(f"📁 {name}: {path}")

def clear_gpu():
    """Clear GPU memory between stages."""
    gc.collect()
    torch.cuda.empty_cache()
    allocated = torch.cuda.memory_allocated() / 1024**3
    print(f"🧹 GPU memory cleared. Using: {allocated:.2f} GB")

print("\n✅ Directory structure created!")

# Stage 1: Upload Input Image

In [None]:
from google.colab import files
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

print("📤 Upload your image (JPG/PNG):")
uploaded = files.upload()

# Save uploaded file
INPUT_FILENAME = list(uploaded.keys())[0]
INPUT_PATH = DIRS['input'] / INPUT_FILENAME

with open(INPUT_PATH, 'wb') as f:
    f.write(list(uploaded.values())[0])

# Display
input_image = Image.open(INPUT_PATH)
plt.figure(figsize=(8, 8))
plt.imshow(input_image)
plt.title(f"Input: {INPUT_FILENAME} ({input_image.size[0]}x{input_image.size[1]})")
plt.axis('off')
plt.show()

print(f"\n✅ Saved to: {INPUT_PATH}")

# Stage 2: TripoSR - Initial 3D Reconstruction

**Input:** Single image  
**Output:** 3D mesh + Gaussian point cloud  
**Time:** ~30 seconds
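
The Gaussian point cloud written in this stage stores opacity and scale in an unconstrained ("raw") space and colour as a degree-0 spherical-harmonic coefficient, which is why the conversion cell uses constants such as `2.2`, `-4.6`, and `0.28209479177387814`. The following is a minimal sketch of those encodings, assuming the common 3D Gaussian Splatting PLY convention (sigmoid for opacity, exp for scale). The helper names are illustrative and not part of the notebook.

```python
import math

# Degree-0 spherical-harmonic basis constant, 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814

def sigmoid(x: float) -> float:
    """Map a raw opacity value to an opacity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    """Inverse of sigmoid: map an opacity in (0, 1) to its raw value."""
    return math.log(p / (1.0 - p))

def rgb_to_sh_dc(channel: float) -> float:
    """Encode one colour channel in [0, 1] as a degree-0 SH coefficient."""
    return (channel - 0.5) / SH_C0

def sh_dc_to_rgb(coeff: float) -> float:
    """Decode a degree-0 SH coefficient back to a colour channel."""
    return 0.5 + SH_C0 * coeff

print(f"opacity for raw 2.2 : {sigmoid(2.2):.3f}")
print(f"scale for raw -4.6  : {math.exp(-4.6):.4f}")
print(f"raw value for opacity 0.5: {logit(0.5):.1f}")
```

Under these assumptions the initial Gaussians start at roughly 90% opacity with a radius of about 0.01 scene units, and the optimizer is free to move the raw values without ever producing a negative scale or an out-of-range opacity.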

In [None]:
# Clone TripoSR
TRIPOSR_PATH = Path("/content/TripoSR")

if not TRIPOSR_PATH.exists():
    print("📥 Cloning TripoSR...")
    !git clone https://github.com/VAST-AI-Research/TripoSR.git {TRIPOSR_PATH}

sys.path.insert(0, str(TRIPOSR_PATH))
os.chdir(TRIPOSR_PATH)
print(f"✅ TripoSR ready at {TRIPOSR_PATH}")

In [None]:
import time
from tsr.system import TSR
from tsr.utils import remove_background, resize_foreground
import rembg

print("\n" + "="*60)
print("🔷 STAGE 2: TripoSR 3D Reconstruction")
print("="*60)

device = "cuda:0"

# Load model
print("\n📥 Loading TripoSR model...")
triposr_model = TSR.from_pretrained(
    "stabilityai/TripoSR",
    config_name="config.yaml",
    weight_name="model.ckpt",
)
triposr_model.renderer.set_chunk_size(8192)
triposr_model.to(device)
print("✅ Model loaded!")

# Preprocess image: remove the background, then scale the foreground to 85% of the frame
print("\n🔧 Preprocessing image...")
input_img = Image.open(INPUT_PATH)
rembg_session = rembg.new_session()
processed_img = remove_background(input_img, rembg_session)
processed_img = resize_foreground(processed_img, 0.85)

# Composite the RGBA result onto a mid-gray (0.5) background to get an RGB image
img_np = np.array(processed_img).astype(np.float32) / 255.0
img_np = img_np[:, :, :3] * img_np[:, :, 3:4] + (1 - img_np[:, :, 3:4]) * 0.5
processed_img = Image.fromarray((img_np * 255.0).astype(np.uint8))
processed_img.save(DIRS['triposr'] / "processed_input.png")

# Run inference
print("\n🚀 Running TripoSR...")
start_time = time.time()

with torch.no_grad():
    scene_codes = triposr_model([processed_img], device=device)
    meshes = triposr_model.extract_mesh(scene_codes, has_vertex_color=True, resolution=MC_RESOLUTION)

mesh = meshes[0]
elapsed = time.time() - start_time

print(f"\n✅ Mesh generated in {elapsed:.2f}s")
print(f"   Vertices: {len(mesh.vertices):,}")
print(f"   Faces: {len(mesh.faces):,}")

# Save mesh
mesh.export(str(DIRS['triposr'] / "mesh.obj"))
mesh.export(str(DIRS['triposr'] / "mesh.glb"))
print(f"\n📁 Saved mesh to {DIRS['triposr']}")

In [None]:
# Convert mesh to Gaussian PLY
from plyfile import PlyData, PlyElement

def mesh_to_gaussian_ply(mesh, output_path, num_samples=100000):
    """Convert mesh to Gaussian Splat format."""
    print(f"\n🔄 Sampling {num_samples:,} points...")
    
    points, face_indices = mesh.sample(num_samples, return_index=True)
    
    if mesh.visual.vertex_colors is not None:
        face_vertices = mesh.faces[face_indices]
        vertex_colors = mesh.visual.vertex_colors[:, :3] / 255.0
        colors = vertex_colors[face_vertices].mean(axis=1)
    else:
        colors = np.ones((num_samples, 3)) * 0.5
    
    num_points = len(points)
    xyz = points.astype(np.float32)
    
    C0 = 0.28209479177387814
    features_dc = ((colors - 0.5) / C0).astype(np.float32)
    features_rest = np.zeros((num_points, 45), dtype=np.float32)
    opacities = np.ones((num_points, 1), dtype=np.float32) * 2.2
    scales = np.ones((num_points, 3), dtype=np.float32) * (-4.6)
    rotations = np.zeros((num_points, 4), dtype=np.float32)
    rotations[:, 0] = 1.0
    
    dtype_full = [
        ('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
        ('f_dc_0', 'f4'), ('f_dc_1', 'f4'), ('f_dc_2', 'f4'),
    ]
    for i in range(45):
        dtype_full.append((f'f_rest_{i}', 'f4'))
    dtype_full.extend([
        ('opacity', 'f4'),
        ('scale_0', 'f4'), ('scale_1', 'f4'), ('scale_2', 'f4'),
        ('rot_0', 'f4'), ('rot_1', 'f4'), ('rot_2', 'f4'), ('rot_3', 'f4'),
    ])
    
    elements = np.zeros(num_points, dtype=dtype_full)
    elements['x'] = xyz[:, 0]
    elements['y'] = xyz[:, 1]
    elements['z'] = xyz[:, 2]
    elements['f_dc_0'] = features_dc[:, 0]
    elements['f_dc_1'] = features_dc[:, 1]
    elements['f_dc_2'] = features_dc[:, 2]
    for i in range(45):
        elements[f'f_rest_{i}'] = features_rest[:, i]
    elements['opacity'] = opacities[:, 0]
    elements['scale_0'] = scales[:, 0]
    elements['scale_1'] = scales[:, 1]
    elements['scale_2'] = scales[:, 2]
    elements['rot_0'] = rotations[:, 0]
    elements['rot_1'] = rotations[:, 1]
    elements['rot_2'] = rotations[:, 2]
    elements['rot_3'] = rotations[:, 3]
    
    el = PlyElement.describe(elements, 'vertex')
    PlyData([el]).write(output_path)
    print(f"✅ Saved: {output_path}")

INITIAL_PLY_PATH = DIRS['triposr'] / "initial_gaussian.ply"
mesh_to_gaussian_ply(mesh, str(INITIAL_PLY_PATH), num_samples=NUM_SAMPLES)

# Cleanup TripoSR
del triposr_model, mesh, scene_codes
clear_gpu()

# Stage 3: SyncDreamer - Multi-View Generation

**Input:** Processed image  
**Output:** 16 consistent multi-view images  
**Time:** ~2-3 minutes
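
SyncDreamer denoises all 16 views jointly, and `BATCH_VIEW_NUM` (chosen in Stage 0 from the available VRAM) limits how many of those views are pushed through the network at once, trading speed for memory. The sketch below only illustrates that chunking idea under this reading of the parameter. `chunk_views` is a hypothetical helper, not a SyncDreamer function, and the real sampler handles the batching internally.

```python
def chunk_views(num_views: int, batch_view_num: int) -> list[list[int]]:
    """Split view indices 0..num_views-1 into consecutive chunks."""
    if batch_view_num <= 0:
        raise ValueError("batch_view_num must be positive")
    return [
        list(range(start, min(start + batch_view_num, num_views)))
        for start in range(0, num_views, batch_view_num)
    ]

# Compare the three settings used by the notebook's GPU tiers.
for size in (8, 4, 2):
    chunks = chunk_views(16, size)
    print(f"batch_view_num={size}: {len(chunks)} chunks of up to {size} views")
```

Smaller chunks mean more network evaluations per denoising step, which is why the low-VRAM tier is slower than the A100 tier for the same 16 views.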

In [None]:
print("\n" + "="*60)
print("🎨 STAGE 3: SyncDreamer Multi-View Generation")
print("="*60)

# Clone SyncDreamer
SYNCDREAMER_PATH = Path("/content/SyncDreamer")

if not SYNCDREAMER_PATH.exists():
    print("📥 Cloning SyncDreamer...")
    !git clone https://github.com/liuyuan-pal/SyncDreamer.git {SYNCDREAMER_PATH}

# Download checkpoints
CKPT_DIR = SYNCDREAMER_PATH / "ckpt"
CKPT_DIR.mkdir(exist_ok=True)

!apt -y install -qq aria2

CHECKPOINTS = {
    "syncdreamer-pretrain.ckpt": "https://huggingface.co/camenduru/SyncDreamer/resolve/main/syncdreamer-pretrain.ckpt",
    "ViT-L-14.pt": "https://huggingface.co/camenduru/SyncDreamer/resolve/main/ViT-L-14.pt"
}

for fname, url in CHECKPOINTS.items():
    fpath = CKPT_DIR / fname
    if not fpath.exists():
        print(f"📥 Downloading {fname}...")
        !aria2c --console-log-level=error -c -x 16 -s 16 -k 1M "{url}" -d "{CKPT_DIR}" -o "{fname}"
    else:
        print(f"✅ {fname} exists")

sys.path.insert(0, str(SYNCDREAMER_PATH))
os.chdir(SYNCDREAMER_PATH)

In [None]:
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Load SyncDreamer model
print("\n📥 Loading SyncDreamer model...")

config = OmegaConf.load(SYNCDREAMER_PATH / "configs" / "syncdreamer.yaml")
syncdreamer_model = instantiate_from_config(config.model)

state_dict = torch.load(CKPT_DIR / "syncdreamer-pretrain.ckpt", map_location="cpu")["state_dict"]
syncdreamer_model.load_state_dict(state_dict, strict=True)
syncdreamer_model = syncdreamer_model.cuda().eval()

print("✅ SyncDreamer loaded!")

In [None]:
from ldm.models.diffusion.sync_dreamer import SyncDDIMSampler
import clip

# Prepare CLIP encoder
clip_model, clip_preprocess = clip.load("ViT-L/14", device="cuda")

# Prepare input image for SyncDreamer
from PIL import Image
import torchvision.transforms as T

# Load processed image
processed_path = DIRS['triposr'] / "processed_input.png"
input_img = Image.open(processed_path).convert('RGB')
input_img = input_img.resize((256, 256))

# Convert to tensor
transform = T.Compose([
    T.ToTensor(),
    T.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
input_tensor = transform(input_img).unsqueeze(0).cuda()

# CLIP embedding
clip_input = clip_preprocess(input_img).unsqueeze(0).cuda()
with torch.no_grad():
    clip_emb = clip_model.encode_image(clip_input)
    clip_emb = clip_emb / clip_emb.norm(dim=-1, keepdim=True)

print("✅ Input prepared for SyncDreamer")

In [None]:
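# Run SyncDreamer inference
print("\n🚀 Running SyncDreamer (this takes ~2-3 minutes)...")
start_time = time.time()

# Settings
ELEVATION = 30.0
SAMPLE_STEPS = 50
CFG_SCALE = 2.0

# NOTE: this call pattern follows generate.py in the SyncDreamer repository
# (sampler built with the step count, sampling driven by model.sample).
# Check it against your checkout if the repository has changed.
sampler = SyncDDIMSampler(syncdreamer_model, SAMPLE_STEPS)

with torch.no_grad():
    # Prepare conditioning: image as (B, H, W, 3) in [-1, 1], elevation in radians as (B, 1).
    # The model computes its own CLIP embedding from the input image.
    cond = {
        'input_image': input_tensor.permute(0, 2, 3, 1).contiguous(),
        'input_elevation': torch.tensor([[np.deg2rad(ELEVATION)]], dtype=torch.float32).cuda(),
    }

    # Sample all views; the result has shape (B, 16, 3, H, W) with values in [-1, 1]
    samples = syncdreamer_model.sample(sampler, cond, CFG_SCALE, BATCH_VIEW_NUM)

elapsed = time.time() - start_time
print(f"\n✅ SyncDreamer completed in {elapsed/60:.1f} minutes")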

In [None]:
# Save multi-view images
print("\n💾 Saving multi-view images...")

# Convert samples to images
samples = (samples + 1) / 2  # [-1,1] -> [0,1]
samples = samples.clamp(0, 1)

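syncdreamer_views = []
# SyncDreamer generates its 16 views at one fixed elevation (30°), with azimuths
# evenly spaced around the object (22.5° apart). The camera table must match that
# layout, because Stage 5 renders from these poses.
ELEVATIONS = [30.0] * 16
AZIMUTHS = [i * 22.5 for i in range(16)]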

for i in range(16):
    img_tensor = samples[0, i]  # (C, H, W)
    img_np = (img_tensor.permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)
    img_pil = Image.fromarray(img_np)
    
    # Save
    save_path = DIRS['syncdreamer'] / f"view_{i:02d}_e{int(ELEVATIONS[i])}_a{int(AZIMUTHS[i])}.png"
    img_pil.save(save_path)
    syncdreamer_views.append(img_pil)

print(f"✅ Saved {len(syncdreamer_views)} views to {DIRS['syncdreamer']}")

# Display grid
fig, axes = plt.subplots(4, 4, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(syncdreamer_views[i])
    ax.set_title(f"E={ELEVATIONS[i]}° A={AZIMUTHS[i]}°", fontsize=8)
    ax.axis('off')
plt.suptitle("SyncDreamer: 16 Multi-View Images", fontsize=14)
plt.tight_layout()
plt.savefig(DIRS['syncdreamer'] / "grid.png", dpi=150)
plt.show()

# Cleanup SyncDreamer
del syncdreamer_model, sampler, samples, clip_model
clear_gpu()

# Stage 4: SDXL Enhancement (Optional)

**Input:** Multi-view images  
**Output:** Enhanced multi-view images  
**Time:** ~1 minute per image

Skip this stage if you want faster results.
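
In an image-to-image pipeline, `strength` decides how much of the noise schedule is actually run: roughly `int(num_inference_steps * strength)` denoising steps, capped at `num_inference_steps`. The sketch below reproduces that arithmetic, assuming the rule used by the diffusers img2img pipelines. `effective_steps` is an illustrative helper, not a diffusers API.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps an img2img call would run (assumed rule)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)

# Compare a few strengths for the 4-step Lightning schedule used in this stage.
for strength in (0.3, 0.5, 0.75, 1.0):
    steps = effective_steps(4, strength)
    print(f"strength={strength}: {steps} of 4 steps")
```

With the notebook's settings (4 steps, strength 0.3) this works out to a single denoising step, so raising `strength` or `num_inference_steps` is the lever to pull if the enhancement looks too subtle.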

In [None]:
SKIP_ENHANCEMENT = False  # Set to True to skip this stage

if not SKIP_ENHANCEMENT:
    print("\n" + "="*60)
    print("✨ STAGE 4: SDXL Lightning Enhancement")
    print("="*60)

    from diffusers import StableDiffusionXLImg2ImgPipeline, AutoencoderKL, EulerDiscreteScheduler

    # Load SDXL Lightning
    print("\n📥 Loading SDXL Lightning...")

    vae = AutoencoderKL.from_pretrained(
        "madebyollin/sdxl-vae-fp16-fix",
        torch_dtype=torch.float16
    )

    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        vae=vae,
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # Load Lightning LoRA
    pipe.load_lora_weights("ByteDance/SDXL-Lightning", weight_name="sdxl_lightning_4step_lora.safetensors")
    pipe.fuse_lora()

    # The Lightning weights are distilled for "trailing" timestep spacing,
    # so the sampler has to be configured to match.
    pipe.scheduler = EulerDiscreteScheduler.from_config(
        pipe.scheduler.config, timestep_spacing="trailing"
    )

    print("✅ SDXL Lightning loaded!")
else:
    print("⏭️ Skipping enhancement stage")

In [None]:
if not SKIP_ENHANCEMENT:
    # Enhance select views (not all 16 to save time)
    VIEWS_TO_ENHANCE = [0, 2, 4, 6, 8, 10, 12, 14]  # Every other view
    
    print(f"\n🚀 Enhancing {len(VIEWS_TO_ENHANCE)} views...")
    
    enhanced_views = syncdreamer_views.copy()  # Start with original
    
    prompt = "highly detailed 3D render, professional lighting, sharp textures, 8k quality"
    negative_prompt = "blurry, low quality, artifacts, noise"
    
    for i, view_idx in enumerate(VIEWS_TO_ENHANCE):
        print(f"  Enhancing view {view_idx} ({i+1}/{len(VIEWS_TO_ENHANCE)})...")
        
        input_img = syncdreamer_views[view_idx].resize((512, 512))
        
        with torch.no_grad():
            result = pipe(
                prompt=prompt,
                negative_prompt=negative_prompt,
                image=input_img,
                strength=0.3,
                num_inference_steps=4,
                guidance_scale=0,
            ).images[0]
        
        enhanced_views[view_idx] = result
        result.save(DIRS['enhanced'] / f"enhanced_{view_idx:02d}.png")
    
    print(f"\n✅ Enhanced views saved to {DIRS['enhanced']}")
    
    # Cleanup
    del pipe, vae
    clear_gpu()
else:
    enhanced_views = syncdreamer_views
    print("Using original SyncDreamer views")

# Stage 5: gsplat Optimization

**Input:** Initial Gaussian PLY + Multi-view images  
**Output:** Optimized Gaussian Splats  
**Time:** ~5 minutes
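
The optimization only converges if the camera used for rendering matches the viewpoint of each target image. The sketch below builds a world-to-camera matrix and pinhole intrinsics for an orbiting camera, then checks that the world origin projects to the image centre. It assumes an OpenCV-style camera (x right, y down, z forward), taken here to be the convention gsplat's `viewmats` follow, and a z-up world. `look_at_w2c` and `intrinsics_from_fov` are illustrative names, not notebook functions.

```python
import math
import numpy as np

def look_at_w2c(elevation_deg: float, azimuth_deg: float, radius: float = 2.0) -> np.ndarray:
    """World-to-camera matrix for a camera orbiting the origin (z-up world)."""
    elev, azim = math.radians(elevation_deg), math.radians(azimuth_deg)
    cam_pos = radius * np.array([
        math.cos(elev) * math.cos(azim),
        math.cos(elev) * math.sin(azim),
        math.sin(elev),
    ])
    forward = -cam_pos / np.linalg.norm(cam_pos)           # camera +z points at the origin
    right = np.cross(forward, np.array([0.0, 0.0, 1.0]))   # camera +x
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)                        # camera +y (image rows grow downward)

    w2c = np.eye(4)
    w2c[:3, :3] = np.stack([right, down, forward])         # rows are the camera axes
    w2c[:3, 3] = -w2c[:3, :3] @ cam_pos
    return w2c

def intrinsics_from_fov(fov_deg: float, image_size: int) -> np.ndarray:
    """Pinhole intrinsics for a square image with the given field of view."""
    focal = 0.5 * image_size / math.tan(math.radians(fov_deg) / 2)
    centre = image_size / 2
    return np.array([[focal, 0.0, centre], [0.0, focal, centre], [0.0, 0.0, 1.0]])

w2c = look_at_w2c(30.0, 45.0)
K = intrinsics_from_fov(60.0, 256)

# Project the world origin: it should sit in front of the camera, at the image centre.
origin_cam = (w2c @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]
pixel = (K @ origin_cam)[:2] / origin_cam[2]
print("origin in camera space:", origin_cam)
print("origin in pixels:", pixel)
```

A quick sanity check like this, run before a long optimization, catches the most common failure mode: a transposed rotation or flipped axis that leaves the object behind the camera or mirrored.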

In [None]:
print("\n" + "="*60)
print("🔮 STAGE 5: gsplat Optimization")
print("="*60)

import torch.nn as nn
from gsplat import rasterization
import math

device = torch.device("cuda:0")

# Load initial Gaussians
def load_gaussian_ply(path):
    plydata = PlyData.read(path)
    vertex = plydata['vertex']
    
    xyz = np.stack([vertex['x'], vertex['y'], vertex['z']], axis=-1)
    f_dc = np.stack([vertex['f_dc_0'], vertex['f_dc_1'], vertex['f_dc_2']], axis=-1)
    f_rest_names = [f'f_rest_{i}' for i in range(45)]
    f_rest = np.stack([vertex[name] for name in f_rest_names if name in vertex.data.dtype.names], axis=-1)
    opacity = vertex['opacity']
    scales = np.stack([vertex['scale_0'], vertex['scale_1'], vertex['scale_2']], axis=-1)
    rotations = np.stack([vertex['rot_0'], vertex['rot_1'], vertex['rot_2'], vertex['rot_3']], axis=-1)
    
    return {
        'xyz': torch.tensor(xyz, dtype=torch.float32),
        'f_dc': torch.tensor(f_dc, dtype=torch.float32),
        'f_rest': torch.tensor(f_rest, dtype=torch.float32),
        'opacity': torch.tensor(opacity, dtype=torch.float32),
        'scales': torch.tensor(scales, dtype=torch.float32),
        'rotations': torch.tensor(rotations, dtype=torch.float32),
    }

gaussians = load_gaussian_ply(str(INITIAL_PLY_PATH))
print(f"✅ Loaded {len(gaussians['xyz']):,} Gaussians")

In [None]:
class GaussianModel(nn.Module):
    def __init__(self, gaussians):
        super().__init__()
        self.xyz = nn.Parameter(gaussians['xyz'].clone())
        self.f_dc = nn.Parameter(gaussians['f_dc'].clone())
        self.f_rest = nn.Parameter(gaussians['f_rest'].clone())
        self.opacity_raw = nn.Parameter(gaussians['opacity'].clone())
        self.scales_raw = nn.Parameter(gaussians['scales'].clone())
        self.rotations = nn.Parameter(gaussians['rotations'].clone())
        
    @property
    def opacity(self):
        return torch.sigmoid(self.opacity_raw)
    
    @property
    def scales(self):
        return torch.exp(self.scales_raw)
    
    def get_colors(self):
        C0 = 0.28209479177387814
        return 0.5 + C0 * self.f_dc
    
    def forward(self):
        return {
            'xyz': self.xyz,
            'colors': self.get_colors(),
            'opacity': self.opacity,
            'scales': self.scales,
            'rotations': self.rotations / (self.rotations.norm(dim=-1, keepdim=True) + 1e-8),
        }

model = GaussianModel(gaussians).to(device)
print(f"✅ Model: {sum(p.numel() for p in model.parameters()):,} parameters")

In [None]:
# Camera system matching SyncDreamer
def create_camera_pose(elevation_deg, azimuth_deg, radius=2.0):
    elev = math.radians(elevation_deg)
    azim = math.radians(azimuth_deg)
    
    x = radius * math.cos(elev) * math.cos(azim)
    y = radius * math.cos(elev) * math.sin(azim)
    z = radius * math.sin(elev)
    
    cam_pos = np.array([x, y, z])
    look_at = np.array([0, 0, 0])
    up = np.array([0, 0, 1])
    
    forward = look_at - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    up_new = np.cross(right, forward)
    
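    # World-to-camera in the OpenCV convention gsplat expects
    # (x right, y down, z forward): the rows of the rotation are the camera axes.
    w2c = np.eye(4)
    w2c[0, :3] = right
    w2c[1, :3] = -up_new
    w2c[2, :3] = forward
    w2c[:3, 3] = -w2c[:3, :3] @ cam_pos
    return w2c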

def get_projection_matrix(fov_deg=60, aspect=1.0):
    fov_rad = math.radians(fov_deg)
    f = 1.0 / math.tan(fov_rad / 2)
    proj = np.zeros((4, 4))
    proj[0, 0] = f / aspect
    proj[1, 1] = f
    proj[2, 2] = -1.01
    proj[2, 3] = -0.2
    proj[3, 2] = -1
    return proj

camera_poses = [create_camera_pose(e, a) for e, a in zip(ELEVATIONS, AZIMUTHS)]
projection = get_projection_matrix()
IMAGE_SIZE = 256

In [None]:
def render_gaussians(model, w2c, proj, image_size):
    params = model()
    
    viewmat = torch.tensor(w2c, dtype=torch.float32, device=device)
    K = torch.tensor([
        [proj[0, 0] * image_size / 2, 0, image_size / 2],
        [0, proj[1, 1] * image_size / 2, image_size / 2],
        [0, 0, 1]
    ], dtype=torch.float32, device=device)
    
    render_colors, render_alphas, _ = rasterization(
        means=params['xyz'],
        quats=params['rotations'],
        scales=params['scales'],
        opacities=params['opacity'],
        colors=params['colors'],
        viewmats=viewmat.unsqueeze(0),
        Ks=K.unsqueeze(0),
        width=image_size,
        height=image_size,
        packed=False,
        render_mode="RGB",
    )
    
    return render_colors[0], render_alphas[0]

In [None]:
from tqdm import tqdm
import torch.nn.functional as F

# Prepare target images
target_tensors = []
for img in enhanced_views:
    img_resized = img.resize((IMAGE_SIZE, IMAGE_SIZE))
    img_tensor = torch.tensor(np.array(img_resized) / 255.0, dtype=torch.float32, device=device)
    target_tensors.append(img_tensor)

# Optimizer
optimizer = torch.optim.Adam([
    {'params': model.xyz, 'lr': 1e-4},
    {'params': model.f_dc, 'lr': 1e-3},
    {'params': model.f_rest, 'lr': 1e-3 / 20},
    {'params': model.opacity_raw, 'lr': 0.05},
    {'params': model.scales_raw, 'lr': 5e-3},
    {'params': model.rotations, 'lr': 1e-3},
])

scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.995)

# Training
NUM_ITERATIONS = 1000
losses = []

print("\n🚀 Starting optimization...")
pbar = tqdm(range(NUM_ITERATIONS))

for iteration in pbar:
    optimizer.zero_grad()
    
    # Sample random view
    view_idx = np.random.randint(0, 16)
    w2c = camera_poses[view_idx]
    target = target_tensors[view_idx]
    
    rendered, alpha = render_gaussians(model, w2c, projection, IMAGE_SIZE)
    
    loss = F.mse_loss(rendered, target)
    loss.backward()
    
    optimizer.step()
    scheduler.step()
    
    losses.append(loss.item())
    
    if iteration % 100 == 0:
        pbar.set_postfix({'loss': f'{loss.item():.4f}'})

print(f"\n✅ Optimization complete! Final loss: {losses[-1]:.4f}")

In [None]:
# Save optimized model
def save_gaussian_ply(model, output_path):
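    with torch.no_grad():
        params = model()
        # The PLY stores raw (pre-activation) opacity and scale values;
        # detach() keeps the export working even if the model lives on the CPU.
        xyz = params['xyz'].detach().cpu().numpy()
        colors = model.f_dc.detach().cpu().numpy()
        f_rest = model.f_rest.detach().cpu().numpy()
        opacity = model.opacity_raw.detach().cpu().numpy()
        scales = model.scales_raw.detach().cpu().numpy()
        rotations = params['rotations'].detach().cpu().numpy()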
        
    num_points = len(xyz)
    dtype_full = [('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
                  ('f_dc_0', 'f4'), ('f_dc_1', 'f4'), ('f_dc_2', 'f4')]
    for i in range(f_rest.shape[1]):
        dtype_full.append((f'f_rest_{i}', 'f4'))
    dtype_full.extend([('opacity', 'f4'),
                       ('scale_0', 'f4'), ('scale_1', 'f4'), ('scale_2', 'f4'),
                       ('rot_0', 'f4'), ('rot_1', 'f4'), ('rot_2', 'f4'), ('rot_3', 'f4')])
    
    elements = np.zeros(num_points, dtype=dtype_full)
    elements['x'] = xyz[:, 0]
    elements['y'] = xyz[:, 1]
    elements['z'] = xyz[:, 2]
    elements['f_dc_0'] = colors[:, 0]
    elements['f_dc_1'] = colors[:, 1]
    elements['f_dc_2'] = colors[:, 2]
    for i in range(f_rest.shape[1]):
        elements[f'f_rest_{i}'] = f_rest[:, i]
    elements['opacity'] = opacity
    elements['scale_0'] = scales[:, 0]
    elements['scale_1'] = scales[:, 1]
    elements['scale_2'] = scales[:, 2]
    elements['rot_0'] = rotations[:, 0]
    elements['rot_1'] = rotations[:, 1]
    elements['rot_2'] = rotations[:, 2]
    elements['rot_3'] = rotations[:, 3]
    
    el = PlyElement.describe(elements, 'vertex')
    PlyData([el]).write(output_path)

OPTIMIZED_PLY_PATH = DIRS['gsplat'] / "optimized_gaussian.ply"
save_gaussian_ply(model, str(OPTIMIZED_PLY_PATH))
print(f"✅ Saved optimized Gaussians: {OPTIMIZED_PLY_PATH}")

# Stage 6: Generate Final Outputs

In [None]:
print("\n" + "="*60)
print("🏆 FINAL OUTPUT GENERATION")
print("="*60)

import imageio

# Generate 360° video
print("\n🎬 Rendering 360° turntable video...")
video_frames = []

with torch.no_grad():
    # endpoint=False avoids rendering 0° and 360° twice, which would stutter when the video loops
    for azim in tqdm(np.linspace(0, 360, 120, endpoint=False)):
        w2c = create_camera_pose(30.0, azim, radius=2.0)
        rgb, _ = render_gaussians(model, w2c, projection, 512)
        frame = (rgb.cpu().numpy().clip(0, 1) * 255).astype(np.uint8)
        video_frames.append(frame)

video_path = DIRS['output'] / "glimpse3d_360.mp4"
imageio.mimsave(str(video_path), video_frames, fps=30)
print(f"✅ Video saved: {video_path}")

In [None]:
# Copy final files
import shutil

# Copy optimized PLY
final_ply = DIRS['output'] / "final_gaussian.ply"
shutil.copy(OPTIMIZED_PLY_PATH, final_ply)

# Copy mesh
shutil.copy(DIRS['triposr'] / "mesh.glb", DIRS['output'] / "initial_mesh.glb")
shutil.copy(DIRS['triposr'] / "mesh.obj", DIRS['output'] / "initial_mesh.obj")

# Copy best views
for i in [0, 4, 8, 12]:
    shutil.copy(
        DIRS['syncdreamer'] / f"view_{i:02d}_e{int(ELEVATIONS[i])}_a{int(AZIMUTHS[i])}.png",
        DIRS['output'] / f"view_{i:02d}.png"
    )

print("\n📁 Final output files:")
for f in sorted(DIRS['output'].iterdir()):
    size_mb = f.stat().st_size / 1024 / 1024
    print(f"  {f.name} ({size_mb:.1f} MB)")

In [None]:
# Display video
from IPython.display import HTML
from base64 import b64encode

# Embed the video inline as a base64 data URL
mp4 = video_path.read_bytes()
data_url = f"data:video/mp4;base64,{b64encode(mp4).decode()}"
HTML(f'''
<h3>🏆 Glimpse3D Result</h3>
<video width="600" controls autoplay loop>
    <source src="{data_url}" type="video/mp4">
</video>
''')

# 📥 Download All Results

In [None]:
from google.colab import files

# Create final ZIP
output_zip = str(WORK_DIR / "glimpse3d_complete_output")
shutil.make_archive(output_zip, 'zip', DIRS['output'])

print("📥 Downloading Glimpse3D results...")
files.download(f"{output_zip}.zip")

print("\n" + "="*60)
print("✅ GLIMPSE3D PIPELINE COMPLETE!")
print("="*60)
print(f"\nDownloaded: glimpse3d_complete_output.zip")
print("\nContents:")
print("  - final_gaussian.ply   : Optimized Gaussian Splats")
print("  - initial_mesh.glb/obj : TripoSR mesh")
print("  - glimpse3d_360.mp4    : 360° turntable video")
print("  - view_*.png           : Multi-view images")

---

## 🎉 Pipeline Complete!

You now have:
1. **final_gaussian.ply** - Open in any Gaussian Splat viewer
2. **initial_mesh.glb** - Open in Blender or an online GLB viewer
3. **glimpse3d_360.mp4** - Share as a video

### Recommended Viewers
- **Gaussian Splats**: [SuperSplat](https://playcanvas.com/supersplat/editor), [Luma AI Viewer](https://lumalabs.ai/)
- **GLB Mesh**: [glTF Viewer](https://gltf-viewer.donmccurdy.com/), Blender

### Tips for Better Results
1. Use high-quality input images with clean backgrounds
2. Objects should be centered and fill ~80% of the frame
3. Avoid reflective or transparent surfaces
4. Run more gsplat iterations (2000+) for higher quality
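
When raising the iteration count as suggested in tip 4, the exponential learning-rate decay is worth rescaling too: with `gamma=0.995` applied every step, the learning rate has already fallen below 1% of its initial value after 1000 iterations, so extra iterations would barely move the Gaussians. The sketch below picks `gamma` so that the schedule ends at a chosen fraction regardless of length. `gamma_for` and the 1% target are illustrative assumptions, not values from the notebook.

```python
def gamma_for(num_iterations: int, final_fraction: float) -> float:
    """Per-step decay such that lr_final = lr_initial * final_fraction."""
    if num_iterations <= 0:
        raise ValueError("num_iterations must be positive")
    if not 0.0 < final_fraction <= 1.0:
        raise ValueError("final_fraction must be in (0, 1]")
    return final_fraction ** (1.0 / num_iterations)

# Decay left after 1000 steps with the notebook's fixed gamma.
print(f"0.995 ** 1000 = {0.995 ** 1000:.5f}")

# Gamma needed to end at 1% of the initial learning rate for longer runs.
for n in (1000, 2000, 5000):
    print(f"{n} iterations -> gamma = {gamma_for(n, 0.01):.6f}")
```

The returned value can be passed as `gamma` to `torch.optim.lr_scheduler.ExponentialLR` in place of the fixed `0.995`.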