# 🎯 SyncDreamer Inference on Google Colab (T4 GPU)

**Generate 16 Multi-View Consistent Images from a Single Image**

This notebook runs SyncDreamer inference on a T4 GPU (15GB VRAM), using a reduced `batch_view_num` to lower peak memory use.

## What this notebook does:
1. ✅ Clones the SyncDreamer repository
2. ✅ Installs all dependencies
3. ✅ Downloads pretrained checkpoints (~6GB total)
4. ✅ Configures memory-efficient settings for T4
5. ✅ Runs inference on a test image
6. ✅ Displays 16 generated views in a grid

**⚠️ Make sure to select a GPU runtime: Runtime → Change runtime type → T4 GPU**

## 1️⃣ Setup Environment and Clone Repository

In [None]:
# Check GPU availability
!nvidia-smi

# Clone SyncDreamer repository
%cd /content
!git clone https://github.com/liuyuan-pal/SyncDreamer.git
%cd /content/SyncDreamer

# Create checkpoint directory
!mkdir -p ckpt

## 2️⃣ Install Dependencies

This cell installs all required packages and may take 2-3 minutes.

In [None]:
# Colab already has PyTorch pre-installed - just use it!
# Only install if needed (uncomment if you get version issues):
# !pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install core dependencies
!pip install -q omegaconf pytorch-lightning==1.9.0 einops kornia
!pip install -q transformers diffusers accelerate

# Install CLIP
!pip install -q git+https://github.com/openai/CLIP.git

# Install taming-transformers (use rom1504 fork - this is what SyncDreamer requires!)
!pip install -q taming-transformers-rom1504

# Install image processing libraries
!pip install -q "rembg[gpu]" opencv-python-headless scikit-image imageio

# Verify installations
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

# Verify taming is installed correctly
try:
    from taming.modules.vqvae.quantize import VectorQuantizer2
    print("✅ taming-transformers installed correctly!")
except ImportError as e:
    print(f"❌ taming import error: {e}")

## 3️⃣ Download Pretrained Checkpoints

Downloads two files (~6GB total):
- `syncdreamer-pretrain.ckpt` (~5.2GB) - Main model
- `ViT-L-14.pt` (~890MB) - CLIP encoder
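
Before starting a download of this size, it can help to confirm that the runtime has enough free disk space. The following is a minimal sketch using only the standard library; the helper `has_free_space` and the 8 GB threshold (about 6 GB of checkpoints plus headroom) are assumptions for illustration, not part of SyncDreamer.

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least
    `required_gb` GiB free. Hypothetical helper for illustration."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    print(f"Free space at {path}: {free_gb:.1f} GB (need {required_gb} GB)")
    return free_gb >= required_gb


# Assumed threshold: ~6 GB of checkpoints plus some headroom.
# "." is the current working directory.
if not has_free_space(".", 8):
    print("Not enough disk space for the checkpoints")
```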

In [None]:
%cd /content/SyncDreamer

# Install aria2 for faster downloads
!apt -y install -qq aria2

# Download SyncDreamer checkpoint from HuggingFace (~5.2GB)
print("📥 Downloading SyncDreamer checkpoint (this may take 5-10 minutes)...")
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M \
    https://huggingface.co/camenduru/SyncDreamer/resolve/main/syncdreamer-pretrain.ckpt \
    -d /content/SyncDreamer/ckpt -o syncdreamer-pretrain.ckpt

# Download CLIP ViT-L-14 encoder (~890MB)
print("📥 Downloading CLIP ViT-L-14 encoder...")
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M \
    https://huggingface.co/camenduru/SyncDreamer/resolve/main/ViT-L-14.pt \
    -d /content/SyncDreamer/ckpt -o ViT-L-14.pt

# Verify downloads
import os
ckpt_dir = "/content/SyncDreamer/ckpt"
for f in ["syncdreamer-pretrain.ckpt", "ViT-L-14.pt"]:
    path = os.path.join(ckpt_dir, f)
    if os.path.exists(path):
        size_gb = os.path.getsize(path) / (1024**3)
        print(f"✅ {f}: {size_gb:.2f} GB")
    else:
        print(f"❌ {f}: NOT FOUND")

## 4️⃣ Configure GPU Memory Settings for T4

Optimized settings for T4 GPU (15GB VRAM):

In [None]:
import os
import gc
import torch

# Set environment variables for memory optimization
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

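# Backend settings aimed at speed rather than memory savings
torch.backends.cudnn.benchmark = True
# TF32 only takes effect on Ampere or newer GPUs, so these two flags do nothing on a T4
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True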

def print_gpu_memory():
    """Print current GPU memory usage"""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**3
        reserved = torch.cuda.memory_reserved() / 1024**3
        total = torch.cuda.get_device_properties(0).total_memory / 1024**3
        print(f"GPU Memory: {allocated:.2f}GB allocated, {reserved:.2f}GB reserved, {total:.1f}GB total")

def clear_gpu_memory():
    """Clear GPU memory cache"""
    gc.collect()
    torch.cuda.empty_cache()
    print("🧹 GPU memory cache cleared")
    print_gpu_memory()

# Initial memory check
print_gpu_memory()

## 5️⃣ Load SyncDreamer Model

In [None]:
%cd /content/SyncDreamer

import sys
sys.path.insert(0, '/content/SyncDreamer')

import numpy as np
import torch
from omegaconf import OmegaConf
from PIL import Image

from ldm.models.diffusion.sync_dreamer import SyncMultiviewDiffusion, SyncDDIMSampler
from ldm.util import instantiate_from_config, prepare_inputs

def load_model(cfg_path, ckpt_path, device='cuda'):
    """Load SyncDreamer model"""
    print(f"📂 Loading config from {cfg_path}")
    config = OmegaConf.load(cfg_path)

    print(f"📂 Loading checkpoint from {ckpt_path}")
    model = instantiate_from_config(config.model)

    # weights_only=False: the checkpoint is assumed to be trusted; recent PyTorch
    # releases default to weights_only=True, which can reject this kind of file
    ckpt = torch.load(ckpt_path, map_location='cpu', weights_only=False)
    model.load_state_dict(ckpt['state_dict'], strict=True)

    model = model.to(device).eval()
    print("✅ Model loaded successfully!")
    print_gpu_memory()
    
    return model

# Load the model
CONFIG_PATH = "/content/SyncDreamer/configs/syncdreamer.yaml"
CHECKPOINT_PATH = "/content/SyncDreamer/ckpt/syncdreamer-pretrain.ckpt"

model = load_model(CONFIG_PATH, CHECKPOINT_PATH)

## 6️⃣ Download and Prepare Test Image

**Recommended test images:**
- Simple 3D objects (toys, furniture, shoes)
- Clean backgrounds or objects that can be easily segmented
- Front-facing view with ~30° elevation works best

We'll use a sample image from the SyncDreamer test set:

In [None]:
import matplotlib.pyplot as plt
from PIL import Image

# Option 1: Use built-in test image from SyncDreamer
TEST_IMAGE_PATH = "/content/SyncDreamer/testset/aircraft.png"

# Option 2: Download a sample image (uncomment to use)
# Sample images that work well with SyncDreamer:

# Lysol bottle (commonly used for testing)
# !wget -q https://huggingface.co/spaces/One-2-3-45/One-2-3-45/resolve/main/demo_examples/00_zero123_lysol.png -O /content/test_image.png
# TEST_IMAGE_PATH = "/content/test_image.png"

# Astronaut toy
# !wget -q https://huggingface.co/spaces/One-2-3-45/One-2-3-45/resolve/main/demo_examples/01_astronaut.png -O /content/test_image.png
# TEST_IMAGE_PATH = "/content/test_image.png"

# Option 3: Upload your own image (uncomment to use)
# from google.colab import files
# uploaded = files.upload()
# TEST_IMAGE_PATH = list(uploaded.keys())[0]

# Display the test image
print(f"📷 Using test image: {TEST_IMAGE_PATH}")
img = Image.open(TEST_IMAGE_PATH)
plt.figure(figsize=(6, 6))
plt.imshow(img)
plt.title(f"Input Image ({img.size[0]}x{img.size[1]}, {img.mode})")
plt.axis('off')
plt.show()

print(f"Image size: {img.size}")
print(f"Image mode: {img.mode}")

## 7️⃣ Preprocess Input Image

For images without transparent backgrounds, we'll use rembg for background removal:

In [None]:
from rembg import remove
import numpy as np

def preprocess_image(image_path, output_path=None, use_rembg=True):
    """
    Preprocess image for SyncDreamer:
    1. Remove background (if needed)
    2. Convert to RGBA with transparent background
    """
    img = Image.open(image_path)
    
    # Check if image already has alpha channel (transparent background)
    if img.mode == 'RGBA':
        alpha = np.array(img)[:, :, 3]
        has_transparency = np.any(alpha < 255)
        if has_transparency:
            print("✅ Image already has transparent background")
            if output_path:
                img.save(output_path)
            return img if not output_path else output_path

    # Remove background using rembg
    if use_rembg:
        print("🔄 Removing background with rembg...")
        img_rgba = remove(img)
        print("✅ Background removed!")
    else:
        img_rgba = img.convert('RGBA')
    
    if output_path:
        img_rgba.save(output_path)
        return output_path
    return img_rgba

# Preprocess the test image
PROCESSED_IMAGE_PATH = "/content/processed_input.png"
preprocess_image(TEST_IMAGE_PATH, PROCESSED_IMAGE_PATH, use_rembg=True)

# Display processed image
processed_img = Image.open(PROCESSED_IMAGE_PATH)
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

# Original
axes[0].imshow(Image.open(TEST_IMAGE_PATH))
axes[0].set_title("Original")
axes[0].axis('off')

# Processed (with alpha)
# Transparent pixels are blended with the axes background when displayed
axes[1].imshow(processed_img)
axes[1].set_title("Processed (Background Removed)")
axes[1].axis('off')

plt.tight_layout()
plt.show()

## 8️⃣ Run Multi-view Generation Inference

**T4-optimized settings:**
- `batch_view_num=4` (instead of 8) - processes 4 views at a time to reduce VRAM
- `sample_num=1` - generates 1 set of 16 views
- `sample_steps=50` - standard DDIM steps
- `cfg_scale=2.0` - classifier-free guidance scale
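
To make the effect of `batch_view_num` concrete, the sketch below shows how 16 views could be split into groups that are processed one group at a time. The helper `view_batches` is hypothetical and only illustrates the chunking arithmetic; it is not SyncDreamer's internal code.

```python
def view_batches(num_views, batch_view_num):
    """Split view indices 0..num_views-1 into consecutive groups.

    Hypothetical helper for illustration only; SyncDreamer's sampler
    handles its own batching internally.
    """
    if batch_view_num <= 0:
        raise ValueError("batch_view_num must be positive")
    return [
        list(range(start, min(start + batch_view_num, num_views)))
        for start in range(0, num_views, batch_view_num)
    ]


# With the T4 setting, 16 views are handled as 4 groups of 4 views
for group in view_batches(16, 4):
    print(group)
```

A smaller `batch_view_num` yields more groups, so fewer views are held in memory at once, at the likely cost of a longer run.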

In [None]:
import time
from skimage.io import imsave

# ============================================
# INFERENCE PARAMETERS (Optimized for T4 GPU)
# ============================================
INPUT_IMAGE = PROCESSED_IMAGE_PATH
OUTPUT_DIR = "/content/output"
ELEVATION = 30.0        # Input view elevation (degrees) - adjust if needed
CROP_SIZE = 200         # Foreground crop size (-1 to disable)
CFG_SCALE = 2.0         # Classifier-free guidance scale
BATCH_VIEW_NUM = 4      # ⚠️ KEY SETTING: 4 for T4 (15GB), 8 for A100
SAMPLE_NUM = 1          # Number of sample sets to generate
SAMPLE_STEPS = 50       # DDIM sampling steps
SEED = 42               # Random seed for reproducibility

# Set random seeds
torch.manual_seed(SEED)
np.random.seed(SEED)

# Create output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

print("=" * 50)
print("🚀 Running SyncDreamer Inference")
print("=" * 50)
print(f"📷 Input: {INPUT_IMAGE}")
print(f"📐 Elevation: {ELEVATION}°")
print(f"🎯 CFG Scale: {CFG_SCALE}")
print(f"📦 Batch View Num: {BATCH_VIEW_NUM}")
print(f"🔢 Sample Steps: {SAMPLE_STEPS}")
print("=" * 50)

# Prepare input data
print("\n📊 Preparing input data...")
data = prepare_inputs(INPUT_IMAGE, ELEVATION, CROP_SIZE)
for k, v in data.items():
    data[k] = v.unsqueeze(0).cuda()
    data[k] = torch.repeat_interleave(data[k], SAMPLE_NUM, dim=0)

print_gpu_memory()

# Create sampler
print("\n🎲 Creating DDIM sampler...")
sampler = SyncDDIMSampler(model, SAMPLE_STEPS)

# Run inference
print("\n⏳ Generating 16 multi-view images...")
start_time = time.time()

with torch.no_grad():
    x_sample = model.sample(sampler, data, CFG_SCALE, BATCH_VIEW_NUM)

elapsed_time = time.time() - start_time
print(f"\n✅ Generation complete in {elapsed_time:.1f} seconds!")
print_gpu_memory()

# Process output
B, N, C, H, W = x_sample.shape
print(f"📐 Output shape: {x_sample.shape} (B={B}, N={N} views, {H}x{W})")

# Map values from [-1, 1] to [0, 1], move channels last, then scale to 8-bit
x_sample = (torch.clamp(x_sample, max=1.0, min=-1.0) + 1) * 0.5
x_sample = x_sample.permute(0, 1, 3, 4, 2).cpu().numpy() * 255
x_sample = x_sample.astype(np.uint8)

# Store for visualization
generated_views = x_sample[0]  # First batch item, shape: (16, H, W, 3)
print(f"✅ Generated {generated_views.shape[0]} views")

## 9️⃣ Visualize Generated Views

SyncDreamer generates its 16 views at fixed target viewpoints:
- **Elevation**: 30° for all 16 views
- **Azimuths**: evenly spaced over 360°, i.e. 22.5° apart
- **Grid**: shown as 4 rows of 4 views, in index order

In [None]:
# Create 4x4 grid visualization
fig, axes = plt.subplots(4, 4, figsize=(16, 16))

# All target views share a 30° elevation; azimuths step by 360/16 = 22.5°.
# The azimuth labels assume view 0 sits at azimuth 0°.
elevations = [30] * 16
azimuths = [i * 22.5 for i in range(16)]

for i in range(16):
    row = i // 4
    col = i % 4
    axes[row, col].imshow(generated_views[i])
    axes[row, col].set_title(f"View {i}: E={elevations[i]}° A={azimuths[i]:g}°", fontsize=10)
    axes[row, col].axis('off')

plt.suptitle("SyncDreamer Generated Multi-View Images", fontsize=16, y=1.02)
plt.tight_layout()
plt.savefig(f"{OUTPUT_DIR}/multiview_grid.png", dpi=150, bbox_inches='tight')
plt.show()

print(f"📊 Grid saved to {OUTPUT_DIR}/multiview_grid.png")

## 🔟 Save Output Images

In [None]:
import imageio

# Save individual views
print("💾 Saving individual views...")
saved_paths = []
for i in range(16):
    filename = f"view_{i:02d}_elev{elevations[i]}_azim{azimuths[i]}.png"
    path = os.path.join(OUTPUT_DIR, filename)
    imsave(path, generated_views[i])
    saved_paths.append(path)
    
print(f"✅ Saved {len(saved_paths)} individual views to {OUTPUT_DIR}/")

# Save concatenated strip (original SyncDreamer format)
concat_image = np.concatenate([generated_views[i] for i in range(16)], axis=1)
imsave(f"{OUTPUT_DIR}/concat_strip.png", concat_image)
print(f"✅ Saved concatenated strip to {OUTPUT_DIR}/concat_strip.png")

# Create animated GIF (turntable rotation)
print("🎬 Creating turntable animation...")

# SyncDreamer's 16 target views share one elevation (30°) and their azimuths
# cover a full rotation, so all of them go into the turntable in index order
turntable_views = [generated_views[i] for i in range(16)]

gif_path = f"{OUTPUT_DIR}/turntable.gif"
imageio.mimsave(gif_path, turntable_views, fps=4, loop=0)
print(f"✅ Saved turntable GIF to {gif_path}")

# Display the GIF
from IPython.display import Image as IPImage, display
display(IPImage(filename=gif_path))

# List all output files
print("\n📁 Output files:")
for f in os.listdir(OUTPUT_DIR):
    size_kb = os.path.getsize(os.path.join(OUTPUT_DIR, f)) / 1024
    print(f"  - {f} ({size_kb:.1f} KB)")

## 📥 Download Results (Optional)

Run this cell to download all outputs as a ZIP file:

In [None]:
import shutil
from google.colab import files

# Create ZIP archive
zip_path = "/content/syncdreamer_output"
shutil.make_archive(zip_path, 'zip', OUTPUT_DIR)
print(f"📦 Created {zip_path}.zip")

# Download
files.download(f"{zip_path}.zip")

## 🧹 Cleanup (Free GPU Memory)

In [None]:
# Free GPU memory
del model
del sampler
del x_sample
clear_gpu_memory()

print("✅ GPU memory freed!")

---

## 📝 Notes & Tips

### T4 GPU Memory Optimization
- `batch_view_num=4` is optimal for T4 (15GB). Use `8` for A100/V100.
- If you get OOM errors, try `batch_view_num=2`
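
The fallback described above can be automated. The following is a minimal sketch, assuming out-of-memory failures surface as a `RuntimeError` whose message contains "out of memory" (as PyTorch's CUDA OOM errors do). The names `run_with_fallback` and `run_fn` are hypothetical and not part of SyncDreamer.

```python
def run_with_fallback(run_fn, batch_sizes=(4, 2, 1)):
    """Call run_fn(batch_view_num) with progressively smaller batches.

    run_fn is a hypothetical callable wrapping the sampling call, e.g.
    lambda b: model.sample(sampler, data, CFG_SCALE, b).
    Returns (result, batch_view_num_that_worked).
    """
    last_error = None
    for batch_view_num in batch_sizes:
        try:
            return run_fn(batch_view_num), batch_view_num
        except RuntimeError as err:
            # Only retry on out-of-memory errors; re-raise anything else
            if "out of memory" not in str(err).lower():
                raise
            last_error = err
            print(f"OOM with batch_view_num={batch_view_num}, retrying smaller")
    raise RuntimeError("All batch sizes ran out of memory") from last_error


# Stand-in for the real sampler so the sketch runs without a GPU
def fake_sample(batch_view_num):
    if batch_view_num > 2:
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"sampled with {batch_view_num} views per batch"


result, used = run_with_fallback(fake_sample)
print(result, used)
```

In this notebook it would also make sense to call `clear_gpu_memory()` before each retry so that memory cached by the failed attempt is released.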

### Best Input Images
- **Size**: Any size (will be resized to 256x256)
- **Background**: Transparent (RGBA) works best
- **Subject**: Centered, single object
- **Elevation**: ~30° from front works best
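
To illustrate how an input of any size can end up on a 256x256 canvas, the sketch below computes how a foreground bounding box would be scaled and centered for a given crop size (`CROP_SIZE = 200` in this notebook). The function `center_crop_layout` is hypothetical, and the arithmetic is an assumption about this kind of preprocessing rather than a copy of the repository's `prepare_inputs` code.

```python
def center_crop_layout(bbox_w, bbox_h, crop_size=200, canvas=256):
    """Scale a foreground box so its longer side equals crop_size,
    then center it on a square canvas.

    Returns (new_w, new_h, left, top) in pixels.
    Hypothetical helper; assumes crop_size <= canvas.
    """
    if bbox_w <= 0 or bbox_h <= 0:
        raise ValueError("bounding box must have positive size")
    scale = crop_size / max(bbox_w, bbox_h)
    new_w = max(1, round(bbox_w * scale))
    new_h = max(1, round(bbox_h * scale))
    left = (canvas - new_w) // 2
    top = (canvas - new_h) // 2
    return new_w, new_h, left, top


# A 400x200 foreground box shrinks by half and is centered on the canvas
print(center_crop_layout(400, 200))  # (200, 100, 28, 78)
```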

### Elevation Tips
- Front-facing photos: `elevation=30`
- Top-down photos: `elevation=60-80`
- Eye-level photos: `elevation=0-20`
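
If the camera position relative to the object is roughly known, the elevation can be estimated with basic trigonometry instead of guessed. This is a minimal sketch; `estimate_elevation_deg` and its parameters are hypothetical, and it assumes the camera points at the object's center.

```python
import math


def estimate_elevation_deg(camera_height, horizontal_distance):
    """Elevation angle (degrees) of a camera looking at an object.

    camera_height: height of the camera above the object's center
    horizontal_distance: ground distance from camera to object
    Both must use the same unit. Hypothetical helper for illustration.
    """
    if horizontal_distance < 0:
        raise ValueError("horizontal_distance must be non-negative")
    return math.degrees(math.atan2(camera_height, horizontal_distance))


# Camera 0.5 m above the object's center and 1.0 m away horizontally
print(f"{estimate_elevation_deg(0.5, 1.0):.1f}")  # 26.6
```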

### Recommended Test Images
1. **Aircraft** (built-in): `/content/SyncDreamer/testset/aircraft.png`
2. **Lysol bottle**: `https://huggingface.co/spaces/One-2-3-45/One-2-3-45/resolve/main/demo_examples/00_zero123_lysol.png`
3. **Astronaut**: `https://huggingface.co/spaces/One-2-3-45/One-2-3-45/resolve/main/demo_examples/01_astronaut.png`

### Citation
```bibtex
@article{liu2023syncdreamer,
  title={SyncDreamer: Generating Multiview-consistent Images from a Single-view Image},
  author={Liu, Yuan and Lin, Cheng and Zeng, Zijiao and Long, Xiaoxiao and Liu, Lingjie and Komura, Taku and Wang, Wenping},
  journal={arXiv preprint arXiv:2309.03453},
  year={2023}
}
```