# GenAI v3: Full Scene Background Manipulation

**Zero training required!**

## Key Features
- **ENTIRE SCENES manipulated** (not just single frames), using scene boundaries from PySceneDetect
- **DRAMATIC background changes** (not subtle)
- **Product AUTO-DETECTED** using SAM + DINOv2 episodic memory
- **Smooth video output** with keyframe interpolation
- **Outputs BOTH** the manipulated video AND a sample frame

## How It Works
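The `action` argument passed to `manipulate` selects the direction of the change:
- `increase`: Background becomes **EXTREMELY plain/gray** → all attention on the product
- `decrease`: Background becomes **EXTREMELY vibrant/psychedelic** → attention diverted from the product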

## Models and Tools Used
- **SDXL Inpainting** - Repaints the background region
- **SAM ViT-H** - Segment Anything, used for product detection
- **DINOv2** - Feature tracking for episodic memory
- **PySceneDetect** - Scene boundary detection from actual cuts (a library, not a model)
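
For reference, scene boundaries of the kind PySceneDetect reports can be obtained with its `detect` function and `ContentDetector`. The sketch below is not taken from `GenAI_v3`: the helper names `detect_scene_ranges` and `scene_frame_range` are hypothetical, and the 1-based `scene_index` is an assumption based on the `scene_index=1  # First scene` comment in the batch example.

```python
from typing import List, Tuple

FrameRange = Tuple[int, int]  # (start_frame, end_frame) as reported by the detector


def detect_scene_ranges(video_path: str) -> List[FrameRange]:
    """Return one (start, end) frame range per detected scene."""
    # Imported lazily so the lookup helper below works without PySceneDetect installed.
    from scenedetect import ContentDetector, detect

    scenes = detect(video_path, ContentDetector())
    return [(start.get_frames(), end.get_frames()) for start, end in scenes]


def scene_frame_range(ranges: List[FrameRange], scene_index: int) -> FrameRange:
    """Look up a scene by 1-based index (assumed convention, see lead-in)."""
    if not 1 <= scene_index <= len(ranges):
        raise IndexError(f"scene_index {scene_index} not in 1..{len(ranges)}")
    return ranges[scene_index - 1]
```

Keeping the lookup separate from detection means an out-of-range `scene_index` fails with a clear message before any model is loaded.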

In [None]:
from GenAI_v3 import SceneManipulator

In [None]:
# Initialize (loads SDXL + SAM + DINO, ~12GB total)
manipulator = SceneManipulator(
    valid_scenes_file="data/valid_scenes.csv",  # Optional (for scene timing)
    video_dir="data/data_tiktok",
    output_dir="outputs/genai_v3",
    device="cuda",
    auto_detect_product=True,  # Auto-detect main product (default)
)

## Increase Attention on Product

Makes the background less distracting (muted, simple) → the viewer focuses on the product.

In [None]:
# INCREASE attention on product in scene 6
result = manipulator.manipulate(
    video_id="YOUR_VIDEO_ID",  # ← Change this
    scene_index=6,              # ← Change this
    action="increase",
)

# Both video and frame are output
print(f"Video: {result.video_path}")
print(f"Frame: {result.frame_path}")
print(f"Frames manipulated: {result.frames_manipulated}")

## Decrease Attention on Product

Makes the background more interesting (vibrant, detailed) → the viewer is distracted from the product.

In [None]:
# DECREASE attention on product in scene 3
result = manipulator.manipulate(
    video_id="YOUR_VIDEO_ID",  # ← Change this
    scene_index=3,              # ← Change this
    action="decrease",
)

# Both video and frame are output
print(f"Video: {result.video_path}")
print(f"Frame: {result.frame_path}")
print(f"Frames manipulated: {result.frames_manipulated}")

## Adjust Parameters

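- `strength=0.95`: Default; DRAMATIC change (recommended)
- `strength=0.7`: Moderate change
- `num_inference_steps=40`: More diffusion steps give better quality at the cost of speed
- `keyframe_interval=10`: Run SDXL on every 10th frame and interpolate the frames in between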
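
To make `keyframe_interval` concrete, here is a minimal sketch of how keyframes might be picked and how an in-between frame could be blended from its two neighbours. It assumes linear blending and that the last frame of the scene is always a keyframe; the interpolation inside `GenAI_v3` may differ. `keyframe_indices` and `blend_weights` are hypothetical names.

```python
from typing import List, Tuple


def keyframe_indices(num_frames: int, keyframe_interval: int) -> List[int]:
    """Indices of the frames that would be sent to the inpainting model."""
    if num_frames <= 0 or keyframe_interval <= 0:
        raise ValueError("num_frames and keyframe_interval must be positive")
    indices = list(range(0, num_frames, keyframe_interval))
    # Assumption: the last frame is always a keyframe, so every frame
    # lies between two keyframes.
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)
    return indices


def blend_weights(frame: int, keyframes: List[int]) -> Tuple[int, int, float]:
    """Return (previous keyframe, next keyframe, weight of the next keyframe)."""
    for prev_kf, next_kf in zip(keyframes, keyframes[1:]):
        if prev_kf <= frame <= next_kf:
            return prev_kf, next_kf, (frame - prev_kf) / (next_kf - prev_kf)
    # A one-frame scene has a single keyframe and nothing to blend.
    if len(keyframes) == 1 and frame == keyframes[0]:
        return frame, frame, 0.0
    raise ValueError(f"frame {frame} is outside the keyframe range")
```

Under these assumptions, a 25-frame scene with `keyframe_interval=10` selects frames 0, 10, 20 and 24, so the model runs 4 times instead of 25, and frame 15 is an even mix of keyframes 10 and 20.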

In [None]:
# Custom parameters
result = manipulator.manipulate(
    video_id="YOUR_VIDEO_ID",
    scene_index=6,
    action="increase",
    strength=0.95,           # Higher = more dramatic
    num_inference_steps=40,  # Higher = better quality
    keyframe_interval=5,     # Process every 5th frame for smoother result
)

## Batch Processing

In [None]:
import pandas as pd

# Load valid scenes
scenes_df = pd.read_csv("data/valid_scenes.csv")

# Process multiple videos
results = []
for video_id in scenes_df['video_id'].unique()[:3]:  # First 3 videos
    try:
        result = manipulator.manipulate(
            video_id=str(video_id),
            scene_index=1,  # First scene
            action="increase",
        )
        results.append(result)
        print(f"✓ {video_id}")
        print(f"  Video: {result.video_path}")
        print(f"  Frame: {result.frame_path}")
    except Exception as e:
        print(f"✗ {video_id}: {e}")
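
After a batch run it can help to keep a record of what was produced. The sketch below uses only the standard library to turn the result fields shown above (`video_path`, `frame_path`, `frames_manipulated`) into CSV text. `summary_csv` is a hypothetical helper, and the result objects are stood in by `SimpleNamespace` for the demo.

```python
import csv
import io
from types import SimpleNamespace
from typing import Iterable

FIELDS = ("video_path", "frame_path", "frames_manipulated")


def summary_csv(results: Iterable[object]) -> str:
    """Render one CSV row per result, preceded by a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(FIELDS)
    for result in results:
        writer.writerow([getattr(result, name) for name in FIELDS])
    return buffer.getvalue()


# Stand-in for the objects returned by manipulator.manipulate(...)
demo_results = [
    SimpleNamespace(video_path="a.mp4", frame_path="a.png", frames_manipulated=12),
]
print(summary_csv(demo_results))
```

In the notebook, `summary_csv(results)` could be written to a file under `output_dir` once the loop finishes.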

## Summary

### What It Does
1. Load video with all frames
2. **Detect scene boundaries** using PySceneDetect (real scene cuts)
3. **Auto-detect main product** using SAM + DINOv2 (episodic memory)
4. **Manipulate keyframes** across entire scene with SDXL
5. **Interpolate** between keyframes for smooth video
6. **Composite**: Keep product from original, use manipulated background
7. **Export** both video and sample frame
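
The composite in step 6 can be sketched as a per-pixel blend. This is an illustration, not the `GenAI_v3` code: it assumes a soft product mask with values in [0, 1] (1 = product pixel) and uses plain Python lists for a single-channel image so that it runs without dependencies. A real pipeline would do the same arithmetic on arrays. `composite` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def composite(original: Image, background: Image, product_mask: Image) -> Image:
    """Keep the product from `original`; take everything else from `background`."""
    if not (len(original) == len(background) == len(product_mask)):
        raise ValueError("all inputs must have the same height")
    out: Image = []
    for orig_row, bg_row, mask_row in zip(original, background, product_mask):
        if not (len(orig_row) == len(bg_row) == len(mask_row)):
            raise ValueError("all inputs must have the same width")
        # mask = 1 keeps the original pixel, mask = 0 takes the new background,
        # values in between blend the two along the product's edge.
        out.append(
            [m * o + (1.0 - m) * b for o, b, m in zip(orig_row, bg_row, mask_row)]
        )
    return out
```

A soft mask rather than a hard 0/1 mask avoids a visible seam where the product meets the repainted background.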

### Key Improvements
- **Full scene manipulation** - Not just a static image inserted
- **PySceneDetect** - Uses real scene boundaries, not estimates
- **DRAMATIC changes** - Solid gray or psychedelic, not subtle
- **Smooth video** - Keyframe interpolation, no jarring transitions
- **Dual output** - Both video path AND frame path returned

### Performance
- ~2-5 min per scene (depends on scene length)
- ~12-15GB GPU memory peak
- No training required!