# Ditto TalkingHead - Digital Twin Demo

This notebook demonstrates how to train and use a digital twin, similar to the ones HeyGen offers.

## Overview
1. Train a digital twin from a video of yourself speaking
2. Use your digital twin to generate personalized talking head animations

## 1. Import dependencies

In [None]:

import time
from stream_pipeline_offline import StreamSDK
from personalization import DigitalTwinTrainer
from inference import run

## 2. Configuration

Set up the paths to your models and data.

In [None]:
# Paths to the model data and configuration
DATA_ROOT = "./checkpoints/ditto_trt_Ampere_Plus"
CFG_PKL = "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl"

# Path to your source video for training the digital twin
# This should be a video of you speaking, preferably with a variety of expressions
SOURCE_VIDEO = "path/to/your/training_video.mp4"  # Change this to your video path

# Directory to save the trained digital twin model
DIGITAL_TWIN_DIR = "./my_digital_twin"

# For inference after training
AUDIO_PATH = "path/to/audio.wav"  # Audio you want your digital twin to speak
REFERENCE_IMAGE = "path/to/reference.jpg"  # A reference image of you (can be a frame from the video)
OUTPUT_PATH = "digital_twin_result.mp4"  # Where to save the output video
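
Before running the heavier cells, it can help to confirm that the configured paths actually exist. The sketch below is a minimal check using only the standard library; `find_missing_paths` is a hypothetical helper introduced here for illustration and is not part of the Ditto code. The path values mirror the placeholders above.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the names of entries whose path does not exist on disk.

    Hypothetical helper for this notebook, not part of the SDK.
    """
    return [name for name, p in paths.items() if not Path(p).exists()]


# Same names and placeholder values as the configuration cell above.
required = {
    "DATA_ROOT": "./checkpoints/ditto_trt_Ampere_Plus",
    "CFG_PKL": "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl",
    "SOURCE_VIDEO": "path/to/your/training_video.mp4",
    "AUDIO_PATH": "path/to/audio.wav",
    "REFERENCE_IMAGE": "path/to/reference.jpg",
}

missing = find_missing_paths(required)
if missing:
    print("Missing inputs:", ", ".join(missing))
```

`OUTPUT_PATH` and `DIGITAL_TWIN_DIR` are left out of the check because they are outputs that may not exist yet.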

## 3. Initialize SDK

In [None]:
# Initialize the StreamSDK
sdk = StreamSDK(CFG_PKL, DATA_ROOT)
print("SDK initialized successfully")

## 4. Train your digital twin

This step analyzes your video to learn your unique facial expressions and speaking style. Training time depends on the length of your video and the number of training epochs.

In [None]:
# Create a digital twin trainer
trainer = DigitalTwinTrainer(
    sdk=sdk,
    source_video_path=SOURCE_VIDEO,
    epochs=20,  # Number of training epochs; more epochs take longer and are not guaranteed to improve results
    learning_rate=1e-5,  # Learning rate for fine-tuning
    output_dir=DIGITAL_TWIN_DIR,
    device="cuda"  # Use "cpu" if you don't have a GPU
)

# Start training
print(f"Training digital twin from {SOURCE_VIDEO}...")
start_time = time.time()
model_path = trainer.train()
end_time = time.time()
print(f"Training completed in {(end_time - start_time) / 60:.2f} minutes")
print(f"Model saved to {model_path}")
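
The timing pattern in the training cell can be reused for the generation steps later in the notebook. One possible way to do that is a small context manager; `timed` is a hypothetical helper defined here, not something the SDK provides.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the enclosed block took, in minutes.

    Hypothetical helper for this notebook, not part of the SDK.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        minutes = (time.perf_counter() - start) / 60
        print(f"{label} completed in {minutes:.2f} minutes")


# Usage: wrap any long-running step. A short sleep stands in for real work.
with timed("Dummy step"):
    time.sleep(0.01)
```

Wrapping the digital twin run and the regular run in `timed(...)` would make their generation times directly comparable.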

## 5. Generate a video with your digital twin

Now you can use your trained digital twin to generate a personalized talking head animation.

In [None]:
# Initialize a new SDK instance for inference
inference_sdk = StreamSDK(CFG_PKL, DATA_ROOT)

# Configure settings for digital twin inference
setup_kwargs = {
    # Digital twin settings
    "digital_twin_mode": True,  # Enable digital twin mode
    "digital_twin_model_dir": DIGITAL_TWIN_DIR,  # Path to your trained model
    
    # Expression settings
    "emotion_intensity": 1.3,  # Controls emotional expressiveness (1.0-1.5)
    "sampling_timesteps": 50,  # More = better quality but slower (50-80)
    "smo_k_d": 1,  # Motion smoothing (1-3, lower = more dynamic)
    
    # Background motion
    "bg_motion_enabled": True,  # Enable subtle background motion
    "bg_motion_intensity": 0.005,  # Background motion intensity (0.001-0.02)
}

more_kwargs = {"setup_kwargs": setup_kwargs}

# Generate video
print("Generating video with digital twin...")
run(inference_sdk, AUDIO_PATH, REFERENCE_IMAGE, OUTPUT_PATH, more_kwargs)
print(f"Digital twin video saved to {OUTPUT_PATH}")
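
The comments in the settings cell suggest a range for each tunable value. A minimal sketch for flagging values outside those ranges follows. The ranges are copied from the comments above and are recommendations from this notebook; whether the SDK enforces them is not known. `RECOMMENDED_RANGES` and `out_of_range` are hypothetical names introduced for illustration.

```python
# Ranges copied from the comments in the settings cell. They are treated as
# recommendations; the SDK may or may not enforce them (assumption).
RECOMMENDED_RANGES = {
    "emotion_intensity": (1.0, 1.5),
    "sampling_timesteps": (50, 80),
    "smo_k_d": (1, 3),
    "bg_motion_intensity": (0.001, 0.02),
}


def out_of_range(settings, ranges=RECOMMENDED_RANGES):
    """Return {key: value} for settings that fall outside their range."""
    flagged = {}
    for key, (low, high) in ranges.items():
        if key in settings and not (low <= settings[key] <= high):
            flagged[key] = settings[key]
    return flagged


example = {"emotion_intensity": 1.3, "sampling_timesteps": 120, "smo_k_d": 1}
print(out_of_range(example))  # {'sampling_timesteps': 120}
```

Keys that are absent from `settings` are skipped, so the same check works for partial configurations.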

## 6. Compare with regular mode

Let's generate another video with the same audio, reference image, and expression settings, but without the digital twin, to see the difference.

In [None]:
# Initialize another SDK instance
regular_sdk = StreamSDK(CFG_PKL, DATA_ROOT)

# Configure settings without digital twin
regular_kwargs = {
    "digital_twin_mode": False,  # Disable digital twin mode
    
    # Same expression settings as before
    "emotion_intensity": 1.3,
    "sampling_timesteps": 50,
    "smo_k_d": 1,
    
    # Same background motion
    "bg_motion_enabled": True,
    "bg_motion_intensity": 0.005,
}

regular_more_kwargs = {"setup_kwargs": regular_kwargs}

# Generate regular video
regular_output = "regular_result.mp4"
print("Generating video without digital twin...")
run(regular_sdk, AUDIO_PATH, REFERENCE_IMAGE, regular_output, regular_more_kwargs)
print(f"Regular video saved to {regular_output}")
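
For a fair comparison, the two runs should differ only in the digital twin settings. Instead of retyping the dictionary, one option is to derive the regular settings from the digital twin settings, as in this sketch. `without_digital_twin` is a hypothetical helper; the keys are the ones used in the cells above.

```python
def without_digital_twin(twin_kwargs):
    """Copy the settings, switch the twin off, and drop the twin-only key.

    Hypothetical helper for this notebook, not part of the SDK.
    """
    regular = dict(twin_kwargs)  # shallow copy; the original stays untouched
    regular["digital_twin_mode"] = False
    regular.pop("digital_twin_model_dir", None)
    return regular


twin = {
    "digital_twin_mode": True,
    "digital_twin_model_dir": "./my_digital_twin",
    "emotion_intensity": 1.3,
    "sampling_timesteps": 50,
    "smo_k_d": 1,
    "bg_motion_enabled": True,
    "bg_motion_intensity": 0.005,
}

regular = without_digital_twin(twin)

# List the keys whose values differ between the two configurations.
changed = sorted(k for k in twin if regular.get(k) != twin[k])
print(changed)  # ['digital_twin_mode', 'digital_twin_model_dir']
```

Deriving one configuration from the other means that a later change to, say, `emotion_intensity` applies to both runs.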

## How it works

The digital twin feature works in four stages:

1. **Facial Analysis**: The system extracts facial features and expressions from your video
2. **Style Analysis**: It analyzes your unique speaking style, including expression patterns
3. **Model Personalization**: It fine-tunes the motion generation model to match your style
4. **Personalized Animation**: During inference, it applies your personal style to the animation

The result is a talking head animation that more closely reflects your personal speaking style and expressions, similar to the personalized digital twins that HeyGen creates.
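
To make the stages above more concrete, here is a toy illustration in plain Python. It is not how Ditto implements personalization. It assumes that each video frame has already been reduced to a short list of expression features (a stand-in for stage 1), summarizes them as a per-feature mean and spread (stage 2), and blends a generic frame toward the speaker's mean (stage 4). Stage 3, fine-tuning the motion model, is omitted. All function names and the `strength` parameter are hypothetical.

```python
from statistics import mean, pstdev


def analyze_style(frames):
    """Stage 2 (toy): per-feature mean and spread across all frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature
    return [(mean(d), pstdev(d)) for d in dims]


def apply_style(generic_frame, style, strength=0.5):
    """Stage 4 (toy): pull each feature toward the speaker's mean.

    strength=0 keeps the generic motion; strength=1 replaces it with the mean.
    """
    return [
        (1 - strength) * value + strength * mu
        for value, (mu, _sigma) in zip(generic_frame, style)
    ]


# Stage 1 stand-in: pretend these are features extracted from three frames.
frames = [[0.25, 1.0], [0.5, 1.0], [0.75, 1.0]]

style = analyze_style(frames)
personalized = apply_style([0.0, 0.0], style, strength=0.5)
print(personalized)  # [0.25, 0.5]
```

In this sketch the blend is a fixed linear interpolation, whereas the real system learns the adjustment during fine-tuning.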