**PREPARE ENVIRONMENT**

- Set **parameters_in_vram** to 10 or less to reduce VRAM usage, especially when generating a video at a resolution above 480 by 832 or 832 by 480. VRAM usage depends on the value of **parameters_in_vram** and on the resolutions of the input image and the output video.
- With **parameters_in_vram** set to 15 and a video resolution of 480 by 832, generation takes approximately 10 minutes. Lower values lengthen the generation time; higher values shorten it but increase the risk of an **Out of Memory Error**.
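As a rough way to reason about this setting, the sketch below converts **parameters_in_vram** (in billions) into the raw parameter count that the setup cell passes to `enable_vram_management`, and estimates the memory those weights occupy. The helper names and the 2-bytes-per-parameter figure (bfloat16) are assumptions for illustration. Activations, the VAE and the text encoder are not counted, so actual VRAM usage will be higher.

```python
BYTES_PER_PARAM_BF16 = 2  # assumption: the persistent weights are held in bfloat16


def persistent_param_count(parameters_in_vram):
    """Convert the notebook setting (billions of parameters) to a raw count."""
    return int(parameters_in_vram * 10**9)


def resident_weight_gib(parameters_in_vram, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Estimate the GiB of VRAM held by the persistent weights alone."""
    return persistent_param_count(parameters_in_vram) * bytes_per_param / 2**30


for setting in (6, 10, 15):
    print(f"parameters_in_vram={setting}: ~{resident_weight_gib(setting):.1f} GiB of weights")
```

The estimate is only a lower bound, but it shows why moving from 6 to 15 quickly uses up the headroom on a mid-range GPU.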

In [1]:
# @title
!git clone https://github.com/Isi-dev/DiffSynth-Studio.git
%cd DiffSynth-Studio
!pip install -e .
!pip install "huggingface_hub[cli]"
!apt-get install -y aria2
import os
from huggingface_hub import list_repo_files

repo_id = "Isi99999/Wan2.1-I2V-14B-480P"
all_files = list_repo_files(repo_id)
base_url = f"https://huggingface.co/{repo_id}/resolve/main/"

with open("file_list.txt", "w") as f:
    for file_path in all_files:
        full_url = f"{base_url}{file_path}"
        save_path = f"models/Wan-AI/Wan2.1-I2V-14B-480P/{file_path}"
        os.makedirs(os.path.dirname(save_path), exist_ok=True)
        f.write(f"{full_url}\n out={save_path}\n")
!aria2c -x 16 -s 16 -i file_list.txt --continue=true --auto-file-renaming=false

print("✅ All models downloaded successfully!")

import torch
from diffsynth import ModelManager, WanVideoPipeline, VideoData, save_video

model_manager = ModelManager(device="cpu")
model_manager.load_models(
    ["models/Wan-AI/Wan2.1-I2V-14B-480P/models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth"],
    torch_dtype=torch.float16, # Image Encoder is loaded with float16
)
model_manager.load_models(
    [
        "models/Wan-AI/Wan2.1-I2V-14B-480P/diffusion_pytorch_model.safetensors",
        "models/Wan-AI/Wan2.1-I2V-14B-480P/models_t5_umt5-xxl-enc-bf16.safetensors",
        "models/Wan-AI/Wan2.1-I2V-14B-480P/Wan2.1_VAE.safetensors",
    ],
    torch_dtype=torch.bfloat16, # bfloat16 disables FP8 quantization; set `torch_dtype=torch.float8_e4m3fn` to enable it.
)

pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
parameters_in_vram = 6 # @param {"type":"number"}
pipe.enable_vram_management(num_persistent_param_in_dit=parameters_in_vram*10**9) # You can set `num_persistent_param_in_dit` to a small number to reduce VRAM required.
print("✅ All models loaded successfully!")

Cloning into 'DiffSynth-Studio'...
remote: Enumerating objects: 2859, done.
remote: Counting objects: 100% (1317/1317), done.
remote: Compressing objects: 100% (426/426), done.
remote: Total 2859 (delta 1076), reused 891 (delta 891), pack-reused 1542 (from 1)
Receiving objects: 100% (2859/2859), 11.23 MiB | 9.57 MiB/s, done.
Resolving deltas: 100% (1840/1840), done.
/content/DiffSynth-Studio
Obtaining file:///content/DiffSynth-Studio
  Preparing metadata (setup.py) ... done
Collecting transformers==4.46.2 (from diffsynth==1.1.2)
  Downloading transformers-4.46.2-py3-none-any.whl.metadata (44 kB)
Collecting controlnet-aux==0.0.7 (from diffsynth==1.1.2)
  Downloading controlnet_aux-0.0.7.tar.gz (202 kB)
  Preparin

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



04/08 12:18:47 [NOTICE] Downloading 15 item(s)

04/08 12:18:47 [NOTICE] Download complete: /content/DiffSynth-Studio/models/Wan-AI/Wan2.1-I2V-14B-480P/README.md

04/08 12:18:47 [NOTICE] CUID#11 - Redirecting to https://cdn-lfs-us-1.hf.co/repos/0c/33/0c337ffd39bf0fd5ff0f11ab3b8547ee3a08339a52fa73136e0b5184a6a04347/996dbad030df09b0b3c8e764f0fb5a81b98b220ab89524d6a9369e9ed882791f?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27diffusion_pytorch_model.safetensors%3B+filename%3D%22diffusion_pytorch_model.safetensors%22%3B&Expires=1744118327&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0NDExODMyN319LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzBjLzMzLzBjMzM3ZmZkMzliZjBmZDVmZjBmMTFhYjNiODU0N2VlM2EwODMzOWE1MmZhNzMxMzZlMGI1MTg0YTZhMDQzNDcvOTk2ZGJhZDAzMGRmMDliMGIzYzhlNzY0ZjBmYjVhODFiOThiMjIwYWI4OTUyNGQ2YTkzNjllOWVkODgyNzkxZj9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&Signature=BdckhfSg
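The setup cell prints its success message whether or not `aria2c` actually finished every file, so it can help to check the four weight files before loading them. The sketch below does that. `find_missing` is a hypothetical helper, not part of DiffSynth-Studio, and it only checks that each file exists and is non-empty; it does not verify checksums.

```python
import os

MODEL_DIR = "models/Wan-AI/Wan2.1-I2V-14B-480P"
# File names taken from the load_models calls in the setup cell.
REQUIRED_FILES = [
    "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",
    "diffusion_pytorch_model.safetensors",
    "models_t5_umt5-xxl-enc-bf16.safetensors",
    "Wan2.1_VAE.safetensors",
]


def find_missing(model_dir, filenames):
    """Return the names that are absent or zero bytes in model_dir."""
    missing = []
    for name in filenames:
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


missing = find_missing(MODEL_DIR, REQUIRED_FILES)
if missing:
    print("Missing or empty files:", missing)
else:
    print("All required model files are present.")
```

If anything is reported missing, re-running the `aria2c` command should resume the download, since it is invoked with `--continue=true`.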

**RUN TO UPLOAD IMAGE**

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [18]:
# The image is stored on Google Drive
from PIL import Image
image_path = '/content/grace1.png'
image = Image.open(image_path).convert("RGB")

print("✅ Image loaded successfully:", image.size)


✅ Image loaded successfully: (1024, 1536)
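The loaded image is 1024 by 1536, a 2:3 aspect ratio, so the output width and height are best chosen with the same ratio. The sketch below picks dimensions close to the 480 by 832 pixel budget while preserving the source ratio. `fit_resolution` is a hypothetical helper, and rounding to a multiple of 16 is an assumption for illustration, not a documented requirement of the pipeline.

```python
import math


def fit_resolution(src_width, src_height, target_area=480 * 832, multiple=16):
    """Scale (src_width, src_height) to roughly target_area pixels,
    keeping the aspect ratio and snapping each side to `multiple`."""
    scale = math.sqrt(target_area / (src_width * src_height))
    width = max(multiple, round(src_width * scale / multiple) * multiple)
    height = max(multiple, round(src_height * scale / multiple) * multiple)
    return width, height


print(fit_resolution(1024, 1536))  # dimensions for the image loaded above
```

For the 1024 by 1536 input this yields 512 by 768, which matches the `width` and `height` used in the generation cell.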


**RUN IMAGE TO VIDEO**

In [None]:


# Prompt in English: "Girlfriend style, interacting with the camera"
prompt = "女友風格，對鏡頭互動" # @param {type:"string"}
sample_steps = 20 # @param {"type":"number"}
Instruction = "Choose from '720*1280', '1280*720', '480*832', '832*480', '1024*1024' for the output video's width & height." # @param {"type":"string"}
width = 512 # @param {"type":"number"}
height = 768 # @param {"type":"number"}
num_frames = 81 # @param {"type":"number"}
seed = 1 # @param {"type":"number"}

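# Generate a video from the text prompt and the input image
video = pipe(
    prompt=prompt,
    # Negative prompt in English: vivid tones, overexposed, static, blurry details, subtitles,
    # style, artwork, painting, frame, still, overall grayish, worst quality, low quality,
    # JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands,
    # poorly drawn face, deformed, disfigured, malformed limbs, fused fingers,
    # motionless frame, cluttered background, three legs, many people in the background,
    # walking backwards
    negative_prompt="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走",
    input_image=image,
    height=height,
    width=width,
    num_frames=num_frames,
    num_inference_steps=sample_steps,
    seed=seed,
    tiled=True,
)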

# Save the generated video at 15 fps with quality 5
save_video(video, "video.mp4", fps=15, quality=5)

from IPython.display import display as displayVid, Video as outVid
import os

# Function to display video
def show_video(video_path):
    if os.path.exists(video_path):
        displayVid(outVid(video_path, embed=True))
    else:
        print(f"Error: {video_path} not found!")

# Show the video
show_video("video.mp4")

 65%|██████▌   | 13/20 [15:33<08:22, 71.74s/it]
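The settings and the progress bar above give enough to estimate both the clip length and the total sampling time. The sketch below works through that arithmetic using the notebook's values (81 frames at 15 fps, 20 steps at about 71.74 seconds per step); the function names are illustrative only.

```python
def clip_seconds(num_frames, fps):
    """Length of the saved clip in seconds."""
    return num_frames / fps


def total_sampling_minutes(steps, seconds_per_step):
    """Estimated wall-clock time for all denoising steps, in minutes."""
    return steps * seconds_per_step / 60


print(f"Clip length: {clip_seconds(81, 15):.1f} s")
print(f"Estimated sampling time: {total_sampling_minutes(20, 71.74):.1f} min")
```

At these settings the run takes roughly 24 minutes to produce a clip of about 5.4 seconds, which is consistent with the elapsed and remaining times shown in the progress bar.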

**UTILITY FUNCTIONS**

In [17]:
# Download file
from google.colab import files

files.download('./video.mp4')



In [1]:
# Run a small tensor on the CPU (note: this does not terminate the Colab session)
import torch

# Force PyTorch to use the CPU
device = torch.device("cpu")

# Example: create a tensor and move it to the CPU
tensor = torch.randn(3, 3).to(device)
print(f"Running on device: {device}")



Running on device: cpu


In [7]:
# Use L4 GPU, release VRAM
import torch
torch.cuda.empty_cache()
torch.cuda.ipc_collect()
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True


env: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
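Two caveats apply to the cell above. `torch.cuda.empty_cache()` only returns cached memory that no live tensor still references, so large objects need to be dropped first. Allocator options such as `PYTORCH_CUDA_ALLOC_CONF` are also, as far as I understand, safest to set before PyTorch first touches CUDA. The sketch below reflects both points; `release_cached_vram` is a hypothetical helper, and it returns `False` when PyTorch or a GPU is not available.

```python
import gc
import os

# Set the allocator option early, ideally before torch is imported.
# setdefault keeps any value that was already configured.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")


def release_cached_vram():
    """Collect garbage, then release PyTorch's cached GPU memory.

    Returns True if the CUDA cache was cleared, False otherwise.
    """
    gc.collect()  # drop unreachable Python objects that may hold tensors
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()
    return True


print("CUDA cache cleared:", release_cached_vram())
```

Before calling the helper, delete references to large results (for example `del video`) so that their memory becomes eligible for release.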
