Text-to-video generation using CogVideoX/ModelScope diffusion models, with configurable frame count, FPS, and resolution, optional image conditioning, and MP4 output via a Gradio/Streamlit UI.
deep-learning video-generation 3d-unet diffusion-models text-to-video temporal-coherence generative-ai video-synthesis multimodal-ai temporal-diffusion
Updated Mar 15, 2026 - Python