Stars
An open-source & self-hostable Heroku / Netlify / Vercel alternative.
Industry-leading face manipulation platform
State-of-the-art 2D and 3D Face Analysis Project
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
The official code of our CVPR2025 paper: "Segment Any-Quality Images with Generative Latent Space Enhancement".
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
Lightweight and Efficient, 🎧Ultra High-Quality Voice Cloning, Chinese and English.
This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
FILM: Frame Interpolation for Large Motion, In ECCV 2022.
Depth-Aware Video Frame Interpolation (CVPR 2019)
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu, Discord, and adapts to all LLMs with similar OpenAI / aisuite interfac…
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
📄 A curated list of awesome .cursorrules files
Explore RPF files just as you would in OpenIV or CodeWalker, with the big difference that everything is implemented right in your Windows File Explorer!
Dk0071942 / LivePortrait
Forked from KwaiVGI/LivePortrait
Bring portraits to life!
[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …