Starred repositories
A Comprehensive Solution for Identifying and Managing Duplicate Photos in Immich
An alternative to the immich-CLI command that doesn't require a Node.js installation. It makes a best effort at importing Google Photos Takeout archives.
GGUF Quantization support for native ComfyUI models
Wan: Open and Advanced Large-Scale Video Generative Models
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Development repository for the Triton language and compiler
https://wavespeed.ai/ [WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast.
Get your Pixiv token easily (for running upbit/pixivpy)
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
An open-source, cross-platform terminal for seamless workflows
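PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout. Supports Google, DeepL, Ollama, OpenAI, and other services; provides CLI, GUI, Docker, and Zotero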
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS…
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Real-time face swap and one-click video deepfake with only a single image
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.