Stars
Awesome speech/audio LLMs, representation learning, and codec models
Ultra-low-bitrate neural audio codec (0.31–1.40 kbps) with richer semantics in the latent space.
The official repo of Qwen2-Audio, the chat and pretrained large audio language models proposed by Alibaba Cloud.
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on the project demo page.
SpeechGPT Series: Speech Large Language Models
Toy reproduction of the Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
Facebook Low Resource (FLoRes) MT Benchmark
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
A book for getting started with the Phi family of SLMs. Phi is a family of open-source AI models developed by Microsoft, and Phi models are among the most capable and cost-effective small language models available.
Speech-to-Speech: an effort toward an open-source and modular GPT-4o
Code for the AAAI 2022 paper "Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification"
Foundational Models for State-of-the-Art Speech and Text Translation
Fast and memory-efficient exact attention
Even one minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Medical o1: towards complex medical reasoning with LLMs
Zero-shot voice conversion & singing voice conversion, with real-time support
[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025)
Audio Codec Speech processing Universal PERformance Benchmark
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
PyTorch code for training Vision Transformers with the self-supervised learning method DINO
Code for the ICCV 2019 paper "Attention on Attention for Image Captioning"
[WACV2025 Oral] SUM: Saliency Unification through Mamba for Visual Attention Modeling
Unofficial implementation of "High Fidelity Neural Audio Compression"
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".