Awesome speech/audio LLMs, representation learning, and codec models

969 59 Updated Apr 18, 2025

A Survey of Spoken Dialogue Models (60 pages)

288 16 Updated Nov 28, 2024

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Python 197 15 Updated Mar 7, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,696 129 Updated Apr 21, 2025

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python 175 11 Updated Jul 12, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 550 51 Updated Jun 9, 2024

SpeechGPT Series: Speech Large Language Models

Python 1,367 91 Updated Jul 22, 2024

Spark-TTS Inference Code

Python 8,784 905 Updated Apr 9, 2025

toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Python 10 2 Updated Sep 1, 2024

Facebook Low Resource (FLoRes) MT Benchmark

Python 729 127 Updated Nov 20, 2023

[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents

Python 1,603 106 Updated Apr 21, 2025

This is a Phi Family of SLMs book for getting started with Phi Models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…

Jupyter Notebook 3,185 401 Updated Apr 8, 2025

Speech To Speech: an effort for an open-sourced and modular GPT4-o

Python 3,989 438 Updated Apr 15, 2025

code for AAAI2022 paper "Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification"

Python 193 40 Updated Jun 30, 2024

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,492 1,134 Updated Nov 14, 2024

Fast and memory-efficient exact attention

Python 17,056 1,629 Updated Apr 24, 2025

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 44,958 4,989 Updated Apr 22, 2025

Medical o1, Towards medical complex reasoning with LLMs

Python 1,074 107 Updated Jan 20, 2025

zero-shot voice conversion & singing voice conversion, with real-time support

Python 2,291 254 Updated Apr 20, 2025

[WIP] Resources for AI engineers. Also contains supporting materials for the book AI Engineering (Chip Huyen, 2025)

Jupyter Notebook 3,919 470 Updated Feb 12, 2025

Audio Codec Speech processing Universal PERformance Benchmark

Python 251 23 Updated Apr 14, 2025

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,728 193 Updated Jan 16, 2025

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Python 6,789 948 Updated Jul 3, 2024

Code for paper "Attention on Attention for Image Captioning". ICCV 2019

Python 333 61 Updated May 2, 2021

[WACV2025 Oral] SUM: Saliency Unification through Mamba for Visual Attention Modeling

Python 64 7 Updated Apr 13, 2025

unofficial implementation of the High Fidelity Neural Audio Compression

Python 155 14 Updated Aug 15, 2024

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,115 87 Updated Mar 2, 2025

The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".

Python 83 5 Updated Sep 2, 2024
Python 41 9 Updated May 15, 2023