Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper
Updated Jan 23, 2024 - Python
AI model for speech disorder detection
Application to search for similar sound effects by voice and title.
Speaker recognition using a wav2vec2 model.
SER and audio classification using a Wav2Vec2-based model, an ASR->BERT pipeline, and a multimodal late-fusion model.
A simple Speech Emotion Recognition (SER) project based on Wav2Vec2.
Spoken NER implementation based on Wav2Vec2-XLS-R with experiments on transfer learning
PyTorch implementation of "Integrated Parameter-Efficient Tuning for General-Purpose Audio Models"
A modular codebase to process audio datasets, generate a custom tokenizer, and fine-tune and run inference with a wav2vec2 model on a custom dataset.
Automatic generation of speech dataset markup using Wav2Vec2 ASR models
A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
Deep audio modeling
FM signal capturing system and voice recognition for the assistance of individuals with hearing impairments.
This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"
Python application for taking audio notes and creating summaries of meetings.