Pipeline for generating images conditioned on input audio
Updated Jul 25, 2024 - Python
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
A simple Speech Emotion Recognition (SER) project based on Wav2Vec2.
Self-Supervised Speech Pre-training and Representation Learning Toolkit
🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 where we'll be open for contributions to enable real-time meeting transcription! 🚀
A Python tool for transcribing speech from audio files using the Wav2Vec 2.0 model. Supports multilingual transcription, automatic audio chunking, and easy setup.
A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus
Speaker recognition using the wav2vec2 model.
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
Live speech recognition using Facebook's wav2vec 2.0 model.
BALanced Execution through Natural Activation: a human-computer interaction methodology for running code.
Fine-tuning Multilingual Large Speech Recognition Models: Wav2vec and Whisper
[ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
This repository uses the Wav2Vec2 model to convert speech from a video file into text through a series of speech recognition, noise removal, and speech-to-text (STT) steps.
Audio transcription using the Wav2Vec2 model trained on the Indonesian portion of the Common Voice 13 dataset.
A modular codebase to process an audio dataset, generate a custom tokenizer, and fine-tune and run inference with a wav2vec2 model on a custom dataset.
An AI tool that automatically generates captions and transcripts for YouTube videos in 67 languages and summarized text in 133 languages.
⚡ Finetune Wav2vec 2.0 For Speech Recognition