A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
-
Updated
Jul 2, 2024 - Python
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Repository containing the open source code of works published at the FBK MT unit.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
End-to-End Speech Processing Toolkit
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
SEGAUGMENT: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.
Pushing the Limits of Zero-shot End-to-End Speech Translation
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.
A realtime speech transcription and translation application using Whisper OpenAI and free translation API. Interface made using Tkinter. Code written fully in Python.
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"
unsupervised spoken utterances scoring
Code for ACL 2023 main conference paper "Back Translation for Speech-to-text Translation Without Transcripts".
Code for ACL 2023 main conference paper "Understanding and Bridging the Modality Gap for Speech Translation".
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".
The project for speech translation
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
Add a description, image, and links to the speech-translation topic page so that developers can more easily learn about it.
To associate your repository with the speech-translation topic, visit your repo's landing page and select "manage topics."