# asr

Here are 1,059 public repositories matching this topic...

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust

Updated Aug 20, 2024
C++

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Aug 20, 2024
Python

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Updated Aug 20, 2024
Python

cmeraki / audiotoken

Audio tokenization, in the fastest way possible!

tts audio-processing asr llm llm-training llm-inference

Updated Aug 20, 2024
Python

voicegain / platform

Voicegain Enterprise Speech-to-Text Platform (API, Portal, etc.)

deep-neural-networks ivr speech-to-text rtc transcription asr mrcp

Updated Aug 20, 2024
HTML

deepgram / deepgram-python-sdk

Official Python SDK for Deepgram's automated speech recognition APIs.

python speech-recognition hacktoberfest asr deepgram automated-speech-recognition

Updated Aug 20, 2024
Python

k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi

python cpp websocket pytorch speech-recognition transducer asr ctc end-to-end-asr

Updated Aug 20, 2024
C++

DmitryRyumin / ICASSP-2023-24-Papers

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Updated Aug 20, 2024
Python

sandy1990418 / ChineseTaiwaneseWhisper

This repository focuses on leveraging OpenAI's Whisper model for speech recognition in Chinese (Mandarin) and Taiwanese Hokkien languages. It includes tools and scripts for data preprocessing, model training, and evaluation, tailored to improve speech recognition accuracy for these languages.

realtime chinese speech-to-text gradio whisper asr taiwanese streaming-audio

Updated Aug 20, 2024
Python

Picovoice / cheetah

On-device streaming speech-to-text engine powered by deep learning

voice-recognition speech-recognition automatic-speech-recognition speech-to-text transcription stt asr online-speech-recognition streaming-speech-to-text

Updated Aug 19, 2024
Python

vwkyc / ASSR

sentiment analysis on transcribed speech or text with multilingual capability

nlp app google sentiment-analysis nlu speech-recognition speech-to-text google-app-engine whisper asr natural-language-understanding natural-language-api whisper-api openai-api

Updated Aug 19, 2024
JavaScript

inworld-ai / inworld-nodejs-sdk

Node.js SDK for Inworld.ai. Integrate AI characters into your Node.js environment.

ai character tts speech-recognition npc asr

Updated Aug 19, 2024
JavaScript

MaxLSB / FBK-ASR-EarlyConformer-Models

Implemented Zipformer-like architectures using FBK’s Early Conformer model for Automatic Speech Recognition

python pytorch asr

Updated Aug 19, 2024
Python

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

speech speech-recognition speech-to-text whisper asr

Updated Aug 19, 2024
Python

winstxnhdw / CapGen

A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.

docker automatic-speech-recognition whisper asr granian huggingface huggingface-spaces ctranslate2 litestar

Updated Aug 19, 2024
Python

jp1924 / wav2vec2

A repo that reimplements an ASR project carried out previously

Updated Aug 19, 2024
Python

jdepoix / youtube-transcript-api

This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions.

python cli youtube youtube-video youtube-api captions subtitles transcript subtitle transcripts asr youtube-subtitles youtube-transcripts youtube-captions youtube-transcript translating-transcripts youtube-asr

Updated Aug 19, 2024
Python

thewh1teagle / pyannote-rs

pyannote audio diarization in rust

rust speech-recognition whisper asr diarization onnxruntime

Updated Aug 18, 2024
Rust

mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.

text-to-speech translator translation offline machine-translation sailfishos tts speech-synthesis speech-recognition speech-to-text nmt linux-desktop stt asr flatpak-applications

Updated Aug 18, 2024
C++

blip-radar / vatsim-parser

Parser for a variety of VATSIM-related file formats

vatsim euroscope asr sct topsky-plugin

Updated Aug 18, 2024
Rust
