Stars
Thai natural language processing in Python
The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:
An open-source project for Windows developers to learn how to add AI with local models and APIs to Windows apps.
Prism is a framework for building loosely coupled, maintainable, and testable XAML applications in WPF, Xamarin Forms, and Uno / Win UI Applications..
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Fast and accurate automatic speech recognition (ASR) for edge devices
A recreation of the classic Visual Basic 6 IDE and language in C# with Avalonia
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
The .NET MAUI Community Toolkit is a community-created library that contains .NET MAUI Extensions, Advanced UI/UX Controls, and Behaviors to help make your life as a .NET MAUI developer easier
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A C# library for extract audio features in speech recognition (ASR) task, support kaldi fbank
Speech To Speech: an effort for an open-sourced and modular GPT4-o
BELLE: Be Everyone's Large Language model Engine(开源中文对话大模型)
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
Source code for Consistent ensemble distillation for audio tagging
SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library and designed to be fast and accurate.
This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo model, specifically using `cl100k_base` encoding.
Multilingual Voice Understanding Model
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.