A practical compilation of libraries, case studies, resources, datasets, and research papers revolving around deep learning/machine learning for audio. 🎶🎶🎶 Reasonable resources you will actually use!
- DeepSpeech: an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
- PaddleSpeech: Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting.
- NeMo: a toolkit for conversational AI
- SpeechBrain: an open-source, all-in-one conversational AI toolkit based on PyTorch.
- torchaudio: Data manipulation and transformation for audio signal processing, powered by PyTorch
- nlpaug: Data augmentation for NLP, with spectrogram and audio input support.
- pedalboard: Spotify's Python library for working with audio. Internally, Spotify uses this for data augmentation and improving machine learning models.
- Spleeter: Deezer's source separation library, including pretrained models.
- Basic Pitch: A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
- librosa: Python library for audio and music analysis
- Which library is best for which task? Where to focus when learning the tools?
- How to use librosa, torchaudio, and other end-to-end toolkits
- Fundamental papers related to audio deep learning
- Guided walkthroughs from data preparation to deployment