To make a smart speaker

This is a collection of resources for making a smart speaker. The hope is to build an open source one for daily use.

A simplified flowchart of a smart speaker looks like this:

+---+   +----------------+   +---+   +---+   +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+   +----------------+   +---+   +---+   +-+-+
                                               |
                                               |
+-------+   +---+   +----------------------+   |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+   +---+   +----------------------+

Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
Keyword Spotting (KWS) detects a keyword (such as OK Google, Hey Siri) to start a conversation.
Speech To Text (STT) converts the spoken audio into text.
Natural Language Understanding (NLU) converts raw text into structured data.
Knowledge/Skill/Action - Knowledge base and plugins (Alexa Skill, Google Action) to provide an answer.
Text To Speech (TTS) converts the answer text into speech.
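
The stages above can be sketched as a chain of functions. This is only a minimal sketch: every function name here (`spot_keyword`, `speech_to_text`, `understand`, `run_skill`, `text_to_speech`, `handle`) and the wake word are hypothetical, and audio is faked as plain text so the flow runs without any speech engine. A real speaker would call an actual KWS, STT, NLU and TTS engine at each step.

```python
from typing import Optional

# Hypothetical stand-ins for each stage. Audio is faked as plain text.
KEYWORD = "hey speaker"  # assumed wake word, not taken from any listed project


def spot_keyword(audio: str) -> bool:
    """KWS: report whether the utterance starts with the wake word."""
    return audio.lower().startswith(KEYWORD)


def speech_to_text(audio: str) -> str:
    """STT: drop the wake word and return the remaining 'transcript'."""
    return audio[len(KEYWORD):].strip(" ,")


def understand(text: str) -> dict:
    """NLU: turn raw text into structured data (here, just an intent name)."""
    if "time" in text.lower():
        return {"intent": "get_time"}
    return {"intent": "unknown"}


def run_skill(request: dict) -> str:
    """Knowledge/Skill/Action: map the intent to an answer."""
    answers = {"get_time": "It is twelve o'clock."}
    return answers.get(request["intent"], "Sorry, I did not understand.")


def text_to_speech(text: str) -> str:
    """TTS: a real engine would return audio; here the text is only tagged."""
    return f"<speech>{text}</speech>"


def handle(audio: str) -> Optional[str]:
    """Run one utterance through KWS -> STT -> NLU -> Skill -> TTS."""
    if not spot_keyword(audio):
        return None  # stay idle until the wake word is heard
    return text_to_speech(run_skill(understand(speech_to_text(audio))))
```

Calling `handle("Hey speaker, what time is it?")` walks through every stage, while an utterance without the wake word is ignored and returns `None`.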

KWS + STT + NLU + Skill + TTS

Active open source projects

Mycroft ⭐ - a hackable open source voice assistant
dingdang robot - a 🇨🇳 voice interaction robot based on Jasper and built with a Raspberry Pi

SDK

Amazon Alexa Voice Service - one of the most widely used voice assistants
Google Assistant SDK - arguably has the smartest brain. Its extensions, called Google Actions, can be created in a few steps with Dialogflow, and its Device Actions are well suited to smart home devices.
Baidu DuerOS

KWS

Snowboy - DNN-based hotword and wake word detection toolkit
Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
ML-KWS-For-MCU - perhaps the most promising option for resource-constrained devices such as ARM Cortex-M7 microcontrollers

STT

Mozilla DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
Kaldi
PocketSphinx - a lightweight speech recognition engine using HMM + GMM

NLU

Rasa NLU
- Rasa NLU for Chinese
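
To illustrate what "structured data" means for NLU, here is a minimal rule-based sketch. It is not the Rasa NLU API: Rasa uses trained models, whereas the `PATTERNS` table and the `parse` function below are hypothetical regex rules standing in for one, and the intent and slot names are made up for the example.

```python
import re

# Hypothetical intent patterns; a trained NLU model would replace these rules.
PATTERNS = [
    ("set_timer", re.compile(r"set a timer for (?P<minutes>\d+) minutes?")),
    ("play_music", re.compile(r"play (?P<song>.+)")),
]


def parse(text: str) -> dict:
    """Return {"intent": ..., "slots": {...}} for the first matching pattern."""
    cleaned = text.lower().strip(" .!?")
    for intent, pattern in PATTERNS:
        match = pattern.search(cleaned)
        if match:
            # Named groups become the slots of the structured result.
            return {"intent": intent, "slots": match.groupdict()}
    return {"intent": "unknown", "slots": {}}
```

For example, `parse("Set a timer for 10 minutes.")` yields the intent `set_timer` with the slot `minutes` set to `"10"`, which a skill can act on without touching the raw text.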

TTS

Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
MaryTTS - an open-source, multilingual text-to-speech synthesis system written in pure Java
espeak-ng - an open source speech synthesizer that supports 99 languages and accents
ekho - Chinese text-to-speech engine
WaveNet, Tacotron 2

Audio Processing

Acoustic Echo Cancellation
- SpeexDSP
Direction Of Arrival (DOA) - the most widely used DOA algorithm is GCC-PHAT
- odas
Beamforming
- BeamformIt - filter-and-sum beamforming
- CGMM Beamforming - a reference implementation
- MVDR Beamforming
- GSC Beamforming
Voice Activity Detection
- WebRTC VAD
- DNN VAD
Noise Suppression
- NS of WebRTC audio processing
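
Since GCC-PHAT is named above as the most widely used DOA algorithm, a small sketch may help show the idea: cross-correlate two microphone signals in the frequency domain, keep only the phase (the PHAT weighting), and read the delay from the correlation peak. The function names `dft` and `gcc_phat` are illustrative, and the naive O(n^2) DFT is an assumption made to keep the example free of dependencies; a practical implementation would use an FFT library instead.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref):
    """Estimate the delay of `sig` relative to `ref`, in samples."""
    n = len(sig) + len(ref)  # zero-pad so circular correlation does not wrap
    a = dft(list(sig) + [0.0] * (n - len(sig)))
    b = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum, whitened so only phase information remains (PHAT).
    cross = [x * y.conjugate() for x, y in zip(a, b)]
    cross = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    # Back to the time domain: the peak position is the estimated delay.
    corr = [c.real for c in dft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Indices in the upper half correspond to negative lags.
    return peak if peak < n // 2 else peak - n


# Usage: the second signal is the first one delayed by 3 samples.
ref = [1.0, -2.0, 3.0, 0.5, 0.0, 0.0, 0.0, 0.0]
sig = [0.0, 0.0, 0.0, 1.0, -2.0, 3.0, 0.5, 0.0]
delay = gcc_phat(sig, ref)
```

For a two-microphone array under a far-field assumption, the delay in samples can then be converted to an arrival angle from the sample rate, the speed of sound and the microphone spacing.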

Audio I/O

PortAudio
libsoundio
ALSA
PulseAudio
