xiongyihui/Make-a-smart-speaker
To make a smart speaker

中文 (Chinese version)

This is a collection of resources for building a smart speaker. The hope is to make an open source one that is good enough for daily use.

A simplified flowchart of a smart speaker looks like this:

+---+   +----------------+   +---+   +---+   +---+
|Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
+---+   +----------------+   +---+   +---+   +-+-+
                                               |
                                               |
+-------+   +---+   +----------------------+   |
|Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
+-------+   +---+   +----------------------+
  • Audio Processing includes Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc.
  • Keyword Spotting (KWS) detects a keyword (such as OK Google, Hey Siri) to start a conversation.
  • Speech To Text (STT) converts the spoken request into raw text.
  • Natural Language Understanding (NLU) converts raw text into structured data.
  • Knowledge/Skill/Action - Knowledge base and plugins (Alexa Skill, Google Action) to provide an answer.
  • Text To Speech (TTS) converts the answer text into speech for the speaker.
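
The flowchart above can be read as a chain of functions. The following is a minimal sketch of that chain, not an implementation from any of the projects listed here. All names (`Pipeline`, `Intent`, `toy_pipeline`) are hypothetical, and the toy stages treat the "audio" as bytes of text so the example runs without a microphone. A real device would also run KWS continuously and pass only the audio after the keyword to STT.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Intent:
    """Structured data produced by NLU (hypothetical shape)."""
    name: str
    slots: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    """One callable per box in the flowchart (all hypothetical)."""
    process_audio: Callable[[bytes], bytes]   # AEC, beamforming, NS, ...
    spot_keyword: Callable[[bytes], bool]     # KWS
    speech_to_text: Callable[[bytes], str]    # STT
    understand: Callable[[str], Intent]       # NLU
    run_skill: Callable[[Intent], str]        # Knowledge/Skill/Action
    text_to_speech: Callable[[str], bytes]    # TTS

    def handle(self, mic_audio: bytes) -> Optional[bytes]:
        clean = self.process_audio(mic_audio)
        if not self.spot_keyword(clean):
            return None  # no keyword heard, so stay idle
        text = self.speech_to_text(clean)
        intent = self.understand(text)
        answer = self.run_skill(intent)
        return self.text_to_speech(answer)


def toy_pipeline() -> Pipeline:
    """Stand-in stages that only illustrate the data flow."""
    keyword = b"hey speaker"
    return Pipeline(
        process_audio=lambda audio: audio.strip().lower(),
        spot_keyword=lambda audio: audio.startswith(keyword),
        speech_to_text=lambda audio: audio[len(keyword):].decode().strip(),
        understand=lambda text: Intent("get_time" if "time" in text else "unknown"),
        run_skill=lambda intent: (
            "It is noon." if intent.name == "get_time" else "Sorry, I do not know."
        ),
        text_to_speech=lambda answer: answer.encode(),
    )


if __name__ == "__main__":
    speaker = toy_pipeline()
    print(speaker.handle(b"Hey Speaker what time is it"))
    print(speaker.handle(b"what time is it"))
```

Keeping each stage behind a plain callable makes it possible to swap in any of the projects listed below without touching the rest of the chain.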

KWS + STT + NLU + Skill + TTS

Active open source projects

SDK

KWS

  • Snowboy - DNN based hotword and wake word detection toolkit
  • Honk - PyTorch reimplementation of Google's TensorFlow CNNs for keyword spotting
  • ML-KWS-For-MCU - Possibly the most promising option for resource-constrained devices such as ARM Cortex-M7 microcontrollers

STT

NLU

TTS

  • Mimic - Mycroft's TTS engine, based on CMU's Flite (Festival Lite)
  • MaryTTS - an open-source, multilingual text-to-speech synthesis system written in pure Java
  • espeak-ng - an open source speech synthesizer that supports 99 languages and accents.
  • ekho - Chinese text-to-speech engine
  • WaveNet, Tacotron 2

Audio Processing

  • Acoustic Echo Cancellation

  • Direction Of Arrival (DOA) - The most widely used DOA algorithm is GCC-PHAT

  • Beamforming

  • Voice Activity Detection

    • WebRTC VAD
    • DNN VAD
  • Noise Suppression

    • NS of WebRTC audio processing
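
GCC-PHAT, listed under Direction Of Arrival above, estimates the time delay between two microphones by cross-correlating their signals after discarding magnitude and keeping only phase. Below is a minimal sketch using only the Python standard library. The helper names (`dft`, `gcc_phat`, `delay_to_angle`) are hypothetical, and the naive O(n²) DFT is there for readability only. A real implementation would use an FFT library and work on short frames from the microphone array.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, fine for short demo frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(sig, ref, max_shift=None):
    """Return the delay of `sig` relative to `ref`, in samples."""
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    a = dft(list(sig) + [0.0] * (n - len(sig)))
    b = dft(list(ref) + [0.0] * (n - len(ref)))
    # Cross-power spectrum, whitened so that only phase remains (the PHAT step).
    cross = [x * y.conjugate() for x, y in zip(a, b)]
    cross = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    cc = [v.real for v in dft(cross, inverse=True)]
    if max_shift is None:
        max_shift = n // 2
    # Negative lags are stored at the end of the circular correlation.
    lags = range(-max_shift, max_shift + 1)
    return max(lags, key=lambda lag: cc[lag % n])


def delay_to_angle(delay_samples, sample_rate, mic_distance, speed_of_sound=343.0):
    """Convert a two-mic delay into an arrival angle in degrees (0 = broadside)."""
    ratio = delay_samples / sample_rate * speed_of_sound / mic_distance
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    noise = [rng.uniform(-1, 1) for _ in range(24)]
    ref = noise + [0.0] * 8               # signal at the first microphone
    sig = [0.0] * 3 + noise + [0.0] * 5   # same signal, 3 samples later
    delay = gcc_phat(sig, ref)
    # 16 kHz sampling and 10 cm spacing are assumed values for illustration.
    print(delay, delay_to_angle(delay, sample_rate=16000, mic_distance=0.1))
```

The whitening step is what distinguishes GCC-PHAT from plain cross-correlation: it sharpens the correlation peak, which tends to help in reverberant rooms.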

Audio I/O
