
Topic: Speech Recognition

Dave Touretzky edited this page Oct 18, 2018 · 13 revisions

These are working notes for a topic area.

Grade Progression: What should students know?

  • Grades K-2: Some types of devices can recognize human speech. Examples include most cellphones and home entertainment systems such as Amazon's Echo or Google Home.

  • Grades 3-5: Speech recognition systems use grammatical knowledge to disambiguate homophones such as bear/bare or there/their/they're. Example: "There is no hot water" vs. "Their hot water is off" vs. "They're waiting for the hot water to come back on".

  • Grades 6-8:

  • Grades 9-12:
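The Grades 3-5 idea of using grammatical context to choose between homophones can be illustrated with a toy sketch. The cue list and the function name `pick_homophone` below are hypothetical and hand-written for the hot-water examples only; real recognizers rely on statistical language models trained on large amounts of text, not on rules like these.

```python
# Toy illustration: choose there / their / they're from the NEXT word.
# The cue words below are assumptions made up for this example.
LINKING_VERBS = {"is", "are", "was", "were"}


def pick_homophone(next_word: str) -> str:
    """Guess which spelling fits, given the word that follows it."""
    word = next_word.lower()
    if word in LINKING_VERBS:
        return "there"      # "There is no hot water"
    if word.endswith("ing"):
        return "they're"    # "They're waiting for the hot water ..."
    return "their"          # "Their hot water is off"


if __name__ == "__main__":
    for following in ["is", "hot", "waiting"]:
        print(following, "->", pick_homophone(following))
```

All three spellings sound the same, so the audio alone cannot decide between them; only the surrounding words can.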

Readings for Working Group

  1. Machine Learning is Fun Part 6: How to do Speech Recognition with Deep Learning. Adam Geitgey, Medium, December 2016.

  2. How Speech Recognition Works. Sudeesh Puthiyedath, August 3, 2006. Part 1 and part 2.

  3. How Siri Works -- Interview with Tom Gruber, CTO of SIRI. Nova Spivack, January 26, 2010.

Old Readings (replaced)

a. Brief Explanation of AI for Layman

b. Making the Leap from Speech to Dialogue: The Challenge for Human to Machine Communication

c. CACM January 2014 - A historical perspective of speech recognition. Xuedong Huang, James Baker, and Raj Reddy. Commun. ACM 57, 1 (January 2014), 94-103. DOI:

d. CACM April 2018 - Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends. Björn W. Schuller. Commun. ACM 61, 5 (April 2018), 90-99. DOI:

e. Video: Speech Emotion Recognition.

Demo Resources

Miscellaneous concepts to incorporate

Audio -> Formants -> Phones -> Syllables -> Words -> Phrases

How neural nets improved speech recognition: use of massive training data.

Grammar: recognition does best with conversational English

"How to recognize speech" == "How to wreck a nice beach"
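One way to show why these two phrases get confused is to compare their sounds rather than their spellings. The sketch below is only an illustration: the phone sequences are approximate, written in a CMUdict-style notation with stress marks dropped, and edit distance is used here as a rough stand-in for acoustic similarity, not as a description of how any particular recognizer works.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phone symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Approximate phone sequences (an assumption for this example).
RECOGNIZE_SPEECH = "R EH K AH G N AY Z S P IY CH".split()
WRECK_A_NICE_BEACH = "R EH K AH N AY S B IY CH".split()

distance = edit_distance(RECOGNIZE_SPEECH, WRECK_A_NICE_BEACH)

if __name__ == "__main__":
    print(f"{len(RECOGNIZE_SPEECH)} phones vs. {len(WRECK_A_NICE_BEACH)} phones, "
          f"edit distance {distance}")
```

Under these assumed transcriptions the phrases differ by only three phone edits out of twelve, which is why a recognizer needs knowledge about likely word sequences to pick the intended one.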

Languages other than English

Accents; child voices

Applications: Alexa, Siri, Cortana. What do they do? How are they useful?
