Keyword Spotting for ASR systems

About this project

One interesting application of speech technology is the detection of a particular word or phrase in a stream of audio. Most state-of-the-art keyword spotting (KWS) systems use large-vocabulary speech recognition systems to generate lattices, which are then used in the KWS process. Efficiency, handling of out-of-vocabulary (OOV) queries, and threshold selection are the three major criteria for evaluating a KWS solution.

Note: Although this work was only evaluated on 1-best outputs, it is also compatible with lattice outputs.

Note 2: Keyword spotting as implemented here is not intended to run on the fly, but on a large number of pre-generated inputs. If you are interested in creating a voice assistant that detects a wake word or phrase, use an audio classification model instead. See https://huggingface.co/learn/audio-course/chapter7/voice-assistant for a neat guide on this subject.
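
To make the offline setting concrete, here is a minimal sketch of searching a 1-best output for a word or phrase. It assumes the `.ctm` files follow the usual CTM layout (`file channel start duration word score`), that entries are sorted by file and start time, that consecutive words of a phrase may be at most 0.5 s apart, and that a phrase hit is scored by the product of its word scores. `CtmEntry`, `parse_ctm`, `search_1best` and `max_gap` are hypothetical names chosen for illustration and are not taken from this repository's code.

```python
from dataclasses import dataclass


@dataclass
class CtmEntry:
    file: str
    channel: str
    start: float     # start time in seconds
    duration: float  # duration in seconds
    word: str
    score: float     # ASR confidence score


def parse_ctm(text):
    """Parse lines of the assumed form: file channel start duration word score."""
    entries = []
    for line in text.strip().splitlines():
        file, channel, start, duration, word, score = line.split()
        entries.append(
            CtmEntry(file, channel, float(start), float(duration),
                     word.lower(), float(score))
        )
    return entries


def search_1best(entries, query, max_gap=0.5):
    """Return (file, start, duration, score) for every occurrence of the query.

    A phrase matches when its words appear consecutively in the same file
    and the silence between neighbouring words is at most max_gap seconds
    (an assumed value).
    """
    words = query.lower().split()
    hits = []
    for i in range(len(entries) - len(words) + 1):
        window = entries[i:i + len(words)]
        if [e.word for e in window] != words:
            continue
        if any(e.file != window[0].file for e in window):
            continue
        if any(nxt.start - (cur.start + cur.duration) > max_gap
               for cur, nxt in zip(window, window[1:])):
            continue
        score = 1.0
        for e in window:
            score *= e.score  # assumed: phrase score = product of word scores
        end = window[-1].start + window[-1].duration
        hits.append((window[0].file, window[0].start, end - window[0].start, score))
    return hits


example = """
f1 1 0.00 0.30 hello 0.9
f1 1 0.35 0.40 world 0.8
f1 1 2.00 0.30 hello 0.7
f2 1 0.10 0.40 world 0.6
"""
print(search_1best(parse_ctm(example), "hello world"))
```

Only the first `hello world` pair is returned: the second `hello` is followed by a `world` that belongs to a different file. A linear scan like this is fine for a sketch; building an index from word to positions once and reusing it for every query would scale better.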

Methodology

Morphological decomposition → break words or phrases down into morphemes in order to tackle OOVs
Score normalization → normalize the ASR output scores with respect to the current hit sequence
Grapheme confusion → handle each OOV word by replacing it with the closest in-vocabulary (IV) word with respect to a weighted Levenshtein distance. The weights are a function of the frequencies of all grapheme confusions.
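
The grapheme confusion step can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the substitution cost is taken to be one minus the relative frequency of the confusion, insertions and deletions cost 1, and the confusion counts are made up. `weighted_levenshtein`, `make_sub_cost` and `closest_iv_word` are hypothetical names.

```python
def weighted_levenshtein(source, target, sub_cost, indel_cost=1.0):
    """Edit distance where replacing grapheme a by b costs sub_cost(a, b)."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,      # delete a
                dist[i][j - 1] + indel_cost,      # insert b
                dist[i - 1][j - 1] + (0.0 if a == b else sub_cost(a, b)),
            )
    return dist[-1][-1]


def make_sub_cost(confusion_counts):
    """Build a cost function from counts {(ref, hyp): n}.

    Assumed weighting: cost = 1 - n(ref, hyp) / n(ref), so frequent
    confusions are cheap and unseen ones cost 1.
    """
    totals = {}
    for (ref, _hyp), n in confusion_counts.items():
        totals[ref] = totals.get(ref, 0) + n

    def sub_cost(a, b):
        total = totals.get(a, 0)
        return 1.0 - (confusion_counts.get((a, b), 0) / total if total else 0.0)

    return sub_cost


def closest_iv_word(oov_word, vocabulary, sub_cost):
    """Map an OOV word to the IV word with the smallest weighted distance."""
    return min(vocabulary, key=lambda w: weighted_levenshtein(oov_word, w, sub_cost))


# Made-up counts: "a" was recognised as "e" in 10 out of 100 cases.
counts = {("a", "a"): 90, ("a", "e"): 10, ("e", "e"): 100}
cost = make_sub_cost(counts)
print(closest_iv_word("gray", ["grey", "grow", "tray"], cost))  # grey
```

With unit costs, `tray` and `grey` would tie at distance 1; the confusion weights break the tie in favour of the substitution that the recogniser is actually observed to make.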

Results

| # | System | TWV All | TWV IV (in-vocabulary) | TWV OOV (out-of-vocabulary) | PFA (probability of false alarm) | PM (probability of missing) | Decision threshold |
|---|---|---|---|---|---|---|---|
| 1 | `decode.ctm` (base) | 0.319 | 0.401 | 0.000 | **0.00002** | 0.663 | 0.0425 |
| 2 | `decode-morph.ctm` (morph-based system) | 0.317 | 0.381 | **0.068** | 0.00003 | 0.651 | 0.0373 |
| 3 | `decode-morph-custom.ctm` (base + mapping) | 0.311 | 0.387 | 0.018 | 0.00004 | 0.652 | 0.0370 |
| 4 | `decode.ctm` (base + score normalization) | 0.320 | 0.402 | 0.000 | **0.00002** | 0.663 | 0.0074 |
| 5 | `decode-morph.ctm` (morph-based system + score normalization) | 0.325 | 0.391 | **0.068** | 0.00003 | 0.651 | 0.0040 |
| 6 | `decode-morph-custom.ctm` (base + mapping + score normalization) | 0.316 | 0.393 | 0.018 | 0.00004 | 0.662 | 0.0040 |
| 7 | `decode.ctm` (base + grapheme confusion) | 0.269 | 0.346 | -0.028 | 0.00011 | **0.623** | 0.0425 |
| 8 | `decode-morph.ctm` (morph-based system + grapheme confusion) | 0.285 | 0.342 | 0.064 | 0.00007 | 0.649 | 0.0076 |
| 9 | `decode-morph.ctm` (morph-based system + grapheme confusion + score normalization) | 0.323 | 0.389 | 0.067 | 0.00007 | 0.649 | 0.0007 |
| 10 | `decode.ctm` (base + grapheme confusion + score normalization) | **0.337** | **0.407** | 0.067 | 0.00011 | **0.623** | 0.0006 |

Table 4: Keyword spotting results comparing index search with base configuration, morphological decomposition, score normalization and grapheme confusion. Best result for each metric is highlighted.
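
For readers unfamiliar with the metrics, the term-weighted value (TWV) is commonly defined as below. This is the standard NIST-style definition and is given here as background; the exact variant used in the report is an assumption. For a keyword $k$ and decision threshold $\theta$, $N_{\text{correct}}$ counts correct hits, $N_{\text{spurious}}$ counts false alarms, $N_{\text{true}}$ is the number of true occurrences, $T$ is the duration of the audio in seconds, and $\mathcal{K}$ is the set of keywords.

$$
P_{\text{miss}}(k, \theta) = 1 - \frac{N_{\text{correct}}(k, \theta)}{N_{\text{true}}(k)},
\qquad
P_{\text{FA}}(k, \theta) = \frac{N_{\text{spurious}}(k, \theta)}{T - N_{\text{true}}(k)}
$$

$$
\text{TWV}(\theta) = 1 - \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \left[ P_{\text{miss}}(k, \theta) + \beta \, P_{\text{FA}}(k, \theta) \right],
\qquad \beta = 999.9
$$

Because $\beta$ is large, even a small false-alarm rate is penalized heavily, which is why a TWV can drop below zero (as in the TWV OOV of system #7). The averaging is done per keyword, so plugging the aggregate PFA and PM columns into the formula will not necessarily reproduce the TWV columns exactly.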

Observations:

Combining morphological decomposition and grapheme confusion is redundant. Worse, it degrades the system overall: system #8 reaches a TWV All of only $0.285$, well below the $0.317$ of the morph-based system alone (system #2), and only system #7 scores lower ($0.269$).
The best system with respect to TWV All is system #10, which combines grapheme confusion and score normalization and reaches $0.337$. It also has the lowest probability of missing ($PM = 0.623$, tied with system #7). That this system misses fewer words likely comes from the fact that grapheme confusion can only output words from the IV set: this bias seems to guide the system in the correct direction.
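
Score normalization, used by systems #4 to #6, #9 and #10, can be illustrated with sum-to-one normalization, a common choice in which each hit score is divided by the sum of the scores of all hits for the same keyword. Whether the project uses exactly this form is an assumption, as are the exponent `gamma` and the function name `sum_to_one_normalize`.

```python
from collections import defaultdict


def sum_to_one_normalize(hits, gamma=1.0):
    """Normalize hit scores so that they sum to one for each keyword.

    hits is a list of (keyword, score) pairs. gamma sharpens (gamma > 1)
    or flattens (gamma < 1) the scores before normalization; it is an
    assumed tuning parameter.
    """
    totals = defaultdict(float)
    for keyword, score in hits:
        totals[keyword] += score ** gamma
    normalized = []
    for keyword, score in hits:
        total = totals[keyword]
        normalized.append((keyword, (score ** gamma) / total if total > 0 else 0.0))
    return normalized


# Made-up hits: two for "cat", one low-scoring hit for "dog".
hits = [("cat", 0.6), ("cat", 0.2), ("dog", 0.05)]
print(sum_to_one_normalize(hits))
```

The single `dog` hit ends up with a normalized score of 1 despite its low raw score. This is the intended effect: scores become comparable across keywords, which matters when one global decision threshold is applied to all of them.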

Full report

Open report.pdf for the full report.

Acknowledgments

This project is part of the coursework for the MLMI14 course of the Cambridge MPhil in Machine Learning and Machine Intelligence. The code pushed in the first commit is not mine and belongs to its original authors. All the Python code, on the other hand, was written from scratch and is mine.
