Incremental ASR Processing

This repository lets you transcribe speech audio incrementally. It was originally created for a paper published at COLING 2020. You can evaluate incremental ASR systems (using the transcriptions generated in this repository) with our Incremental ASR Evaluation repository.

This paper (by Angus Addlesee, Yanchao Yu, and Arash Eshghi) can be found here and is titled: "A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI".

If you use this repository in your work, please cite us:

Harvard:

Addlesee, A., Yu, Y. and Eshghi, A., 2020. A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI. COLING 2020.

BibTeX:

@inproceedings{addlesee2020evaluation,
  title={A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI},
  author={Addlesee, Angus and Yu, Yanchao and Eshghi, Arash},
  booktitle={COLING 2020},
  year={2020}
}

Installation

We have created a setup script to prepare your system to process speech with the incremental speech recognition services from Microsoft, IBM, and Google. This script will create a virtual Python environment, and install the required packages within it. You can clone this repository and run the setup with the following commands:

Run git clone https://github.com/wallscope-research/incremental-asr-processing.git
Run cd incremental-asr-processing
Run ./setup.sh

You need to be within this virtual environment to run any processing. To enter and exit this environment, please use the relevant line:

To enter the virtual environment, run source venv/bin/activate
To exit the virtual environment, run deactivate

Note: you only need to run the setup script once, but it must have been run before the two commands above will work.
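If you are unsure whether the environment is active before starting a long processing run, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: the function name `in_virtualenv` is hypothetical, and it assumes the environment was created with Python's `venv` module (or a recent `virtualenv`), where `sys.prefix` differs from `sys.base_prefix` while the environment is active.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    # getattr guards against interpreters that do not define base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run: source venv/bin/activate")
```

Running this with the environment active should report the path of the venv directory; running it after deactivate should print the reminder instead.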

Processing Audio with a System

Within the repository, we have implemented the three systems that we evaluated in our COLING 2020 paper: Microsoft, IBM, and Google. These can be found in the asr-msoft, asr-ibm, and asr-google directories respectively. Within each directory, you can find the system-specific instructions.

Where to Keep Audio

You should store the audio files that you would like to be transcribed in the data directory. We recommend you split these into batches if you are processing a large number of files. For example, we have added ./data/batch1. With this structure, you can use each incremental ASR system to process your audio files in these batches, allowing you to rerun a single batch if something goes wrong.
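The batch layout described above could be produced with a small helper like the one below. This is only a sketch under stated assumptions, not a script shipped with the repository: the function name `split_into_batches`, the default batch size of 50, and the `*.wav` pattern are all hypothetical, so adjust them to whatever audio format the system-specific instructions require.

```python
import shutil
from pathlib import Path


def split_into_batches(source_dir, data_dir="data", batch_size=50, pattern="*.wav"):
    """Copy audio files from source_dir into data_dir/batch1, data_dir/batch2, ...

    Returns the list of batch directories that were created or reused.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    # Sort so that the same input always yields the same batches.
    files = sorted(Path(source_dir).glob(pattern))

    batch_dirs = []
    for start in range(0, len(files), batch_size):
        # Batches are numbered from 1 to match the existing ./data/batch1.
        batch_dir = Path(data_dir) / f"batch{start // batch_size + 1}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for audio_file in files[start:start + batch_size]:
            # Copy rather than move, so the original files stay untouched.
            shutil.copy2(audio_file, batch_dir / audio_file.name)
        batch_dirs.append(batch_dir)
    return batch_dirs
```

For example, calling `split_into_batches("my_audio", "data", batch_size=50)` on a hypothetical `my_audio` directory would fill `data/batch1`, `data/batch2`, and so on, each of which can then be processed (or reprocessed) on its own.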

Switchboard Corpus

We use the Switchboard Corpus to evaluate incremental ASR systems. If you would like to recreate the experiments in our COLING 2020 paper, you can find the information in the switchboard directory. There you will find our script to 'clean' disfluencies in the gold transcriptions, and our script to format Switchboard with its timings into the several required formats.

Acknowledgements

Angus Addlesee is funded by Wallscope and The Data Lab. Yanchao Yu is funded by the Horizon2020 SPRING Project. We thank them for their support.
