PTgen

Acquire and evaluate probabilistic transcriptions (PT's) of speech recordings. The PT's are generated by mismatched crowdsourcing: from transcriptions made by many people who don't know the recordings' language.

The technique is described in this ICASSP paper, in this AAAI paper, and, without jargon, in this Technograph article.

The theory is described in sections II.C, III.B, and V of this IEEE TASLP paper.

A stage-by-stage description is found in this Interspeech paper.

How to build on Ubuntu

Install OpenFST, Carmel, and at least the compute-wer executable of Kaldi.

git clone https://github.com/uiuc-sst/PTgen && cd PTgen/src && make

The first time you run make, it asks for the directory that contains OpenFST's header fst/compat.h. This is usually /usr/local/include. If it isn't, run rm config.mk; make and instead enter a directory found by the command locate fst/compat.h, that is, the part of the path before fst/compat.h.
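The directory make asks for is the prefix of a locate hit, with the trailing /fst/compat.h removed. The hypothetical helper compat_dir below sketches that step; it assumes locate is installed and its database is current (run sudo updatedb if it isn't), and it keeps only hits that really end in /fst/compat.h.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory to enter at make's prompt,
# i.e. the path of fst/compat.h with "/fst/compat.h" stripped off.
# With no argument, it takes the first matching result of locate.
compat_dir() {
  local hit="${1:-$(locate fst/compat.h 2>/dev/null | grep '/fst/compat\.h$' | head -n 1)}"
  if [[ -z "$hit" ]]; then
    echo "fst/compat.h not found" >&2
    return 1
  fi
  echo "${hit%/fst/compat.h}"
}
```

For example, compat_dir /usr/local/include/fst/compat.h prints /usr/local/include. Run compat_dir with no argument, then rm config.mk; make, and enter the printed directory.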

How to get crowdsourced transcriptions

See the subdirectory mturk.

How to create and evaluate PT's

Edit the settings file, e.g. test/2016-08-24/settings.

Ensure that the required files in that settings file's $DATA directory exist, or can be downloaded from its $DATA_URL (they're too big for GitHub).

If you're using MCASR, set mcasr=1 in the settings file.

If needed, split the transcriptions into train/dev/eval sets.
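One way to do such a split, sketched here only as an illustration, is a reproducible shuffle of utterance IDs into roughly 80/10/10 portions. The function split_ids, the one-ID-per-line input format, the output names PREFIX.train, PREFIX.dev and PREFIX.eval, and the proportions are all hypothetical; the repository's datasplit.md may describe the split PTgen actually expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a file of utterance IDs (one per line) into
# train/dev/eval lists of about 80/10/10 percent.
# Requires GNU shuf; reading the "random" bytes from yes makes the shuffle repeatable.
split_ids() {
  local ids="$1" prefix="$2"
  local total dev_n eval_n train_n shuffled
  total=$(wc -l < "$ids")
  dev_n=$(( total / 10 ))
  eval_n=$(( total / 10 ))
  train_n=$(( total - dev_n - eval_n ))   # train gets the remainder
  shuffled=$(mktemp)
  shuf --random-source=<(yes) "$ids" > "$shuffled"
  head -n "$train_n" "$shuffled" > "$prefix.train"
  tail -n +"$(( train_n + 1 ))" "$shuffled" | head -n "$dev_n" > "$prefix.dev"
  tail -n "$eval_n" "$shuffled" > "$prefix.eval"
  rm -f "$shuffled"
}
```

For 20 IDs this yields 16 training, 2 dev and 2 eval IDs, and every ID lands in exactly one list. Because the random source is fixed, rerunning it reproduces the same split.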

Process the PT's: run.sh settings.

If run.sh can't find the executable programs of OpenFST, Carmel, or Kaldi, it asks for their locations, and remembers your answers in a file config.sh.

If you encounter errors and fix them, you can save time by starting run.sh partway through: in the settings file, set startstage to one past your last successfully completed stage.
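A settings-file excerpt with the two switches named in these steps might look like the lines below. This assumes, as the references to $DATA and $DATA_URL suggest, that the settings file holds shell-style VAR=value assignments; the stage number 7 is a made-up example.

```shell
# Hypothetical excerpt of a PTgen settings file (assumes shell-style assignments).
mcasr=1        # transcriptions come from MCASR
startstage=7   # example: resume here because stage 6 was the last to complete
```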

Redesign in progress

Instead of run.sh:

Running cd test/prepare; ../../prepare.sh settings builds P.fst and L.fst from WS15 data only.

Then, apply.sh will read those FSTs, crowdsourced transcriptions for utterances in a new language L, and optional ground-truth transcriptions, to compute transcriptions in L and measure their word error rate.
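Word error rate is the word-level edit distance (substitutions + deletions + insertions) between a computed transcription and the ground truth, divided by the number of ground-truth words. The hypothetical bash function word_errors below computes the two numbers behind that ratio. It only illustrates the metric; it is not PTgen's code and does not replace Kaldi's compute-wer, which the build instructions require.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the quantity behind WER.
# Prints "ERRORS REF_WORDS" for a reference and a hypothesis string, where
# ERRORS is the minimum number of word substitutions, deletions and insertions.
word_errors() {
  local -a ref hyp prev cur
  local n m i j cost best
  read -r -a ref <<< "$1"
  read -r -a hyp <<< "$2"
  n=${#ref[@]}
  m=${#hyp[@]}
  for (( j = 0; j <= m; j++ )); do prev[j]=$j; done   # row 0: j insertions
  for (( i = 1; i <= n; i++ )); do
    cur[0]=$i                                          # column 0: i deletions
    for (( j = 1; j <= m; j++ )); do
      if [[ "${ref[i-1]}" == "${hyp[j-1]}" ]]; then cost=0; else cost=1; fi
      best=$(( prev[j-1] + cost ))                             # match or substitution
      if (( prev[j] + 1 < best )); then best=$(( prev[j] + 1 )); fi     # deletion
      if (( cur[j-1] + 1 < best )); then best=$(( cur[j-1] + 1 )); fi   # insertion
      cur[j]=$best
    done
    prev=("${cur[@]}")
  done
  echo "${prev[m]} $n"
}
```

For example, word_errors "the cat sat" "the bat sat down" prints 2 3 (one substitution and one insertion), a WER of 2/3, about 66.7%.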

How to run prebuilt tests

cd test/ws15 (or any other test directory).

../../run.sh settings-foo

If ../../run.sh asks again where to find the executables, abort it with Ctrl+C, copy your saved answers with cp ../../config.sh ., and rerun.
