Acquire and evaluate probabilistic transcriptions (PT's) of speech recordings. The PT's are generated by mismatched crowdsourcing: transcription by many people who don't know the recordings' language.
The theory is described in sections II.C, III.B, and V of this IEEE TASLP paper.
A stage-by-stage description is found in this Interspeech paper.
How to build on Ubuntu
git clone https://github.com/uiuc-sst/PTgen && cd PTgen/src && make
The first time you run make, you'll be asked to enter the directory of OpenFST's file
This is usually /usr/local/include. If it isn't, then run rm config.mk; make and
instead try a result from the command
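If /usr/local/include turns out to be wrong, one way to list candidate directories is to search a few common prefixes for an OpenFST header. This is only a sketch: it assumes the header of interest is fst/fstlib.h (a header that OpenFST installs; the file that make actually asks about may differ), and the function name find_openfst_include is made up for illustration.

```shell
# Print each directory under the given roots that contains fst/fstlib.h.
# Assumption: fst/fstlib.h stands in for whatever header make asks about.
find_openfst_include() {
  for root in "$@"; do
    # Skip roots that don't exist on this machine.
    [ -d "$root" ] || continue
    find "$root" -type f -path '*/fst/fstlib.h' 2>/dev/null |
      sed 's|/fst/fstlib\.h$||'
  done
}

# Example: find_openfst_include /usr /opt "$HOME"
```

Any directory it prints is a candidate answer for the prompt that make shows after rm config.mk.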
How to get crowdsourced transcriptions
See the subdirectory mturk.
How to create and evaluate PT's
Edit the settings file, e.g.
- Ensure that the required files within that file's $DATA exist, or can be downloaded from that file's $DATA_URL (because they're too big for GitHub).
If you're using MCASR, in the settings file set
If needed, split the transcriptions into train/dev/eval sets.
Process the PT's:
If run.sh can't find the executable programs of OpenFST, Carmel, or Kaldi, it asks for their locations,
and remembers your answers in a file
If you encounter errors and fix them, you can save time by starting run.sh partway through: in the settings file, set startstage to one past your last successfully completed stage.
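Restarting partway through can be scripted. The sketch below assumes the settings file contains a plain shell assignment of the form startstage=N. The helper name set_startstage is hypothetical and is not part of PTgen.

```shell
# Set startstage in a settings file to one past the last completed stage.
# Assumption: the settings file holds a line of the form startstage=N.
set_startstage() {
  settings=$1
  last_ok=$2
  next=$((last_ok + 1))
  tmp=$(mktemp) || return 1
  if grep -q '^startstage=' "$settings"; then
    # Rewrite the existing assignment.
    sed "s/^startstage=.*/startstage=$next/" "$settings" > "$tmp"
  else
    # No existing assignment: keep the file and append one.
    cat "$settings" > "$tmp"
    echo "startstage=$next" >> "$tmp"
  fi
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$settings" && rm -f "$tmp"
}

# Example: stage 7 was the last to succeed, so resume at stage 8.
# set_startstage settings 7
```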
Redesign in progress
cd test/prepare; ../../prepare.sh settings builds L.fst from only WS15 data.
apply.sh will read those FSTs, crowdsourced transcriptions for utterances in a new language L, and optional ground-truth transcriptions,
to compute transcriptions in L and measure their word error rate.
How to run prebuilt tests
cd test/ws15 (or any other test directory).
If ../../run.sh asks again where to find executables, just abort it with ctrl+C, retrieve those settings with cp ../../config.sh ., and rerun.
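To avoid the prompt in the first place, the copy can be done before launching the test. This is a minimal sketch that assumes the saved answers live in ../../config.sh, as the cp command above suggests. The helper name reuse_config is made up for illustration.

```shell
# Copy saved tool locations into the current test directory, unless present.
# Assumption: ../../config.sh is where the earlier answers were saved.
reuse_config() {
  src=${1:-../../config.sh}
  dst=${2:-./config.sh}
  if [ -f "$dst" ]; then
    # Never overwrite a config that is already in place.
    echo "keeping existing $dst"
  elif [ -f "$src" ]; then
    cp "$src" "$dst" && echo "copied $src to $dst"
  else
    echo "no saved config at $src; run.sh will ask again" >&2
    return 1
  fi
}

# Example, from inside a test directory such as test/ws15:
# reuse_config
```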