Probabilistic transcriptions of recorded speech

uiuc-sst/PTgen

PTgen

Acquire and evaluate probabilistic transcriptions (PT's) of speech recordings generated by mismatched crowdsourcing: by many people who don't know the recordings' language.

The technique is described in this ICASSP paper, this AAAI paper, and without jargon in this Technograph article.

The theory is described in sections II.C, III.B, and V of this IEEE TASLP paper.

A stage-by-stage description is found in this Interspeech paper.

How to build on Ubuntu

Install OpenFST, Carmel, and at least the compute-wer executable of Kaldi.

git clone https://github.com/uiuc-sst/PTgen && cd PTgen/src && make

The first time you run make, you'll be asked for the directory that contains OpenFST's file fst/compat.h. This is usually /usr/local/include. If it isn't, run rm config.mk; make and instead enter a result from the command locate fst/compat.h.
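The directory to enter is the part of the path before fst/compat.h. As a minimal sketch (the helper name include_dir_of is hypothetical and not part of PTgen), this strips that suffix from a path reported by locate:

```shell
# Hypothetical helper, not part of PTgen: given a full path to fst/compat.h,
# print the include directory that precedes it.
include_dir_of() {
  case $1 in
    */fst/compat.h) printf '%s\n' "${1%/fst/compat.h}" ;;
    *) echo "not a path to fst/compat.h: $1" >&2; return 1 ;;
  esac
}

# Possible usage, assuming locate is installed and its database is current:
#   locate fst/compat.h | while read -r p; do include_dir_of "$p"; done
```

If locate reports several paths, each printed directory is a candidate answer for make's prompt.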

How to get crowdsourced transcriptions

See the subdirectory mturk.

How to create and evaluate PT's

Edit the settings file, e.g. test/2016-08-24/settings.

  • Ensure that the required files within that file's $DATA exist, or can be downloaded from that file's $DATA_URL (because they're too big for GitHub).

If you're using MCASR, in the settings file set mcasr=1.

If needed, split the transcriptions into train/dev/eval sets.

Process the PT's: run.sh settings.

If run.sh can't find the executable programs of OpenFST, Carmel, or Kaldi, it asks for their locations, and remembers your answers in a file config.sh.

If you encounter errors and fix them, you can save time by starting run.sh partway through: in the settings file, set startstage to one past your last successfully completed stage.
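The effect of startstage can be pictured with a small sketch. This is an illustration only, not run.sh's actual code; the function name plan_stages and the stage count passed to it are assumptions:

```shell
# Hypothetical illustration, not PTgen's run.sh: list which stages a staged
# script would skip or run for a given startstage.
plan_stages() {
  startstage=$1   # first stage to run
  laststage=$2    # total number of stages (assumed for the example)
  stage=1
  while [ "$stage" -le "$laststage" ]; do
    if [ "$stage" -lt "$startstage" ]; then
      echo "stage $stage: skip"
    else
      echo "stage $stage: run"
    fi
    stage=$((stage + 1))
  done
}

plan_stages 3 4
```

For example, if stage 7 was the last to finish successfully, you would put startstage=8 in the settings file, assuming it uses the same name=value form as mcasr=1.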

Redesign in progress

Instead of run.sh:

Run cd test/prepare; ../../prepare.sh settings. This builds P.fst and L.fst from only WS15 data.

Then, apply.sh will read those FSTs, crowdsourced transcriptions for utterances in a new language L, and optional ground-truth transcriptions, to compute transcriptions in L and measure their word error rate.

How to run prebuilt tests

cd test/ws15 (or any other test directory).

../../run.sh settings-foo

If ../../run.sh asks again where to find the executables, abort it with Ctrl+C, retrieve your saved answers with cp ../../config.sh ., and rerun.
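To avoid the prompt in the first place, the saved config.sh can be copied into the test directory before run.sh is started. A minimal sketch, assuming config.sh sits two directories up as described above (the helper name reuse_config is hypothetical):

```shell
# Hypothetical helper, not part of PTgen: copy a saved config.sh into a test
# directory unless that directory already has one.
reuse_config() {
  src=$1            # e.g. ../../config.sh
  dest_dir=${2:-.}  # test directory; defaults to the current directory
  if [ -f "$dest_dir/config.sh" ]; then
    echo "config.sh already present in $dest_dir"
  elif [ -f "$src" ]; then
    cp "$src" "$dest_dir/config.sh" && echo "copied $src to $dest_dir/config.sh"
  else
    echo "no saved config at $src" >&2
    return 1
  fi
}

# Possible usage from inside a test directory:
#   reuse_config ../../config.sh && ../../run.sh settings-foo
```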