
Script to perform statistical significance test between ASR hypotheses.


talhanai/wer-sigtest


wer-sigtest

Script to perform a statistical significance test between ASR (Automatic Speech Recognition) transcription hypotheses. Use it to check whether a difference in WER (word error rate) between two systems, scored on the same test set, is statistically significant.
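For reference, the WER that sclite reports is the word-level edit distance between hypothesis and reference, normalized by the reference length:

```latex
\mathrm{WER} = \frac{S + D + I}{N}
```

Here S, D and I are the numbers of substituted, deleted and inserted words in the minimum-edit alignment, and N is the number of words in the reference. A significance test asks whether the gap between two systems' error counts is larger than what chance variation across the test utterances would plausibly produce.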

Install

You will need the sclite and sc_stats commands from the NIST Scoring Toolkit (SCTK), available here: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sctk.htm
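Before running anything, it can help to confirm that both SCTK tools are reachable. A minimal POSIX sh sketch; the function name check_tools is hypothetical and not part of this repo:

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print every named command
# that cannot be found on PATH, and return non-zero if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools sclite sc_stats || echo "install SCTK first" >&2
```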

Files

RUN.sh is an example script that (1) generates an SGML (XML-like) file for each transcript hypothesis you want to compare, and (2) compares the hypotheses using a statistical significance test of your choice. The repo also contains:

  • ref.trn : A reference (ground truth) transcript in the sclite trn format.
  • hyp.A.trn and hyp.B.trn : Two transcript hypotheses, each generated by a different ASR setup.
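Because the test compares the two systems utterance by utterance, the reference and both hypotheses should cover the same utterances. The sketch below checks that before scoring. It assumes the sclite trn layout, in which each line ends with the utterance ID in parentheses; the function names trn_ids and same_utterances are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that two .trn files contain the same set of utterances.
# Assumes each line ends with the utterance ID in parentheses, e.g.:
#   the cat sat on the mat (utterance_id)

# Print the sorted utterance IDs (last parenthesised field on each line).
trn_ids() {
  sed -n 's/.*(\([^()]*\))[[:space:]]*$/\1/p' "$1" | sort
}

# Return 0 only if both files list exactly the same utterance IDs.
same_utterances() {
  [ "$(trn_ids "$1")" = "$(trn_ids "$2")" ]
}

# Usage: same_utterances ref.trn hyp.A.trn || echo "ID mismatch" >&2
```

Line order does not matter here, since the IDs are sorted before comparison.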

Generate SGML files

Run sclite once for each hypothesis (A and B):

sclite -F -i wsj -r ref.trn -h hyp.A.trn -o sgml
sclite -F -i wsj -r ref.trn -h hyp.B.trn -o sgml
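With more than two systems, the same sclite call can be looped over the hypothesis files. A sketch assuming a POSIX shell; make_sgml and the SCLITE override variable are hypothetical names, not part of the repo:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the repo): run the sclite command above
# once per hypothesis file. Set SCLITE to the binary's full path if SCTK
# is not on your PATH.
make_sgml() {
  ref=$1
  shift
  for hyp in "$@"; do
    # Same flags as above; the next step reads the resulting <hyp>.sgml files.
    "${SCLITE:-sclite}" -F -i wsj -r "$ref" -h "$hyp" -o sgml || return 1
  done
}

# Usage: make_sgml ref.trn hyp.A.trn hyp.B.trn
```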

Compare the hypotheses

cat hyp.A.trn.sgml hyp.B.trn.sgml | sc_stats -p -t mapsswe -v -u -n result.A-B.mapsswe
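The comparison step can be wrapped in a small function so that the test name and output name become parameters. This is a sketch assuming a POSIX shell; sigtest and the SC_STATS override variable are hypothetical names, and the sc_stats flags are exactly those used in the command above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the repo) around the sc_stats call above.
# Arguments: test name, output name, then the SGML files to compare.
# Set SC_STATS to the binary's full path if sc_stats is not on your PATH.
sigtest() {
  test_name=$1
  out_name=$2
  shift 2
  # sc_stats reads the concatenated SGML files from standard input.
  cat "$@" | "${SC_STATS:-sc_stats}" -p -t "$test_name" -v -u -n "$out_name"
}

# Usage (equivalent to the command above):
#   sigtest mapsswe result.A-B.mapsswe hyp.A.trn.sgml hyp.B.trn.sgml
```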

Results

This generates result.A-B.mapsswe.stats.unified (shown below), which reports p = 0.007 for the matched-pairs (MP) test between the two hypotheses, i.e. the difference is significant at the 0.01 level.

|------------------------------------------------------------------------------|
|   Test   ||            |  hyp.A.trn  |       hyp.B.trn        ||    Test     |
| Abbrev.  ||            |             |                        ||   Abbrev.   |
|----------++------------+-------------+------------------------++-------------|
|    MP    || hyp.A.trn  |             | hyp.B.trn   0.007   ** ||     MP      |
|----------++------------+-------------+------------------------++-------------|
|    MP    || hyp.B.trn  |             |                        ||     MP      |
|------------------------------------------------------------------------------|

Statistical Significance Tests Available

Instead of mapsswe (Matched Pairs Sentence-Segment Word Error), you can pass any of the following to the -t option: mcn (McNemar), sign (sign test), wilc (Wilcoxon signed-rank), anovar (analysis of variance), or std4 (the standard four: mcn, mapsswe, wilc, and sign).
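To run several of these tests in one go, a loop can call sc_stats once per test and name each report after it. The sketch below assumes a POSIX shell and only accepts the test names listed above; run_tests and SC_STATS are hypothetical names, and the result.A-B prefix simply mirrors the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): run sc_stats once per requested
# test, writing reports named result.A-B.<test>. Unknown test names are
# rejected before any test runs. Set SC_STATS if sc_stats is not on PATH.
run_tests() {
  sgml_a=$1
  sgml_b=$2
  shift 2
  # Validate every requested test name first.
  for t in "$@"; do
    case $t in
      mapsswe|mcn|sign|wilc|anovar|std4) ;;
      *) echo "unknown test: $t" >&2; return 2 ;;
    esac
  done
  # Same flags as the single-test command, varying only -t and -n.
  for t in "$@"; do
    cat "$sgml_a" "$sgml_b" |
      "${SC_STATS:-sc_stats}" -p -t "$t" -v -u -n "result.A-B.$t" || return 1
  done
}

# Usage: run_tests hyp.A.trn.sgml hyp.B.trn.sgml mapsswe mcn wilc sign
```

Validating up front means a typo in one test name does not leave a partial set of reports behind.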
