wer-sigtest

Script to perform a statistical significance test between ASR (Automatic Speech Recognition) transcription hypotheses. It can be used to check whether a difference in WER (word error rate) between two systems, scored on the same test set, is statistically significant.
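WER is the number of substitutions, deletions and insertions in the minimum-cost word-level alignment of a hypothesis against the reference, divided by the number of reference words. sclite computes this itself. The minimal pure-Python sketch below only illustrates the quantity being compared. The names `word_errors` and `wer` are hypothetical and are not part of SCTK.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, deletions and insertions needed
    to turn the reference word sequence into the hypothesis."""
    r, h = ref.split(), hyp.split()
    # prev[j] = edit distance between the first i-1 reference words and h[:j]
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)  # cur[0]: delete all i reference words
        for j in range(1, len(h) + 1):
            sub = prev[j - 1] + (r[i - 1] != h[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)  # deletion, insertion
        prev = cur
    return prev[len(h)]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word errors divided by the number of reference words."""
    n_ref = len(ref.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hyp) / n_ref


# One deleted word out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```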

Install

You will need the sclite and sc_stats commands from the NIST Scoring Toolkit (SCTK), available here: http://www1.icsi.berkeley.edu/Speech/docs/sctk-1.2/sctk.htm
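After building SCTK, you can confirm that both tools are on your PATH before running the steps below. This is a small standard-library sketch. `missing_tools` is a hypothetical helper and is not part of SCTK.

```python
import shutil


def missing_tools(tools=("sclite", "sc_stats")):
    """Return the names of the given commands that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


missing = missing_tools()
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("sclite and sc_stats found")
```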

Files

RUN.sh contains an example script that (1) generates an SGML (XML-like) file for each transcript hypothesis you want to compare, and (2) compares the hypotheses using a statistical significance test of your choice. The repo contains:

ref.trn : A reference (ground truth) transcript in the format of .
hyp.A.trn and hyp.B.trn : Two transcript hypotheses, each generated by a different ASR setup.
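The significance tests compare the two hypotheses utterance by utterance against the same reference. The sketch below shows how those matched pairs line up. It assumes the usual sclite trn layout, where each line holds the words of one utterance followed by its utterance ID in parentheses, for example `the cat sat (u1)`. That layout is an assumption here. `read_trn` and `pair_up` are hypothetical helpers.

```python
import re

# Assumed trn layout: transcript words, then "(utterance-id)" at the end of the line.
TRN_LINE = re.compile(r"^(?P<text>.*?)\s*\((?P<uid>[^()]+)\)\s*$")


def read_trn(lines):
    """Map utterance ID -> transcript text for an iterable of trn lines."""
    out = {}
    for n, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = TRN_LINE.match(line)
        if m is None:
            raise ValueError(f"line {n}: no trailing (utterance-id)")
        out[m.group("uid")] = m.group("text")
    return out


def pair_up(ref, hyp_a, hyp_b):
    """Yield (uid, ref_text, a_text, b_text) for utterances present in all three."""
    for uid in sorted(ref):
        if uid in hyp_a and uid in hyp_b:
            yield uid, ref[uid], hyp_a[uid], hyp_b[uid]


ref = read_trn(["the cat sat (u1)", "a dog ran (u2)"])
hyp_a = read_trn(["the cat sad (u1)", "a dog ran (u2)"])
hyp_b = read_trn(["the cat sat (u1)", "dog ran (u2)"])
for row in pair_up(ref, hyp_a, hyp_b):
    print(row)
```

With real files, pass `open("ref.trn")` and similar objects to `read_trn` instead of the inline lists.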

Generate SGML files

Run the following command once for hypothesis A and once for hypothesis B:

sclite -F -i wsj -r ref.trn -h hyp.A.trn -o sgml

sclite -F -i wsj -r ref.trn -h hyp.B.trn -o sgml
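If you prefer to drive this step from Python instead of RUN.sh, the sketch below issues the same two sclite calls through `subprocess`. It uses only the flags shown above and assumes `sclite` is on your PATH. `sclite_sgml_cmd` and `make_sgml` are hypothetical helpers.

```python
import subprocess


def sclite_sgml_cmd(ref="ref.trn", hyp="hyp.A.trn"):
    """Build the sclite call shown above for one hypothesis file."""
    return ["sclite", "-F", "-i", "wsj", "-r", ref, "-h", hyp, "-o", "sgml"]


def make_sgml(hyps=("hyp.A.trn", "hyp.B.trn"), ref="ref.trn"):
    """Run sclite once per hypothesis; raises CalledProcessError if sclite fails."""
    for hyp in hyps:
        subprocess.run(sclite_sgml_cmd(ref, hyp), check=True)


print(" ".join(sclite_sgml_cmd(hyp="hyp.B.trn")))
# prints "sclite -F -i wsj -r ref.trn -h hyp.B.trn -o sgml"
```

Call `make_sgml()` once SCTK is installed. Building the argument list separately lets you inspect the commands without running them.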

Compare the hypotheses

cat hyp.A.trn.sgml hyp.B.trn.sgml | sc_stats -p -t mapsswe -v -u -n result.A-B.mapsswe
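The pipeline above concatenates the two SGML files and feeds them to sc_stats on standard input. The Python sketch below does the same with `subprocess.run(..., input=...)`, reusing only the flags from the command above. `sc_stats_cmd`, `concat_sgml` and `compare` are hypothetical helpers. The `result.A-B.<test>` naming is an illustrative choice that reproduces `result.A-B.mapsswe` for the default test.

```python
import subprocess
from pathlib import Path


def sc_stats_cmd(test="mapsswe", name="result.A-B.mapsswe"):
    """Build the sc_stats call shown above; `test` is any name accepted by -t."""
    return ["sc_stats", "-p", "-t", test, "-v", "-u", "-n", name]


def concat_sgml(paths):
    """Equivalent of `cat a.sgml b.sgml`: the bytes sc_stats reads on stdin."""
    return b"".join(Path(p).read_bytes() for p in paths)


def compare(sgml_files=("hyp.A.trn.sgml", "hyp.B.trn.sgml"), test="mapsswe"):
    """Pipe the concatenated SGML files into sc_stats, like the shell pipeline."""
    name = "result.A-B." + test
    subprocess.run(sc_stats_cmd(test, name), input=concat_sgml(sgml_files), check=True)
```

Calling `compare(test="mcn")` would run the McNemar variant under the same assumptions.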

Results

result.A-B.mapsswe.stats.unified will be generated (output below). It shows that the difference between the two hypotheses is significant: the value reported for the pair is 0.007 (p < 0.01).

|------------------------------------------------------------------------------|
|   Test   ||            |  hyp.A.trn  |       hyp.B.trn        ||    Test     |
| Abbrev.  ||            |             |                        ||   Abbrev.   |
|----------++------------+-------------+------------------------++-------------|
|    MP    || hyp.A.trn  |             | hyp.B.trn   0.007   ** ||     MP      |
|----------++------------+-------------+------------------------++-------------|
|    MP    || hyp.B.trn  |             |                        ||     MP      |
|------------------------------------------------------------------------------|
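To collect results from several runs, you may want to pull the filled cells out of the unified report. The parser below is inferred from the single sample report above and may not cover every sc_stats layout. It only extracts the system name, number and asterisks as printed, and does not reinterpret them. `parse_unified` is a hypothetical helper.

```python
import re

# A filled cell such as "hyp.B.trn   0.007   **": system name, number, optional asterisks.
CELL = re.compile(r"(?P<system>\S+)\s+(?P<value>\d*\.\d+)\s*(?P<stars>\**)")


def parse_unified(text):
    """Return (row_system, cell_system, value, stars) for every filled cell."""
    results = []
    for line in text.splitlines():
        parts = line.split("||")
        if len(parts) < 3:
            continue  # border or separator line, not a table row
        cells = [c.strip() for c in parts[1].split("|")]
        row_system, rest = cells[0], cells[1:]
        for cell in rest:
            m = CELL.fullmatch(cell)
            if m:
                results.append((row_system, m["system"], float(m["value"]), m["stars"]))
    return results


sample = (
    "|    MP    || hyp.A.trn  |             | hyp.B.trn   0.007   ** ||     MP      |\n"
    "|    MP    || hyp.B.trn  |             |                        ||     MP      |\n"
)
print(parse_unified(sample))
```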

Statistical Significance Tests Available

Instead of mapsswe (Matched Pairs Sentence-Segment Word Error), you can pass any of the following to -t: mcn (McNemar), sign (sign test), wilc (Wilcoxon Signed Rank), anovar (Analysis of Variance), or std4 (the standard four: mcn, mapsswe, wilc, and sign).
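The sketch below illustrates two of these tests in pure Python. It applies them to paired per-utterance word error counts for systems A and B on the same utterances. These are textbook formulations rather than sc_stats's code, so their p-values will not match sc_stats exactly. In particular, the matched-pairs function treats each whole utterance as one segment, whereas sc_stats's MAPSSWE derives its own segments from the alignments. `sign_test` and `matched_pairs_z` are hypothetical names.

```python
import math


def sign_test(errors_a, errors_b):
    """Two-sided exact sign test on paired per-utterance error counts.

    Ties are dropped; under the null hypothesis each remaining utterance
    favours A or B with probability 1/2."""
    wins_a = sum(a < b for a, b in zip(errors_a, errors_b))
    wins_b = sum(b < a for a, b in zip(errors_a, errors_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def matched_pairs_z(errors_a, errors_b):
    """Normal-approximation matched-pairs test on per-segment error differences.

    Simplification: each utterance is one segment. Returns (z, two-sided p)."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two segments")
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    if var == 0:
        return (0.0, 1.0) if mean == 0 else (math.copysign(math.inf, mean), 0.0)
    z = mean / math.sqrt(var / n)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
    return z, p


a = [2, 3, 1, 4]  # word errors per utterance, system A
b = [1, 1, 1, 1]  # word errors per utterance, system B
print(sign_test(a, b))
print(matched_pairs_z(a, b))
```

With only four utterances the sign test cannot reach significance, while the normal approximation is unreliable at such small sizes. Both tests need a realistic number of utterances to be meaningful.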

talhanai/wer-sigtest

Folders and files

Latest commit

History

Repository files navigation

wer-sigtest

Install

Files

Generate SGML file.

Compare the hyptheses

Results

Statistical Significance Tests Available

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages