This program runs pronunciation assessment with the Azure Speech SDK, using asynchronous recognition.
It supports WAV audio longer than 15 seconds, processes multiple files in one run, and writes each assessment to a CSV file.
An Azure Cognitive Services resource is required.
Required
- `token.json`: your resource key and region, for example `{"key1":"*********", "region":"eastasia"}`
- a folder `submit`: contains the scripts (`.txt`) and voices (`.wav`). Each voice and its script must have the same file name, like `sample.wav` and `sample.txt`.
- a folder `output`: create it before running; the grades are written to this folder.
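The input layout above can be checked before a run. The sketch below is a minimal, hypothetical helper (`load_token` and `pair_submissions` are illustrative names, not taken from `main.py`) that reads the key and region from `token.json` and pairs each `.wav` in `submit` with the `.txt` of the same name, assuming exactly the layout described above.

```python
import json
from pathlib import Path


def load_token(path="token.json"):
    """Read the resource key and region (keys follow the token.json example)."""
    with open(path, encoding="utf-8") as f:
        token = json.load(f)
    return token["key1"], token["region"]


def pair_submissions(folder="submit"):
    """Return (wav, txt) pairs sharing a file stem, e.g. sample.wav / sample.txt.

    Voices without a matching script are reported and skipped.
    """
    pairs = []
    for wav in sorted(Path(folder).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():
            pairs.append((wav, txt))
        else:
            print(f"skipping {wav.name}: no script {txt.name}")
    return pairs
```

A caller would then read each script with `txt.read_text(encoding="utf-8")` and pass it as the reference text for the matching voice.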
Run
`python ./main.py`
Result
`output/grade-{audioname}.csv` (one file per audio file)
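Each audio file yields one CSV named after it. A minimal sketch of producing such a file with Python's `csv` module follows; `write_grade` and the shape of its `summary` and `sentences` arguments are hypothetical, chosen to mirror the sample output further down rather than copied from `main.py`.

```python
import csv
from pathlib import Path


def write_grade(audioname, summary, sentences, out_dir="output"):
    """Write {out_dir}/grade-{audioname}.csv in a File/Summary/Sentence/Word layout.

    summary:   dict with accuracy, pronunciation, completeness, fluency
    sentences: list of dicts with the same scores plus "recognized" and
               "words"; each word is (word, accuracy, error_type, phonemes),
               where phonemes is a list of (phoneme, accuracy).
    The output folder must already exist.
    """
    metrics = ["accuracy", "pronunciation", "completeness", "fluency"]
    headers = ["Accuracy", "Pronunciation", "Completeness", "Fluency"]
    path = Path(out_dir) / f"grade-{audioname}.csv"
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["File:", audioname])
        w.writerow([""] + headers)
        w.writerow(["Summary"] + [summary[m] for m in metrics])
        for i, s in enumerate(sentences, start=1):
            w.writerow(["Sentence"] + headers + ["recognized"])
            w.writerow([f"No.{i}"] + [s[m] for m in metrics] + [s["recognized"]])
            w.writerow(["Word", "Phoneme", "Accuracy", "error type"])
            for word, acc, error_type, phonemes in s["words"]:
                # Word rows leave the Phoneme column empty ...
                w.writerow([word, "", acc, error_type])
                # ... and phoneme rows leave the Word column empty.
                for phoneme, p_acc in phonemes:
                    w.writerow(["", phoneme, p_acc])
    return path
```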
Azure Cognitive Services grades speech sentence by sentence. To evaluate the whole paragraph, this program recalculates the scores as follows:
- Accuracy score: weighted average of the sentences' accuracy scores
- Pronunciation score: weighted average of the sentences' pronunciation scores
- Completeness score: percentage of words whose error type is `None` rather than `Insertion` or `Omission`
- Fluency score: percentage of the audio time that is actually spoken
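The paragraph-level recalculation can be sketched in plain Python. This is a minimal sketch, not the code in `main.py`: the function names are hypothetical, the weights for the averages are not specified above (each sentence's word count is assumed), treating summed word durations as "time actually spoken" is an assumption, and the 0 to 5 `scale` is inferred from the sample output, where 4 of 7 words have error type `None` and the summary completeness is 2.857142857.

```python
def weighted_average(scores_and_weights):
    """Weighted mean of per-sentence scores.

    Each item is (score, weight). Using the sentence's word count as the
    weight is an assumption; the README does not state the weight.
    """
    total_weight = sum(w for _, w in scores_and_weights)
    if total_weight == 0:
        return 0.0
    return sum(s * w for s, w in scores_and_weights) / total_weight


def completeness(error_types, scale=5.0):
    """Share of words whose error type is "None", multiplied by `scale`."""
    if not error_types:
        return 0.0
    ok = sum(1 for e in error_types if e == "None")
    return ok / len(error_types) * scale


def fluency(word_durations, audio_duration, scale=5.0):
    """Share of the audio covered by recognized words, multiplied by `scale`.

    Assumes "time actually spoken" is the sum of word durations; both
    arguments must use the same unit (e.g. seconds).
    """
    if audio_duration <= 0:
        return 0.0
    spoken = min(sum(word_durations), audio_duration)
    return spoken / audio_duration * scale
```

For the sample below, `completeness(["None"] * 4 + ["Omission"] * 3)` gives about 2.857, which matches the summary row.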
Example
- input
  - `submit/sample.wav`: saying "What time is it?" (ref: Sample Voice)
  - `submit/sample.txt`: "What time is it now in Japan?" (a deliberate mismatch: "now in Japan" is never spoken)
- output: `grade-sample.csv`
| File: | sample | | | | |
|---|---|---|---|---|---|
| | Accuracy | Pronunciation | Completeness | Fluency | |
| Summary | 5 | 5 | 2.857142857 | 5 | |
| Sentence | Accuracy | Pronunciation | Completeness | Fluency | recognized |
| No.1 | 5 | 5 | 5 | 5 | What time is it? |
| Word | Phoneme | Accuracy | error type | | |
| what | | 5 | None | | |
| | w | 5 | | | |
| | aa | 3.5 | | | |
| | t | 5 | | | |
| time | | 5 | None | | |
| | t | 5 | | | |
| | ay | 5 | | | |
| | m | 5 | | | |
| is | | 5 | None | | |
| | ih | 5 | | | |
| | z | 5 | | | |
| it | | 5 | None | | |
| | ih | 5 | | | |
| | t | 5 | | | |
| now | | 0 | Omission | | |
| in | | 0 | Omission | | |
| japan | | 0 | Omission | | |
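The summary completeness in this example can be checked by hand. Four of the seven words (*what*, *time*, *is*, *it*) have error type `None` and three (*now*, *in*, *japan*) are `Omission`. Assuming the ratio is scaled to the same 0 to 5 range as the other scores (an inference from this output, not a documented rule):

```latex
\text{Completeness} = \frac{4}{4 + 3} \times 5 = \frac{20}{7} \approx 2.857142857
```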