InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

This code is related to the paper accepted by INTERSPEECH 2022:
InQSS: a speech intelligibility and quality assessment model using a multi-task learning network
https://www.isca-speech.org/archive/pdfs/interspeech_2022/chen22i_interspeech.pdf

TMHINT-QI dataset

The dataset is publicly available now!

Google Drive (recommended): Download link

NARS: Download link pw:tmhintqi
(This link is for previewing samples in the dataset. We have been informed of issues when attempting to download all the data at once using this link. )

train: the wave files of the training utterances
test: the wave files of the testing utterances
raw_data.csv: record the corresponding subject index, wave file name, quality score, and intelligibility score
test_scores.csv: objective scores of testing data

Known missing data: the following samples in raw_data don't have the quality scores, thereby not being used in the training and testing set
'FCN_snr0_street_TMHINT_b2_22_03',
'Trans_snr2_pink_TMHINT_g4_24_09',
'DDAE_snr2_pink_TMHINT_b2_22_01',
'DDAE_snr5_white_TMHINT_g4_24_07'

(If you encounter any issues while downloading the dataset, please reach out to the author (yc4093@columbia.edu).)

Notes about the file name:

Method None also indicates a clean wave file. If the method is None, these files are used for the pretest and are part of the SE model training data. The method shows None because it’s not indicated in the wave file name. (Wave files are named as {method}_{snr-level}_{noise}_{cleanUtterance})

Due to the page limit, some details about the dataset are not written in the paper.

- How much does the headphone affect the score?

We designed a website for the listening test, and thus the listeners can do the test in the place they like as long as the environment is quiet. In our test, there are 110 listeners used the provided headphone and 116 listeners used their own headphones. All listeners used the provided headphone used the same headphone.

The following histograms compare the results between using the provided headphone and using their headphones.

-> the average scores under different SNR seem not very much affected by the headphones.

-> the standard deviations of using the provided headphone are not necessarily less than using their headphones.

- The correlation between the results and the user features

Gender (Female: 132, Male: 94)

Age (20-30: 154, 31-40: 42, 41-50: 30)

The original TMHINT corpus

The following paper describes how the TMHINT corpus was designed: https://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi?o=dnclcdr&s=id=%22093NTCN0714003%22.&searchmode=basic#XXX

Correlation between subjective scores and objective scores

We use pysepm to calculate the objective quality and intelligibilty scores of TMHINT-QI testing set. (The details can be found in test_scores.csv.) The following heatmap shows the correlation between different subjective scores and objective scores, where avg_quality and avg_intell denote the average subjective quality and intelligibility scores, respectively.

Citation

If you use the dataset in your research, please cite:

@inproceedings{chen2022inqss,
title = {{InQSS}: a speech intelligibility and quality assessment model using a multi-task learning network},
author = {Chen, Yu-Wen and Tsao, Yu},
booktitle = {Proc. INTERSPEECH 2022}
}

Thanks :-)!

License

The InQSS work is released under MIT License. See LICENSE for more details.

Acknowledgments

Bio-ASP Lab, CITI, Academia Sinica, Taipei, Taiwan

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
InQSS-MOSNet		InQSS-MOSNet
InQSS-SSL		InQSS-SSL
plot		plot
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

InQSS-MOSNet

InQSS-MOSNet

InQSS-SSL

InQSS-SSL

plot

plot

LICENSE

LICENSE

README.md

README.md

Repository files navigation

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

TMHINT-QI dataset

Notes about the file name:

- How much does the headphone affect the score?

- The correlation between the results and the user features

Gender (Female: 132, Male: 94)

Age (20-30: 154, 31-40: 42, 41-50: 30)

The original TMHINT corpus

Correlation between subjective scores and objective scores

Citation

License

Acknowledgments

About

Releases

Packages

Languages

License

yuwchen/InQSS

Folders and files

Latest commit

History

Repository files navigation

InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

TMHINT-QI dataset

Notes about the file name:

- How much does the headphone affect the score?

- The correlation between the results and the user features

Gender (Female: 132, Male: 94)

Age (20-30: 154, 31-40: 42, 41-50: 30)

The original TMHINT corpus

Correlation between subjective scores and objective scores

Citation

License

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Languages