wav2vec2mdd-Text

This repository is an implementation of the paper "Text-Aware End-to-end Mispronunciation Detection and Diagnosis."

Abstract

In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information. Moreover, given the transcriptions, we design an extra contrastive loss to reduce the gap between the learning objective of phoneme recognition and MDD.

Installation

Requirements

Linux, CUDA>=11, GCC>=5.4
Python>=3.8

We recommend using Anaconda to create a conda environment:
```
conda create -n w2vText python=3.8
```
Then, activate the environment:
```
conda activate w2vText
```
PyTorch>=1.6.1 (follow the official PyTorch installation instructions)

For example, you could install PyTorch, torchvision, and torchaudio as follows:
```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
```
Other requirements
```
pip install soundfile editdistance
```
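To confirm that the steps above took effect, a quick check like the following can be run inside the activated environment. This is a minimal sketch, not part of the repository: the function name `check_environment` and the module list are assumptions based on the packages named in this section. It only tests whether the modules can be found, not their versions.

```python
import importlib.util
import sys

# Minimum Python version stated in the requirements above.
MIN_PYTHON = (3, 8)
# Import names of the packages installed in the steps above (assumed list).
REQUIRED_MODULES = ["torch", "torchaudio", "soundfile", "editdistance"]


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Compare only (major, minor) against the required minimum.
    if sys.version_info[:2] < tuple(min_python):
        found = "%d.%d" % sys.version_info[:2]
        wanted = "%d.%d" % tuple(min_python)
        problems.append("Python %s found, %s or newer required" % (found, wanted))
    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: %s" % name)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    if not issues:
        print("Environment looks OK.")
```

If any line is reported as missing, repeat the corresponding install command before moving on to the fairseq step.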
Fairseq

The network is built with the fairseq package. If you are familiar with fairseq, you can check wav2vec model::wav2vec_sigmoid and criterion::ctc_constrast. Otherwise, install the modified version as follows:
```
cd fairseq && pip install --editable .
```
Alternatively, you can install the "viterbi" package to avoid the complex installation process of the flashlight bindings:
```
cd viterbi && python setup.py install
```

Usage

Before using the following scripts to train and test the model, check the data paths (see the *.tsv files in the data directory) and the reference path.
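The data path check can be automated with a small script. The sketch below assumes the *.tsv files follow the usual fairseq wav2vec manifest layout (first line is the root directory, each later line is a relative audio path followed by a tab and the sample count). That layout is an assumption about this repository, and `find_missing_audio` is a hypothetical helper, not part of the code base.

```python
import os


def find_missing_audio(manifest_path):
    """Return the audio paths listed in a manifest that do not exist on disk.

    Assumed layout: line 1 holds the root directory; every following line is
    a relative audio path, a tab, and the number of samples.
    """
    missing = []
    with open(manifest_path, encoding="utf-8") as f:
        root = f.readline().strip()
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            rel_path = line.split("\t")[0]
            full_path = os.path.join(root, rel_path)
            if not os.path.isfile(full_path):
                missing.append(full_path)
    return missing
```

For example, calling `find_missing_audio("data/train.tsv")` (a hypothetical file name) returns an empty list when every listed file is present. A non-empty result usually means the root directory on the first line of the manifest needs to be changed to your local path.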

Training

sh run.sh

Inference

sh test.sh && sh mdd.sh

The results of our best model are included in the directory experiment/result. You can check them directly by running "sh mdd.sh". If you have any questions about them, please contact us. Thanks!

Cite

If you find this work useful in your research, please consider citing:

@article{peng2022text,
  title={Text-Aware End-to-end Mispronunciation Detection and Diagnosis},
  author={Peng, Linkai and Gao, Yingming and Lin, Binghuai and Ke, Dengfeng and Xie, Yanlu and Zhang, Jinsong},
  journal={arXiv preprint arXiv:2206.07289},
  year={2022}
}
