This is an implementation of the paper "Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation" (read the paper here).
We propose AmbigST to mitigate speech sense ambiguity in speech translation.
The models are trained with PyTorch.
Language Pair | Download Link |
---|---|
AmbigST En-De | Download |
AmbigST En-Fr | Download |
AmbigST En-Es | Download |
AmbigST Fr-En | Download |
AmbigST Es-En | Download |
AmbigST De-En | Download |
git submodule update --init SpeechUT/fairseq
cd SpeechUT/
pip install --editable fairseq/
pip install sacrebleu==1.5.1
Download the pretrained model of SpeechUT
ST models are fine-tuned with the fairseq speech-to-text task, so follow the data preparation instructions here. To fine-tune our released models, use the same SentencePiece models and dictionaries as ours:
- En-De: Sentencepiece Model, Dict
- En-Es: Sentencepiece Model, Dict
- En-Fr: Sentencepiece Model, Dict
- De-En: Sentencepiece Model, Dict
- Fr-En: Sentencepiece Model, Dict
- Es-En: Sentencepiece Model, Dict
We provide examples in example.
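Before fine-tuning, it can be worth checking that your tokenized text is covered by the downloaded dictionary. The sketch below assumes the dictionary uses the plain-text fairseq layout, with one `<token> <count>` entry per line. `load_dict_tokens` and `find_oov` are hypothetical helper names used for illustration only; they are not part of fairseq or of this repository.

```python
from typing import Iterable, List, Set


def load_dict_tokens(lines: Iterable[str]) -> Set[str]:
    """Collect the token column from fairseq-style dictionary lines.

    Assumption: each non-empty line looks like "<token> <count>".
    """
    tokens = set()
    for line in lines:
        line = line.rstrip("\n")
        # Split from the right so only the trailing count is removed.
        parts = line.rsplit(" ", 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        tokens.add(parts[0])
    return tokens


def find_oov(pieces: Iterable[str], vocab: Set[str]) -> List[str]:
    """Return the pieces that are missing from the dictionary, in order."""
    return [p for p in pieces if p not in vocab]


if __name__ == "__main__":
    # Toy dictionary and a toy tokenized sentence (not real AmbigST data).
    vocab = load_dict_tokens(["the 100", "sea 42", "see 40"])
    print(find_oov("the sea shore".split(), vocab))
```

In practice you would pass an open file handle for the dictionary and the pieces produced by the matching SentencePiece model. A non-empty result suggests that the model and dictionary do not belong together.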
To fine-tune the models we released, you can use the datasets we provide:
- En-De: Train, Dev
- En-Es: Train, Dev
- En-Fr: Train, Dev
- En-De: Train, Dev
- En-Es: Train, Dev
- En-Fr: Train, Dev
To construct your own dataset, please refer to the process we provide in create_homophone and annotate_data.
model_path=path/to/your/pre-trained/model
data_dir=dataset/MuSTC/en-${lang}
bash /speechut/scripts/fine_tune/en_de/all.sh $model_path $data_dir
Please check the folder /speechut/scripts/fine_tune for detailed configuration.
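The fine-tuning commands above reference `${lang}` without setting it, so the data directory only resolves once a target language is chosen. As a minimal sketch, the helper below assembles the same command for a given language. `fine_tune_command` is a hypothetical name, and the `dataset/MuSTC/en-<lang>` layout is simply copied from the command above.

```python
import shlex
from typing import List


def fine_tune_command(script: str, model_path: str, lang: str,
                      data_root: str = "dataset/MuSTC") -> List[str]:
    """Build the argument list for a fine-tuning run.

    Assumption: data lives under <data_root>/en-<lang>, as in the
    README command; adjust data_root for other layouts.
    """
    if not lang:
        raise ValueError("lang must be set, e.g. 'de'")
    data_dir = f"{data_root}/en-{lang}"
    return ["bash", script, model_path, data_dir]


if __name__ == "__main__":
    cmd = fine_tune_command(
        "/speechut/scripts/fine_tune/en_de/all.sh",
        "path/to/your/pre-trained/model",
        "de",
    )
    # Print a shell-ready command line; run it with subprocess.run(cmd) if desired.
    print(shlex.join(cmd))
```

Failing early on an empty language avoids silently pointing the script at `dataset/MuSTC/en-`.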
You may average several model checkpoints with the best dev accuracy to stabilize performance:
python fairseq/scripts/average_checkpoints.py --inputs $model_dir/checkpoint.best_acc*.pt --output $model_dir/checkpoint.avgnbest.pt
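For intuition, checkpoint averaging takes the element-wise mean of each parameter across the selected checkpoints. The sketch below is a simplified illustration using plain Python lists in place of tensors. It is not the fairseq script, and `average_params` is a hypothetical name.

```python
from typing import Dict, List, Sequence

# Toy stand-in for a checkpoint: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def average_params(checkpoints: Sequence[Params]) -> Params:
    """Return the element-wise mean of every parameter across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise KeyError("checkpoints must share the same parameter names")
    n = len(checkpoints)
    averaged = {}
    for name in keys:
        # zip(*...) groups the i-th value of this parameter from each checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


if __name__ == "__main__":
    ckpt_a = {"w": [1.0, 3.0]}
    ckpt_b = {"w": [3.0, 5.0]}
    print(average_params([ckpt_a, ckpt_b]))
```

Averaging in parameter space tends to smooth out fluctuations between individual checkpoints, which is why it is applied before decoding.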
Then decode with beam search:
model_path=path/to/your/fine-tuned/model
data_dir=dataset/MuSTC/en-${lang}
bash speechut/scripts/inference_st.sh $model_path $data_dir ${lang} tst-COMMON
This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on FAIRSEQ; see also the Microsoft Open Source Code of Conduct.
If you find our work useful in your research, please cite the following paper:
For help or issues using AmbigST models, please submit a GitHub issue.
For other communications related to AmbigST, please contact Tengfei Yu (921692739@qq.com).