ytf-philp/AmbigST

Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation

This is an implementation of the paper "Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation" (read the paper here).

👀 Overview

We propose AmbigST to mitigate speech sense ambiguity in speech translation.

Result on En-to-XX

Result on XX-to-En

Download Trained Models

The models are trained with PyTorch.

Language Pair Download Link
AmbigST En-De Download
AmbigST En-Fr Download
AmbigST En-Es Download
AmbigST Fr-En Download
AmbigST Es-En Download
AmbigST De-En Download

⚙️ Setup

git submodule update --init SpeechUT/fairseq
cd SpeechUT/
pip install --editable fairseq/
pip install sacrebleu==1.5.1

Download Pretrained Model

Download the pretrained model of SpeechUT

Data preparation

ST models are fine-tuned with the fairseq speech-to-text task, so follow the data preparation instructions here. To fine-tune our released models, use the same SentencePiece models and dictionaries as ours:

We provide examples in example.
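As a rough illustration of the manifest that the fairseq speech-to-text task reads, the sketch below writes a one-row TSV. The column names follow fairseq's MuST-C preparation script. That the released configurations expect exactly this layout is an assumption, so compare against the files in example. The utterance ID, audio path, frame count, text and speaker are placeholders.

```python
import csv
import io

# Column names as used by fairseq's MuST-C preparation script (assumption:
# the released configs expect the same layout; check the files in example).
MANIFEST_COLUMNS = ["id", "audio", "n_frames", "tgt_text", "speaker"]


def write_manifest(rows, handle):
    """Write utterance dicts as a tab-separated manifest with a header row."""
    writer = csv.DictWriter(
        handle,
        fieldnames=MANIFEST_COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # keep text fields unquoted
        escapechar="\\",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical utterance; every value here is a placeholder.
rows = [
    {
        "id": "ted_1_0",
        "audio": "/path/to/ted_1_0.wav",
        "n_frames": 48000,
        "tgt_text": "Hallo Welt",
        "speaker": "spk.1",
    }
]
buffer = io.StringIO()
write_manifest(rows, buffer)
manifest_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch free of side effects; pass an open file handle instead to produce a real manifest.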

AmbigST Dataset Construction

To fine-tune the released models, you can use the dataset we provide.

To construct your own dataset, follow the process in create_homophone and annotate_data.
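For intuition only, here is a minimal sketch of one way to find homophones: group words that share a pronunciation, then flag those words in a transcript. This is not the procedure implemented in create_homophone or annotate_data. The toy lexicon, the function names and the phoneme strings are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon mapping words to phoneme strings.
# A real pipeline would load a full pronunciation dictionary instead.
TOY_LEXICON = {
    "sea": "S IY",
    "see": "S IY",
    "flower": "F L AW ER",
    "flour": "F L AW ER",
    "tree": "T R IY",
}


def group_homophones(lexicon):
    """Return sorted groups of words that share a pronunciation."""
    by_pron = defaultdict(list)
    for word, pron in lexicon.items():
        by_pron[pron].append(word)
    # Keep only pronunciations shared by two or more spellings.
    groups = [sorted(words) for words in by_pron.values() if len(words) > 1]
    return sorted(groups)


def find_ambiguous_tokens(sentence, groups):
    """List the words in a sentence that belong to any homophone group."""
    homophones = {word for group in groups for word in group}
    return [tok for tok in sentence.lower().split() if tok in homophones]


groups = group_homophones(TOY_LEXICON)
hits = find_ambiguous_tokens("I can see the sea", groups)
```

Whitespace tokenization is used here only to keep the example short; real transcripts would need proper tokenization and punctuation handling.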

Fine-tune an encoder-decoder model

lang=de  # target language of the MuST-C pair
model_path=path/to/your/pre-trained/model
data_dir=dataset/MuSTC/en-${lang}
bash /speechut/scripts/fine_tune/en_de/all.sh $model_path $data_dir

Please check the folder /speechut/scripts/fine_tune for detailed configuration.

Decode

To stabilize performance, you can average the model checkpoints with the best dev accuracy:

python fairseq/scripts/average_checkpoints.py --inputs $model_dir/checkpoint.best_acc*.pt --output $model_dir/checkpoint.avgnbest.pt
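Conceptually, checkpoint averaging takes the element-wise mean of each parameter across the selected checkpoints. The sketch below shows the idea on plain Python lists; it is a simplified stand-in and not the code in fairseq's average_checkpoints.py. The function name and the toy "checkpoints" are hypothetical.

```python
def average_state_dicts(state_dicts):
    """Element-wise mean of parameters stored as {name: list of floats}."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("checkpoints have different parameter names")
    count = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # zip pairs up the i-th value of this parameter across checkpoints.
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(col) / count for col in columns]
    return averaged


# Two toy "checkpoints" with one weight vector and one bias each.
checkpoints = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
avg = average_state_dicts(checkpoints)
```

Averaging nearby checkpoints tends to smooth out step-to-step noise in the weights, which is why it is often applied before final evaluation.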

Then decode with beam search:

lang=de  # target language of the MuST-C pair
model_path=path/to/your/fine-tuned/model
data_dir=dataset/MuSTC/en-${lang}
bash speechut/scripts/inference_st.sh $model_path $data_dir ${lang} tst-COMMON
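To illustrate what beam search does during decoding, here is a toy sketch over a hand-written next-token table. It is not the decoder used by the inference script and it omits refinements such as length normalization. The names `beam_search` and `toy_step`, the tokens and the probabilities are all made up for the example.

```python
import math


def beam_search(step_fn, bos, eos, beam_size=2, max_len=5):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) must return {token: probability} for the next token.
    """
    beams = [([bos], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Fall back to unfinished beams if nothing reached eos.
    pool = finished or beams
    return max(pool, key=lambda item: item[1])[0]


def toy_step(prefix):
    """Hypothetical next-token distribution for a three-step toy model."""
    if len(prefix) == 1:
        return {"a": 0.6, "b": 0.4}
    if len(prefix) == 2:
        second = {"a": {"x": 0.5, "y": 0.5}, "b": {"z": 0.9, "x": 0.1}}
        return second[prefix[-1]]
    return {"</s>": 1.0}


greedy = beam_search(toy_step, "<s>", "</s>", beam_size=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_size=2)
```

With a beam of 1 the search commits to the locally best first token "a" and ends with total probability 0.30, while a beam of 2 keeps "b" alive and finds the sequence through "z" with probability 0.36.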

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on FAIRSEQ; see also the Microsoft Open Source Code of Conduct.

Reference

If you find our work useful in your research, please cite the following paper:

Contact Information

For help or issues using AmbigST models, please submit a GitHub issue.

For other communications related to AmbigST, please contact Tengfei Yu (921692739@qq.com).
