Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation

This is the implementation of the paper "Speech Sense Disambiguation: Tackling Homophone Ambiguity in End-to-End Speech Translation" (read the paper here).

👀 Overview

We propose AmbigST to mitigate speech sense ambiguity in speech translation.

Result on En-to-XX

Result on XX-to-En

Download Trained Models

The models are trained with PyTorch.

Language Pair	Download Link
AmbigST En-De	Download
AmbigST En-Fr	Download
AmbigST En-Es	Download
AmbigST Fr-En	Download
AmbigST Es-En	Download
AmbigST De-En	Download

⚙️ Setup

git submodule update --init SpeechUT/fairseq
cd SpeechUT/
pip install --editable fairseq/
pip install sacrebleu==1.5.1

Download Pretrained Model

Download the pretrained model of SpeechUT

Data preparation

ST models are fine-tuned with the fairseq speech-to-text task, so follow the data preparation instructions here. To fine-tune our released models, use the same SentencePiece models and dictionaries as ours.

We provide examples in example.

AmbigST Dataset Construction

To fine-tune the released models, you can use the datasets we provide:

En-De: Train, Dev
En-Es: Train, Dev
En-Fr: Train, Dev

To construct your own dataset, please refer to the process provided in create_homophone and annotate_data.

Fine-tune an encoder-decoder model

lang=de  # target language; this example uses the en_de script
model_path=path/to/your/pre-trained/model
data_dir=dataset/MuSTC/en-${lang}
bash speechut/scripts/fine_tune/en_de/all.sh $model_path $data_dir

Please check the folder speechut/scripts/fine_tune for the detailed configuration.
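Before launching a long fine-tuning run, it can help to confirm that the pre-trained checkpoint and the data directory actually exist. The following is a minimal sketch under that assumption; `check_inputs` is a hypothetical helper and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository):
# verify that the checkpoint file and data directory exist before fine-tuning.
check_inputs() {
    ci_model_path="$1"
    ci_data_dir="$2"
    if [ ! -f "$ci_model_path" ]; then
        echo "missing checkpoint: $ci_model_path" >&2
        return 1
    fi
    if [ ! -d "$ci_data_dir" ]; then
        echo "missing data directory: $ci_data_dir" >&2
        return 1
    fi
    echo "ok"
}
```

Calling `check_inputs "$model_path" "$data_dir"` before the fine-tuning command prints `ok` when both paths are present and returns a non-zero status otherwise, so it can be chained with `&&`.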

Decode

You may average several model checkpoints with the best dev accuracy to stabilize performance:

python fairseq/scripts/average_checkpoints.py --inputs $model_dir/checkpoint.best_acc*.pt --output $model_dir/checkpoint.avgnbest.pt
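The averaging command relies on the glob `checkpoint.best_acc*.pt` matching at least one file. As a rough sanity check, the sketch below counts the matching checkpoints in a directory; `count_best_checkpoints` is a hypothetical helper introduced here for illustration only.

```shell
# Hypothetical helper: count the checkpoints that the averaging glob would match.
count_best_checkpoints() {
    cbc_dir="$1"
    cbc_n=0
    for cbc_f in "$cbc_dir"/checkpoint.best_acc*.pt; do
        # If nothing matches, the shell leaves the pattern unexpanded; skip it.
        [ -e "$cbc_f" ] || continue
        cbc_n=$((cbc_n + 1))
    done
    echo "$cbc_n"
}
```

Running `count_best_checkpoints "$model_dir"` first makes it clear how many checkpoints will be averaged; a result of 0 suggests that training has not yet saved any best-accuracy checkpoints or that `$model_dir` points to the wrong place.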

Then decode with beam search:

lang=de  # target language of the fine-tuned model
model_path=path/to/your/fine-tuned/model
data_dir=dataset/MuSTC/en-${lang}
bash speechut/scripts/inference_st.sh $model_path $data_dir ${lang} tst-COMMON
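To decode several target languages in turn, the command can be wrapped in a loop. The sketch below is a dry run that only prints one command per language so they can be inspected first. The `models/en-<lang>/` layout and the `print_decode_cmds` helper are assumptions made for illustration; only the script name, the data directory pattern, and the `tst-COMMON` split come from this README.

```shell
# Dry-run sketch: print one decoding command per target language.
# ASSUMPTION: the models/en-<lang>/ layout is hypothetical; adjust to your own paths.
print_decode_cmds() {
    for pdc_lang in "$@"; do
        pdc_model="models/en-${pdc_lang}/checkpoint.avgnbest.pt"
        pdc_data="dataset/MuSTC/en-${pdc_lang}"
        echo "bash speechut/scripts/inference_st.sh ${pdc_model} ${pdc_data} ${pdc_lang} tst-COMMON"
    done
}

print_decode_cmds de es fr
```

Once the printed commands look right, they can be run one by one or piped to `bash`.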

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on FAIRSEQ. See also the Microsoft Open Source Code of Conduct.

Reference

If you find our work useful in your research, please cite the following paper:

Contact Information

For help or issues using AmbigST models, please submit a GitHub issue.

For other communications related to AmbigST, please contact Tengfei Yu (921692739@qq.com).
