This project adapts the fairseq toolkit for speech-to-text tasks.
Implementation of the paper: Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
- Complete Kaldi-style recipes
- ASR, MT, and ST pipelines (bin)
- Training configuration read from YAML files
- CTC multi-task learning
- MT training in the ST style (online tokenizer; may still contain bugs)
- Speed perturbation during pre-processing (requires torchaudio ≥ 0.8.0)
- Conformer architecture
- Loading pre-trained models for ST
- Relative position encoding
- Stacked acoustic-and-textual encoding
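Training options are read from a YAML file rather than passed individually on the command line. The fragment below is an illustrative sketch only: the keys mirror common fairseq-train options, but the exact names and values here are assumptions — see the conf folders under egs for the real configuration files.

```yaml
# Hypothetical training config; each key maps to a fairseq-train option.
# Actual keys and values may differ — consult egs/*/conf for real examples.
arch: s2t_transformer_s
optimizer: adam
lr: 2e-3
lr-scheduler: inverse_sqrt
warmup-updates: 10000
max-tokens: 40000
criterion: label_smoothed_cross_entropy
ctc-weight: 0.3          # enables CTC multi-task learning
```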
- Note: we have only tested the following environment.
- Python == 3.6
- torch == 1.8, torchaudio == 0.8.0, cuda == 10.2
- apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
- nccl
make -j src.build CUDA_HOME=<path to cuda install>
- gcc ≥ 4.9 (we use version 5.4)
- python library
pip install pandas sentencepiece configargparse gpustat tensorboard editdistance
The shell scripts for each benchmark are in the egs folder. We provide the ASR pipeline for LibriSpeech and all pipelines (ASR, MT, and ST) for MuST-C. We also provide a template for other benchmarks.
Here is an example for MuST-C:
mustc
├── asr
│ ├── binary.sh
│ ├── conf
│ ├── decode.sh
│ ├── local
│ ├── run.sh
│ └── train.sh
├── mt
│ ├── binary.sh
│ ├── conf
│ ├── decode.sh
│ ├── local
│ ├── run.sh
│ └── train.sh
└── st
├── binary.sh
├── conf
├── decode.sh
├── ensemble.sh
├── local
├── run.sh
└── train.sh
- run.sh: the core script, which contains the whole pipeline
- train.sh: calls run.sh for training
- decode.sh: calls run.sh for decoding
- binary.sh: generates the binarized datasets only
- conf: the folder for the configuration files (.yaml)
- local: the folder for utility shell scripts
  - monitor.sh: checks GPU availability and launches the program automatically
  - parse_options.sh: parses the parameters for run.sh
  - path.sh: unused
  - utils.sh: utility shell functions
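train.sh and decode.sh are thin wrappers that invoke run.sh with different options. The sketch below illustrates the Kaldi-style stage control that parse_options.sh makes possible; the flag names (--stage, --stop_stage) and the stage numbers are assumptions following the Kaldi convention, not the repository's exact interface — check egs/mustc/st/run.sh for the real parameters.

```shell
#!/usr/bin/env bash
# Minimal sketch of Kaldi-style stage selection as used by run.sh-like scripts.
stage=0
stop_stage=100

# parse_options.sh normally handles flag parsing; inlined here so the
# example is self-contained.
while [ $# -gt 0 ]; do
  case "$1" in
    --stage) stage=$2; shift 2 ;;
    --stop_stage) stop_stage=$2; shift 2 ;;
    *) echo "unknown option: $1" >&2; exit 1 ;;
  esac
done

run_stage() {
  # Run a named stage only if it falls inside [stage, stop_stage].
  local n=$1 name=$2
  if [ "$stage" -le "$n" ] && [ "$n" -le "$stop_stage" ]; then
    echo "stage $n: $name"
  fi
}

run_stage 0 "data preparation"
run_stage 1 "binarize (binary.sh)"
run_stage 2 "train (train.sh)"
run_stage 3 "decode (decode.sh)"
```

With this pattern, `bash run.sh --stage 2 --stop_stage 2` would rerun only training, which is the behavior the train.sh/decode.sh wrappers rely on.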
@inproceedings{xu-etal-2021-stacked,
title = "Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders",
author = "Xu, Chen and
Hu, Bojie and
Li, Yanyang and
Zhang, Yuhao and
Huang, Shen and
Ju, Qi and
Xiao, Tong and
Zhu, Jingbo",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.204",
doi = "10.18653/v1/2021.acl-long.204",
pages = "2619--2630",
}