3M-ASR for End-to-End Speech Recognition

This project builds an end-to-end speech recognition system based on a Mixture-of-Experts (MoE) model. MoE is an efficient way to train large-scale models, and we have demonstrated its effectiveness on public datasets. More details about the algorithm can be found in "3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition".
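
To illustrate the idea behind an MoE layer, here is a minimal, hypothetical sketch of a feed-forward expert layer with a top-1 router in PyTorch. It is not the project's implementation (which builds on FastMoE and the Conformer architecture); all class and variable names below are illustrative.

```python
# Illustrative sketch only: a toy Mixture-of-Experts feed-forward layer with a
# top-1 router. The real project builds on FastMoE; names here are hypothetical.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    def __init__(self, d_model: int, d_ffn: int, num_experts: int):
        super().__init__()
        # A gating network routes each frame to a single expert (top-1).
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ffn), nn.ReLU(), nn.Linear(d_ffn, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) acoustic frames
        gate_logits = self.router(x)                           # (B, T, E)
        weights, expert_idx = gate_logits.softmax(-1).max(-1)  # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                # Each frame is processed only by its chosen expert, so compute
                # grows much more slowly than the total parameter count.
                out[mask] = weights[mask].unsqueeze(-1) * expert(x[mask])
        return out


# Example: 2 utterances, 50 frames, 256-dim features
moe = ToyMoELayer(d_model=256, d_ffn=1024, num_experts=4)
y = moe(torch.randn(2, 50, 256))
print(y.shape)  # torch.Size([2, 50, 256])
```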

Installation

  • Clone this repo and set up the conda environment:

```
git clone https://github.com/tencent-ailab/3m-asr.git
conda create -n moe python=3.8
conda activate moe
pip install -r requirements.txt
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
```

  • Follow the instructions under the fastmoe directory to install FastMoE (a quick import check is sketched below)
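
After installation, a short sanity check like the following can confirm that PyTorch, CUDA and FastMoE are usable. This is a sketch and assumes FastMoE exposes its Python package as fmoe.

```python
# Quick post-install sanity check (a sketch; the `fmoe` module name assumes
# FastMoE installs its Python package under that name).
import torch

print("torch:", torch.__version__)            # expect 1.9.0
print("cuda available:", torch.cuda.is_available())

try:
    import fmoe  # provided by the fastmoe installation step
    print("fastmoe import OK")
except ImportError as err:
    print("fastmoe not installed correctly:", err)
```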

Performance Benchmark

We evaluate our system on the public WenetSpeech dataset and provide the recipe for Conformer-MoE. CER (%) results are listed below; the first three rows are reported by WenetSpeech.

| Toolkit | Dev | Test_Net | Test_Meeting | AIShell-1 |
| --- | --- | --- | --- | --- |
| Kaldi | 9.07 | 12.83 | 24.72 | 5.41 |
| ESPnet | 9.70 | 8.90 | 15.90 | 3.90 |
| WeNet | 8.88 | 9.70 | 15.59 | 4.61 |
| Conformer-MoE (32e) | 7.49 | 7.99 | 13.69 | 4.03 |
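
For reference, CER is the character-level edit distance between hypothesis and reference divided by the reference length. A minimal sketch is shown below; it is not the WenetSpeech scoring script.

```python
# Character Error Rate (CER): edit distance between hypothesis and reference
# characters divided by the reference length. Minimal reference sketch only.
def edit_distance(ref: str, hyp: str) -> int:
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def cer(ref: str, hyp: str) -> float:
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(f"{cer('今天天气很好', '今天天汽很好') * 100:.2f}%")  # 16.67% (1 substitution / 6 chars)
```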

Acknowledgements

  • We used FastMoE to support Mixture-of-Experts model training in PyTorch
  • We borrowed a lot of code from WeNet for the Conformer implementation and data processing

Reference

[1] SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts (Interspeech 2021)

[2] 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition (Submitted to Interspeech 2022)

Citation

@inproceedings{you21_interspeech,
  author={Zhao You and Shulin Feng and Dan Su and Dong Yu},
  title={{SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2077--2081},
  doi={10.21437/Interspeech.2021-478}
}

@article{you20223m,
  title={3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition},
  author={You, Zhao and Feng, Shulin and Su, Dan and Yu, Dong},
  journal={arXiv preprint arXiv:2204.03178},
  year={2022}
}

Contact

If you have any questions about this project, please feel free to contact shulinfeng@tencent.com or dennisyou@tencent.com.

Disclaimer

This is not an officially supported Tencent product.
