Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

This work extends the recent ASR generative error correction (GER) benchmark to noise-robust ASR with the Robust HyPoradise dataset, and proposes a language-space denoising approach for GER that achieves a new breakthrough.

Conda Environment Configuration

Our code is built on lit-gpt. Please refer to the official tutorial to set up a conda environment, then install the required packages with the following command:

pip install -r requirements.txt

Code

Model code: lit_gpt/robust_ger.py;
Training script: finetune.sh;
Inference script: infer.sh;

To run the training or inference script, open the scripts (both the .sh files and the .py files they call) and change all absolute paths for the data, model, and experiment directories to your own (hint: search for "~/RobustGER"). Then run the .sh script directly with bash.
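The path search in the hint above can be sketched in Python. This is a minimal illustration, not part of the repository: the helper names `find_placeholder_lines` and `replace_placeholder` are hypothetical, and it assumes every path to change contains the literal string "~/RobustGER" in a .sh or .py file.

```python
from pathlib import Path

# The search string comes from the README hint; everything else is illustrative.
PLACEHOLDER = "~/RobustGER"


def find_placeholder_lines(root, placeholder=PLACEHOLDER, suffixes=(".sh", ".py")):
    """Return (path, line_number, stripped_line) for each line containing the placeholder."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if placeholder in line:
                    hits.append((path, number, line.strip()))
    return hits


def replace_placeholder(root, new_prefix, placeholder=PLACEHOLDER, suffixes=(".sh", ".py")):
    """Rewrite the placeholder in place and return the list of files that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8")
            if placeholder in text:
                path.write_text(text.replace(placeholder, new_prefix), encoding="utf-8")
                changed.append(path)
    return changed


# Example (hypothetical target directory; run from the repository root):
# for path, number, line in find_placeholder_lines("."):
#     print(f"{path}:{number}: {line}")
# replace_placeholder(".", "/path/to/your/RobustGER")
```

Listing the matches first is the safer choice, because a blanket replacement assumes your data, model, and experiment directories all share one prefix. If they live in different places, edit the listed lines by hand instead.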

Models

For LLMs, please refer to the tutorial for configuration steps; it supports many mainstream LLMs such as LLaMA-2.
For trained adapter checkpoints, please refer to our HuggingFace repo.

Dataset

We have released our Robust HyPoradise dataset on HuggingFace.

References

@inproceedings{hu2024large,
  title={Large Language Models are Efficient Learners of Noise-Robust Speech Recognition},
  author={Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Chao and Chen, Pin-Yu and Chng, Eng Siong},
  booktitle={International Conference on Learning Representations},
  year={2024}
}

@inproceedings{chen2023hyporadise,
  title={HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models},
  author={Chen, Chen and Hu, Yuchen and Yang, Chao-Han Huck and Siniscalchi, Sabato Marco and Chen, Pin-Yu and Chng, Eng Siong},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
