YUCHEN005/RobustGER

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

[Paper] [Data] [Model]

This work extends the latest generative error correction (GER) benchmark for ASR to noise-robust ASR with the Robust HyPoradise dataset, and proposes a language-space denoising approach for GER that achieves a new breakthrough.

Conda Environment Configuration

Our code is built on lit-gpt. Please follow its official tutorial to set up a conda environment, then install the required packages with the following command:

pip install -r requirements.txt

Code

  • Model code: lit_gpt/robust_ger.py
  • Training script: finetune.sh
  • Inference script: infer.sh

To run the training or inference script, open the scripts (both the .sh files and the .py files they call) and change every absolute path for the data, model, and experiment directories to your own (hint: search for "~/RobustGER"). Then run the .sh script directly with bash.
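
The path search described above can be scripted. The following is a minimal sketch, not part of the repository: the helper names `find_placeholder_paths` and `rewrite_placeholder_paths` are hypothetical, and it assumes the paths to change all contain the literal string "~/RobustGER" and live in .sh or .py files under the directory you pass in.

```python
from pathlib import Path

# The literal prefix the README says to search for.
PLACEHOLDER = "~/RobustGER"


def _script_files(root, suffixes):
    """Yield (path, text) for every readable UTF-8 file with a matching suffix."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            try:
                yield path, path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # skip files that are not valid UTF-8 text


def find_placeholder_paths(root, suffixes=(".sh", ".py"), needle=PLACEHOLDER):
    """Return (path, line_number, stripped_line) for each line containing `needle`."""
    hits = []
    for path, text in _script_files(root, suffixes):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno, line.strip()))
    return hits


def rewrite_placeholder_paths(root, new_prefix, suffixes=(".sh", ".py"), needle=PLACEHOLDER):
    """Replace `needle` with `new_prefix` in place and return the changed files."""
    changed = []
    for path, text in _script_files(root, suffixes):
        if needle in text:
            path.write_text(text.replace(needle, new_prefix), encoding="utf-8")
            changed.append(path)
    return changed


# Example usage (run from the repository root):
#   for path, lineno, line in find_placeholder_paths("."):
#       print(f"{path}:{lineno}: {line}")
#   rewrite_placeholder_paths(".", "/your/own/RobustGER")  # hypothetical target path
```

Listing the matches first and rewriting only after reviewing them is the safer order, since the replacement edits files in place.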

Models

  • For LLMs, please refer to the tutorial for configuration steps; it supports many mainstream LLMs such as LLaMA-2.
  • For well-trained adapter checkpoints, please refer to our HuggingFace repo.

Dataset

We have released our Robust HyPoradise dataset on HuggingFace.

References

@inproceedings{hu2024large,
  title={Large Language Models are Efficient Learners of Noise-Robust Speech Recognition},
  author={Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Chao and Chen, Pin-Yu and Chng, Eng Siong},
  booktitle={International Conference on Learning Representations},
  year={2024}
}

@inproceedings{chen2023hyporadise,
  title={HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models},
  author={Chen, Chen and Hu, Yuchen and Yang, Chao-Han Huck and Siniscalchi, Sabato Marco and Chen, Pin-Yu and Chng, Eng Siong},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
