This repo hosts the code and models for "BAT: Learning to Reason about Spatial Sounds with Large Language Models" (accepted by ICML 2024; see the BibTeX entry below).
conda env create -f environment.yml
bash timm_patch/patch.sh
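The environment created above still has to be activated, and its name is set inside `environment.yml` rather than stated here. A minimal sketch for reading it, assuming the file has a top-level `name:` key (common for conda environment files, but not checked against this repo); `env_name_from_yml` is a hypothetical helper, not part of the codebase:

```shell
# Hypothetical helper (not part of this repo): print the environment name
# declared in a conda environment file.
# Assumption: the file contains a top-level "name:" line.
env_name_from_yml() {
    # Strip the "name:" prefix from matching lines and keep the first match.
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}
```

With this defined, `conda activate "$(env_name_from_yml environment.yml)"` should select the new environment. Activating it before running `timm_patch/patch.sh` is probably what you want, on the assumption that the patch modifies the `timm` package installed in that environment.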
Please visit the dataset page and download the data from there.
We provide a pretrained checkpoint. To run inference:
# remember to replace `ckpt` variable with your local path
bash scripts/inf.sh
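A wrong `ckpt` path is an easy mistake to make, so it can help to check that the file exists before launching inference. A minimal sketch; `check_ckpt` is a hypothetical helper, not part of this repo, and the path you pass it must be the same one you set for `ckpt` in `scripts/inf.sh`:

```shell
# Hypothetical helper (not part of this repo): fail early if the
# checkpoint file is missing.
check_ckpt() {
    if [ ! -f "$1" ]; then
        # Report the problem on stderr and signal failure to the caller.
        echo "checkpoint not found: $1" >&2
        return 1
    fi
    echo "using checkpoint: $1"
}
```

For example, `check_ckpt /path/to/your/checkpoint && bash scripts/inf.sh` runs inference only if the file is present; the path shown is a placeholder.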
Training from scratch is straightforward:
bash scripts/finetune-2m.sh
The remaining TODOs will be completed before the end of May 2024.
- Environment setup
- Upload pretrained weights
- Fix numba output bug
- Upload training data: SpatialSoundQA
- Replace tensorboard with W&B
- Inference colab
@article{zheng2024bat,
author = {Zheng, Zhisheng and Peng, Puyuan and Ma, Ziyang and Chen, Xie and Choi, Eunsol and Harwath, David},
title = {BAT: Learning to Reason about Spatial Sounds with Large Language Models},
journal = {arXiv preprint arXiv:2402.01591},
year = {2024},
}
The codebase is based on the Audio-MAE repo.
This project is under the CC-BY 4.0 license. See LICENSE for details.