Spatial-AST

This repo hosts the code and models for "BAT: Learning to Reason about Spatial Sounds with Large Language Models" (accepted at ICML 2024).

Installation

conda env create -f environment.yml
bash timm_patch/patch.sh

Data Preparation

Please visit the dataset page and download the data.

Inference

We provide a pretrained checkpoint. To run inference:

# remember to replace the `ckpt` variable with your local path
bash scripts/inf.sh
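
Because the inference script depends on a locally set `ckpt` path, a small pre-flight check can catch a wrong path before the run starts. The following is a minimal sketch, not part of the repo: the helper name `check_ckpt` and the placeholder path are hypothetical, and it assumes the checkpoint is a single file. It does not pass the path into `scripts/inf.sh`; you still need to edit the `ckpt` variable there as noted above.

```shell
# Hypothetical helper (not shipped with the repo): verify that a checkpoint
# file exists before launching inference.
check_ckpt() {
  if [ -f "$1" ]; then
    echo "found checkpoint: $1"
  else
    echo "missing checkpoint: $1" >&2
    return 1
  fi
}

# Placeholder path; use the same value you set for `ckpt` in scripts/inf.sh.
ckpt="/path/to/your/checkpoint.pth"

# Launch inference only when the checkpoint file is present.
if check_ckpt "$ckpt"; then
  bash scripts/inf.sh
fi
```

If the file is missing, the helper prints a message to stderr and the inference script is not started.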

Train a new model

To train a new model, run:

bash scripts/finetune-2m.sh

TODO

The remaining TODOs will be completed by the end of May 2024.

Citation

@article{zheng2024bat,
  author    = {Zheng, Zhisheng and Peng, Puyuan and Ma, Ziyang and Chen, Xie and Choi, Eunsol and Harwath, David},
  title     = {BAT: Learning to Reason about Spatial Sounds with Large Language Models},
  journal   = {arXiv preprint arXiv:2402.01591},
  year      = {2024},
}

Reference

The codebase is based on the Audio-MAE repo.

License

This project is under the CC-BY 4.0 license. See LICENSE for details.
