🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

zszheng147/Spatial-AST

Spatial-AST

This repo hosts the code and models for "BAT: Learning to Reason about Spatial Sounds with Large Language Models" (accepted by ICML 2024).

Installation

conda env create -f environment.yml
bash timm_patch/patch.sh

Data Preparation

Please visit the dataset page and download the data.

Inference

We provide a pretrained checkpoint. To run inference:

# remember to replace `ckpt` variable with your local path
bash scripts/inf.sh
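
Since the script depends on a local checkpoint path, a quick pre-flight check can catch a wrong path before the model starts loading. The following is a minimal sketch, not part of the repo: the `check_ckpt` helper, the `CKPT_PATH` environment variable, and the placeholder path are all hypothetical, and the `ckpt` variable inside `scripts/inf.sh` still has to be edited as noted above.

```shell
# Hypothetical pre-flight check; the default path below is a placeholder,
# not a file shipped with the repo.
ckpt="${CKPT_PATH:-/path/to/checkpoint.pth}"

check_ckpt() {
    # Return 0 if the given path is a readable regular file, 1 otherwise.
    if [ -f "$1" ] && [ -r "$1" ]; then
        return 0
    fi
    echo "checkpoint not found: $1" >&2
    return 1
}

if check_ckpt "$ckpt"; then
    echo "checkpoint OK: $ckpt"
    # Launch inference only after the check passes, e.g.:
    # bash scripts/inf.sh
fi
```

If the check fails, fix the path before running `bash scripts/inf.sh`.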

Train a new model

Training from scratch takes a single command:

bash scripts/finetune-2m.sh

TODO

The remaining TODOs will be completed before the end of May 2024.

  • Environment setup
  • Upload pretrained weights
  • Fix numba output bug
  • Upload training data: SpatialSoundQA
  • Replace tensorboard with W&B
  • Inference colab

Citation

@article{zheng2024bat,
  author    = {Zheng, Zhisheng and Peng, Puyuan and Ma, Ziyang and Chen, Xie and Choi, Eunsol and Harwath, David},
  title     = {BAT: Learning to Reason about Spatial Sounds with Large Language Models},
  journal   = {arXiv preprint arXiv:2402.01591},
  year      = {2024},
}

Reference

The codebase is based on the Audio-MAE repo.

License

This project is under the CC-BY 4.0 license. See LICENSE for details.
