This repo holds the code for the work presented at ICASSP 2022 [Paper].
We provide a PyTorch implementation for ease of use.
Install the requirements by running the following command:

```shell
pip install -r requirements.txt
```
We highly appreciate @YapengTian for sharing the features and code.
Two kinds of features, visual and audio, are required for the experiments.
- Visual Features: You can download the VGG visual features from here.
- Audio Features: You can download the VGG-like audio features from here.
- Additional Features: You can download the features of the background videos here; these are required for the experiments in the weakly-supervised setting.
After downloading the features, place them in the `data` folder. The structure of the `data` folder is as follows:
```
data
├── audio_features.h5
├── audio_feature_noisy.h5
├── labels.h5
├── labels_noisy.h5
├── mil_labels.h5
├── test_order.h5
├── train_order.h5
├── val_order.h5
├── visual_feature.h5
└── visual_feature_noisy.h5
```
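The feature files above are plain HDF5 containers, so they can be inspected with `h5py` before training. A minimal sketch, assuming a single dataset per file; the key name `"avadataset"` used in the demo is an assumption, so list `f.keys()` on your own files to confirm:

```python
import h5py
import numpy as np

def load_h5_features(path, key=None):
    """Load one feature array from an HDF5 file.

    If key is None, fall back to the first dataset in the file.
    """
    with h5py.File(path, "r") as f:
        if key is None:
            key = list(f.keys())[0]  # inspect the available dataset names
        return np.asarray(f[key])

if __name__ == "__main__":
    # Create a small dummy file just to demonstrate the round trip;
    # the shape here is illustrative, not the real feature shape.
    with h5py.File("dummy_visual_feature.h5", "w") as f:
        f.create_dataset("avadataset",
                         data=np.zeros((2, 10, 7, 7, 512), dtype=np.float32))
    feats = load_h5_features("dummy_visual_feature.h5")
    print(feats.shape)  # (2, 10, 7, 7, 512)
```

This is only a convenience for sanity-checking downloads; the training scripts read the files themselves.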
You can download the AVE dataset from the repo here.
Fully-Supervised Setting

Training

```shell
bash supv_train.sh
# The argument "--snapshot_pref" specifies the path for saving checkpoints and code.
```
Evaluating

```shell
bash supv_test.sh
```
After training, a checkpoint file will be saved whose name contains the test-set accuracy and the epoch number.
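Since the accuracy is embedded in the checkpoint filename, the best model can be picked programmatically. A minimal sketch; the exact naming pattern (e.g. `model_epoch_12_acc_0.7734.pth`) is a hypothetical assumption, so adjust the regex to the files your run actually produces:

```python
import re

def best_checkpoint(filenames):
    """Return (filename, accuracy) with the highest accuracy parsed from the name,
    or None if no filename contains an accuracy."""
    # Assumed pattern: an "acc" token followed by a decimal number.
    pattern = re.compile(r"acc[_-]?([0-9]*\.[0-9]+)")
    scored = []
    for name in filenames:
        m = pattern.search(name)
        if m:
            scored.append((name, float(m.group(1))))
    return max(scored, key=lambda x: x[1]) if scored else None

if __name__ == "__main__":
    ckpts = ["model_epoch_10_acc_0.7512.pth", "model_epoch_12_acc_0.7734.pth"]
    print(best_checkpoint(ckpts))  # ('model_epoch_12_acc_0.7734.pth', 0.7734)
```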
Weakly-Supervised Setting

Training

```shell
bash weak_train.sh
```
Evaluating

```shell
bash weak_test.sh
```
Please cite the following paper if you find this repo useful for your research:
```bibtex
@inproceedings{liu2022bidirectional,
  author    = {Liu, Shuo and Quan, Weize and Liu, Yuan and Yan, Dong-Ming},
  title     = {Bi-Directional Modality Fusion Network for Audio-Visual Event Localization},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2022},
  doi       = {10.1109/ICASSP43922.2022.9746280}
}
```