
TG-GAT for AVDN.

Official code for the 1st-place solution of the ICCV CLVL 2023 AVDN Challenge.

The corresponding technical report is Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation.

[Figure: overview of the TG-GAT framework]

Details of the AVDN competition are available on the official website, and the leaderboard results can be viewed on EvalAI.

Abstract

This report details the method of the winning entry of the AVDN Challenge in ICCV CLVL 2023. The competition addresses the Aerial Navigation from Dialog History (ANDH) task, which requires a drone agent to associate dialog history with aerial observations to reach the destination. To strengthen the drone agent's cross-modal grounding ability, we propose a Target-Grounded Graph-Aware Transformer (TG-GAT) framework. Concretely, TG-GAT first leverages a graph-aware transformer to capture spatiotemporal dependencies, which benefits navigation state tracking and robust action planning. In addition, an auxiliary visual grounding task is devised to boost the agent's awareness of referred landmarks. Moreover, a hybrid augmentation strategy based on large language models is utilized to mitigate data scarcity. Our TG-GAT framework won the AVDN Challenge, with 2.2% and 3.0% absolute improvements over the baseline on the SPL and SR metrics, respectively.

For the AVDN task, we recommend referring to the original AVDN paper for more details.
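To make the graph-aware attention idea in the abstract concrete, below is a minimal, illustrative PyTorch sketch, not the code in this repository: self-attention over navigation-graph nodes in which non-adjacent node pairs are masked out. The module name, feature dimension, head count, and tensor shapes are all assumptions made for illustration.

import torch
import torch.nn as nn

class GraphAwareSelfAttention(nn.Module):
    """Illustrative only: attention over graph nodes restricted by adjacency."""
    def __init__(self, dim=768, num_heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, node_feats, adjacency):
        # node_feats: (B, N, dim) features of visited/candidate graph nodes
        # adjacency:  (B, N, N) bool, True where node j is connected to node i
        blocked = ~adjacency                                    # True = masked pair
        mask = blocked.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(node_feats, node_feats, node_feats, attn_mask=mask)
        return out

# Toy usage: 4 fully connected graph nodes
feats = torch.randn(1, 4, 768)
adj = torch.ones(1, 4, 4, dtype=torch.bool)
print(GraphAwareSelfAttention()(feats, adj).shape)  # torch.Size([1, 4, 768])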

TODOs

  • LLM-augmented instruction data;
  • Code with support for multi-GPU training;
  • Training logs;
  • Trained models;

Prerequisites

Installation

We utilized the same environment as the AVDN baseline.

pip install torch==1.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
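As an optional sanity check (not part of the official setup), you can confirm that the pinned builds installed above are the ones Python picks up:

import torch
import torchvision

print(torch.__version__)          # expected: 1.11.0+cu113
print(torchvision.__version__)    # expected: 0.12.0+cu113
print(torch.cuda.is_available())  # should be True on a machine with a CUDA 11.3-capable driver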

Dataset Download

  1. Please download the xView dataset from https://challenge.xviewdataset.org/data-download.
  2. Please download the ANDH dataset from https://sites.google.com/view/aerial-vision-and-dialog/home.
  3. Please download the converted xView-pretrained YOLOv5x backbone weights from here, or obtain the complete weights from https://huggingface.co/deprem-ml/Binafarktespit-yolo5x-v1-xview and load them into the network yourself (see the sketch after this list).
  4. Please download our LLM-augmented instruction dataset (generated with Vicuna-33b) from here. Alternatively, you can use FastChat to augment the training instructions yourself.
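For the alternative route in step 3, the sketch below shows one possible way, not this repository's loader, to fetch the full checkpoint from the Hugging Face repo and turn it into a state dict. The filename "best.pt" is an assumption; check the repo's file list before running.

from huggingface_hub import hf_hub_download
import torch

ckpt_path = hf_hub_download(
    repo_id="deprem-ml/Binafarktespit-yolo5x-v1-xview",
    filename="best.pt",  # assumed checkpoint name; verify against the repo's files
)
# torch.hub fetches the ultralytics/yolov5 code needed to unpickle the checkpoint.
model = torch.hub.load("ultralytics/yolov5", "custom", path=ckpt_path)
state_dict = model.state_dict()  # weights you would then map into the AVDN visual encoder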

After downloading the aforementioned data, please organize the entire project according to the following structure:

project_root/
├── datasets/
│   ├── AVDN/
│   │   ├── annotations/        (place the ANDH dataset here)
│   │   ├── pretrain_weights/   (place the pretrained weights here)
│   │   ├── train_images/       (place the xView images here)
│   │   └── ...
│
├── readme_imgs/
│   ├── ...
│
├── src/
│   ├── prj_1/
│   │   ├── ...
│   │
│   ├── prj_2/
│   │   ├── ...
│
├── README.md
├── requirements.txt

Training logs

We provide the training logs for the four models under the 'src' directory of our implementation.

Model                              SPL (%)   SR (%)   GP (m)   Logs
imga                               16.9      20.2     51.9     logs.txt
imga+langa                         18.2      20.9     58.2     logs.txt
imga+mhca+gr+roberta+yolov5x       18.8      23.3     54.3     logs.txt
img+mhca+gr+roberta+yolov5x+gat    18.4      22.6     58.1     logs.txt

Running

To train and validate a model, navigate to its directory under the 'src' folder. Taking the 'et_imga' model as an example, run the following commands:

cd src/0_et_imga
bash train.sh

Acknowledgment

We thank the developers of AVDN and DUET for their public code release.
