A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

One-Stage Visual Grounding

A Fast and Accurate One-Stage Approach to Visual Grounding

by Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo

IEEE International Conference on Computer Vision (ICCV), 2019, Oral


We propose a simple, fast, and accurate one-stage approach to visual grounding. For more details, please refer to our paper.


  title={A Fast and Accurate One-Stage Approach to Visual Grounding},
  author={Yang, Zhengyuan and Gong, Boqing and Wang, Liwei and Huang, Wenbing and Yu, Dong and Luo, Jiebo},


  • Python 3.5 (3.6 tested)
  • Pytorch 0.4.1
  • Others (Pytorch-Bert, OpenCV, Matplotlib, scipy, etc.)


  1. Clone the repository

    git clone
  2. Prepare the submodules and associated data

  • RefCOCO & ReferItGame Dataset: place the data, or a soft link to the dataset folder, under ./ln_data/. We follow the dataset structure of DMS; the bash script from DMS can be used to set this up.
    bash ln_data/ --path ./ln_data
  • Flickr30K Entities Dataset: download the images from the websites of the Flickr30K Entities Dataset and the original Flickr30k Dataset. Place the images under ./ln_data/Flickr30k/flickr30k_images.

  • Data index: download the generated index files and place them as the ./data folder. Available at [Gdrive], [One Drive].

    rm -r data
    tar xf data.tar
  • Model weights: download the pretrained model of Yolov3 and place the file in ./saved_models.

    sh saved_models/

More pretrained models are available in the performance table [Gdrive], [One Drive] and should also be placed in ./saved_models.
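Before training, it can help to confirm that the folders named in the steps above exist. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the constant `EXPECTED_PATHS` are hypothetical, and the list contains only paths mentioned in this README. The Flickr30K image folder is needed only if you use that dataset.

```python
from pathlib import Path

# Paths mentioned in the installation steps of this README.
# The Flickr30k entry only matters when that dataset is used.
EXPECTED_PATHS = (
    "ln_data",
    "ln_data/Flickr30k/flickr30k_images",
    "data",
    "saved_models",
)


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root.

    Path.exists() follows soft links, so a link to a dataset folder
    counts as present and a dangling link counts as missing.
    """
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```

Run it from the repository root; any path it reports points to an installation step that still needs to be completed.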


  1. Train the model: run the code from the main folder. Use the flag --lstm to select the LSTM encoder (Bert is the default) and the flag --light to select the light model.

    python --data_root ./ln_data/ --dataset referit \
      --gpu gpu_id --batch_size 32 --resume saved_models/lstm_referit_model.pth.tar \
      --lr 1e-4 --nb_epoch 100 --lstm
  2. Evaluate the model: run the code from the main folder. Use the flag --test to enable test mode.

    python --data_root ./ln_data/ --dataset referit \
      --gpu gpu_id --resume saved_models/lstm_referit_model.pth.tar \
      --lstm --test
  3. Visualizations: the flag --save_plot saves visualizations.
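To illustrate how the flags in the steps above combine, here is a hypothetical argparse sketch. It is not the repository's actual argument parser: the flag names are taken from this README, but the types, the absence of defaults, and the helper names `build_parser` and `describe` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags named in this README."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of the README flags (not the real parser)"
    )
    # Value-carrying options; the types are assumptions.
    parser.add_argument("--data_root", type=str)
    parser.add_argument("--dataset", type=str)
    parser.add_argument("--gpu", type=str)
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--resume", type=str)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--nb_epoch", type=int)
    # Boolean switches described in the steps above.
    parser.add_argument("--lstm", action="store_true")
    parser.add_argument("--light", action="store_true")
    parser.add_argument("--test", action="store_true")
    parser.add_argument("--save_plot", action="store_true")
    return parser


def describe(args):
    """Return (encoder, mode) implied by the switches."""
    encoder = "lstm" if args.lstm else "bert"  # Bert unless --lstm is given
    mode = "test" if args.test else "train"    # training unless --test is given
    return encoder, mode


if __name__ == "__main__":
    # "0" stands in for the gpu_id placeholder used in the commands above.
    example = ["--data_root", "./ln_data/", "--dataset", "referit",
               "--gpu", "0", "--lstm", "--test"]
    print(describe(build_parser().parse_args(example)))
```

The point of the sketch is only that --lstm and --test are independent switches: omitting both gives Bert in training mode, and either can be added on its own.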

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Dataset             Ours-LSTM   Performance (Accu@0.5)   Ours-Bert   Performance (Accu@0.5)
ReferItGame         Gdrive      58.76                    Gdrive      59.30
Flickr30K Entities  One Drive   67.62                    One Drive   68.69
RefCOCO                         val: 73.66                           val: 72.05
                                testA: 75.78                         testA: 74.81
                                testB: 71.32                         testB: 67.59
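Accu@0.5 is commonly read as the fraction of queries whose predicted box overlaps the ground-truth box with an intersection-over-union (IoU) of at least 0.5. The sketch below illustrates that reading and is not the repository's evaluation code: the function names, the (x1, y1, x2, y2) box format, and the use of `>=` rather than `>` at the threshold are assumptions. See the paper for the exact protocol.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions whose IoU with the paired ground truth
    is at least `threshold` (assumed convention: >=)."""
    if not predictions:
        return 0.0
    hits = sum(
        box_iou(p, g) >= threshold for p, g in zip(predictions, ground_truths)
    )
    return hits / len(predictions)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 5), (20, 20, 30, 30)]
    # First pair: intersection 50, union 100, IoU 0.5 (a hit).
    # Second pair: disjoint boxes, IoU 0.0 (a miss).
    print(accuracy_at_iou(preds, gts))
```

Under these assumptions, a table entry such as 58.76 would mean that about 58.76% of the referring expressions in that split were localized with IoU of at least 0.5.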


Parts of the code and models are from DMS, MAttNet, Yolov3 and Pytorch-yolov3.
