Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation
Latest commit eaffa35 Sep 5, 2018


Factorizable Net (F-Net)

This is the PyTorch implementation of our ECCV 2018 paper: Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation. This project builds on our previous work, Multi-level Scene Description Network.

Progress

  • Guide for Project Setup
  • Guide for Model Evaluation with pretrained model
  • Guide for Model Training
  • Upload pretrained models and format-compatible datasets.
  • Update the model link for VG-DR-Net (we will upload a new model by Aug. 27).
  • Update the dataset link for VG-DR-Net.
  • A demonstration of our Factorizable Net
  • Migrate to PyTorch 0.4.x (currently 0.3.x).

Updates

  • Aug 28: Fixed a bug when running evaluation with "--use_gt_boxes". VG-DR-Net contains some self-relations (e.g., A-relation-A), whereas we previously assumed that no such relations exist. This fix may affect model performance on Scene Graph Generation.
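Why the self-relation assumption matters: if candidate (subject, object) pairs are enumerated with subject != object, a ground-truth A-relation-A triplet can never be predicted. The sketch below illustrates this; the helper names `candidate_pairs` and `unreachable_gt` and the (subject, predicate, object) index-triplet format are hypothetical and are not taken from the repository's code.

```python
from itertools import permutations, product

def candidate_pairs(num_objects, allow_self=False):
    """Enumerate (subject, object) index pairs to score for relationships.

    allow_self=False skips pairs with subject == object (the old
    "no self-relation" assumption); allow_self=True also yields (i, i).
    """
    if allow_self:
        return list(product(range(num_objects), repeat=2))
    return list(permutations(range(num_objects), 2))

def unreachable_gt(gt_triplets, num_objects, allow_self=False):
    """Return ground-truth (subj, pred, obj) triplets whose pair is never scored."""
    pairs = set(candidate_pairs(num_objects, allow_self))
    return [t for t in gt_triplets if (t[0], t[2]) not in pairs]

# Hypothetical image with 3 objects; object 2 has a self-relation (predicate 7).
gt = [(0, 5, 1), (2, 7, 2)]
print(unreachable_gt(gt, 3))                   # [(2, 7, 2)]
print(unreachable_gt(gt, 3, allow_self=True))  # []
```

With N objects, skipping self-pairs scores N(N-1) pairs instead of N^2, and every ground-truth self-relation becomes a guaranteed miss, which is one way such a fix can change the reported numbers.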

Project Settings

  1. Install the requirements (you can use pip or Anaconda):

    conda install pip pyyaml sympy h5py cython numpy scipy
    conda install -c menpo opencv3
    conda install -c soumith pytorch torchvision cuda80 
    pip install easydict
    
  2. Clone the Factorizable Net repository

    git clone git@github.com:yikang-li/FactorizableNet.git
  3. Build the Cython modules for NMS, RoI pooling and RoI align:

    cd lib
    ./make.sh
    cd ..
  4. Download the three datasets (VG-MSDN, VG-DR-Net, VRD) to F-Net/data and extract them with tar xzvf ${Dataset}.tgz. We have converted the original annotations to JSON.

  5. Download Visual Genome images and VRD images.

  6. Link the image folder to the target folder: ln -s /path/to/images F-Net/data/${Dataset}/images

    • Note: you can change the default data directory by modifying dir in options/data_xxx.json.
  7. [optional] Download the pretrained RPNs for Visual Genome and VRD and place them in output/.

  8. [optional] Download the pretrained Factorizable Net models on VG-MSDN, VG-DR-Net and VG-VRD and place them in output/trained_models/.
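Steps 4 and 6 can also be scripted. The following is a minimal sketch using only the Python standard library; the function `prepare_dataset` and its arguments are hypothetical, and `dataset_dir` is assumed to be the folder name the archive actually extracts to (svg/, visual_genome/ or VRD/, per the Project Organization section).

```python
import os
import tarfile
from pathlib import Path

def prepare_dataset(archive, data_root, dataset_dir, image_src):
    """Extract a dataset archive into data_root and link its image folder.

    archive:     path to the downloaded ${Dataset}.tgz
    data_root:   e.g. "F-Net/data"
    dataset_dir: folder the archive extracts to (assumed, e.g. "VRD")
    image_src:   directory holding the downloaded Visual Genome or VRD images
    """
    data_root = Path(data_root)
    data_root.mkdir(parents=True, exist_ok=True)

    # Same effect as running `tar xzvf ${Dataset}.tgz` inside data_root.
    # The "data" filter rejects unsafe members where the running Python
    # supports it; older versions extract without a filter.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_root, **extra)

    # Same effect as `ln -s /path/to/images F-Net/data/${Dataset}/images`.
    link = data_root / dataset_dir / "images"
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.exists() and not link.is_symlink():
        os.symlink(Path(image_src).resolve(), link, target_is_directory=True)
    return link

# Example (hypothetical paths):
# prepare_dataset("F-Net/data/VRD.tgz", "F-Net/data", "VRD", "/path/to/images")
```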

Project Organization

The repository contains several subfolders:

  • lib: dataset loaders, NMS, RoI pooling, evaluation metrics, etc.
  • options: configurations for data, RPN, F-Net and hyperparameters.
  • models: model definitions for the RPN, Factorizable Net and related modules.
  • data: contains VG-DR-Net (svg/), VG-MSDN (visual_genome/) and VRD (VRD/).
  • output: stores the trained models, checkpoints and logs.
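Before training or evaluation, it can help to confirm that this layout is in place. A minimal sketch, assuming the folder names listed above and the images/ link from step 6 of Project Settings; `missing_paths` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Top-level folders listed in this README.
EXPECTED_DIRS = ["lib", "options", "models", "data", "output"]

# Dataset name -> folder under data/, as listed for the data/ entry.
DATASET_DIRS = {
    "VG-DR-Net": "data/svg",
    "VG-MSDN": "data/visual_genome",
    "VRD": "data/VRD",
}

def missing_paths(repo_root, datasets=()):
    """Return the expected directories that do not exist under repo_root.

    For each requested dataset, both its folder and its images/ link are
    checked; Path.is_dir() follows symlinks, so a linked image folder counts.
    """
    root = Path(repo_root)
    wanted = list(EXPECTED_DIRS)
    for name in datasets:
        wanted += [DATASET_DIRS[name], DATASET_DIRS[name] + "/images"]
    return [p for p in wanted if not (root / p).is_dir()]

# Example: print(missing_paths(".", datasets=["VG-MSDN"]))
```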

Evaluation with our Pretrained Models

Pretrained models on VG-MSDN, VG-DR-Net and VG-VRD are provided. Use --evaluate to run in evaluation mode. Additionally, --use_gt_boxes feeds the ground-truth object bounding boxes to the model instead of RPN proposals.

  • Evaluation on VG-MSDN with the pretrained model. Scene Graph Generation results: Recall@50: 12.984%, Recall@100: 16.506%.
CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate --dataset_option=normal \
	--path_opt options/models/VG-MSDN.yaml \
	--pretrained_model output/trained_models/Model-VG-MSDN.h5
  • Evaluation on VG-VRD with the pretrained model. Scene Graph Generation results: Recall@50: 19.453%, Recall@100: 24.640%.
CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate \
	--path_opt options/models/VRD.yaml \
	--pretrained_model output/trained_models/Model-VRD.h5
  • Evaluation on VG-DR-Net with the pretrained model. Scene Graph Generation results: Recall@50: 19.807%, Recall@100: 25.488%.
CUDA_VISIBLE_DEVICES=0 python train_FN.py --evaluate --dataset_option=normal \
	--path_opt options/models/VG-DR-Net.yaml \
	--pretrained_model output/trained_models/Model-VG-DR-Net.h5
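The Recall@50 and Recall@100 figures above are triplet recalls: the share of ground-truth (subject, predicate, object) triplets recovered among each image's top-K scored predictions. The sketch below follows the commonly used protocol, where a prediction matches when all three labels agree and both the subject and object boxes have IoU >= 0.5 with the ground truth. The `Triplet` layout, the function names and the continuous-coordinate IoU are assumptions; the evaluation code in lib/ may differ in details.

```python
from typing import List, NamedTuple, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

class Triplet(NamedTuple):
    subj: int            # subject class index
    pred: int            # predicate class index
    obj: int             # object class index
    subj_box: Box
    obj_box: Box
    score: float = 1.0   # confidence; unused for ground truth

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes in continuous coordinates."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def matches(p: Triplet, g: Triplet, thresh: float) -> bool:
    """Labels must agree and both boxes must overlap enough."""
    return ((p.subj, p.pred, p.obj) == (g.subj, g.pred, g.obj)
            and iou(p.subj_box, g.subj_box) >= thresh
            and iou(p.obj_box, g.obj_box) >= thresh)

def image_hits(preds: Sequence[Triplet], gts: Sequence[Triplet],
               k: int, thresh: float = 0.5) -> int:
    """Count ground-truth triplets matched by any of the top-k predictions."""
    top = sorted(preds, key=lambda t: t.score, reverse=True)[:k]
    return sum(any(matches(p, g, thresh) for p in top) for g in gts)

def recall_at_k(dataset: List[Tuple[Sequence[Triplet], Sequence[Triplet]]],
                k: int) -> float:
    """Dataset-level Recall@k: total hits divided by total ground truths."""
    hits = sum(image_hits(preds, gts, k) for preds, gts in dataset)
    total = sum(len(gts) for _, gts in dataset)
    return hits / total if total else 0.0
```

Summing hits over the whole dataset before dividing weights images by their number of ground-truth triplets, so a Recall@50 of 12.984% reads as roughly 13 of every 100 ground-truth triplets being recovered in the top 50 predictions of their image.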

Training

  • Training the Region Proposal Network (RPN): the shared conv layers are kept fixed. We also provide pretrained RPNs for Visual Genome and VRD.

     # Train RPN for VG-MSDN and VG-DR-Net
     CUDA_VISIBLE_DEVICES=0 python train_rpn.py --dataset_option=normal 
     
     # Train RPN for VRD
     CUDA_VISIBLE_DEVICES=0 python train_rpn_VRD.py 
     
    
  • Training Factorizable Net: detailed training options are included in options/models/.

     # Train F-Net on VG-MSDN:
     CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
     	--path_opt options/models/VG-MSDN.yaml --rpn output/RPN.h5
     	
     # Train F-Net on VRD:
     CUDA_VISIBLE_DEVICES=0 python train_FN.py  \
     	--path_opt options/models/VRD.yaml --rpn output/RPN_VRD.h5
     	
     # Train F-Net on VG-DR-Net:
     CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
     	--path_opt options/models/VG-DR-Net.yaml --rpn output/RPN.h5
     
    

    --rpn xxx.h5 can be omitted for end-to-end training from the pretrained VGG16. Occasionally, unexpected and confusing errors appear; if so, ignore them and restart the training.

  • For better results, we usually train the model for additional epochs by resuming from the checkpoint with --resume ckpt:

     # Resume F-Net training on VG-MSDN:
     CUDA_VISIBLE_DEVICES=0 python train_FN.py --dataset_option=normal \
     	--path_opt options/models/VG-MSDN.yaml --resume ckpt --epochs 30
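When training on several datasets, the commands above can be assembled programmatically. A minimal sketch, assuming the script names, flags and file paths exactly as they appear in this README; `CONFIGS`, `rpn_command`, `fnet_command` and `run_with_retries` are hypothetical helpers, and the retry loop simply re-runs the same command, mirroring the advice to restart after a spurious error.

```python
import os
import subprocess

# Per-dataset settings copied from the commands in this README.
CONFIGS = {
    "VG-MSDN": {"yaml": "options/models/VG-MSDN.yaml", "rpn": "output/RPN.h5", "normal": True},
    "VG-DR-Net": {"yaml": "options/models/VG-DR-Net.yaml", "rpn": "output/RPN.h5", "normal": True},
    "VRD": {"yaml": "options/models/VRD.yaml", "rpn": "output/RPN_VRD.h5", "normal": False},
}

def rpn_command(dataset):
    """RPN training: VRD has its own script; the VG datasets share train_rpn.py."""
    if dataset == "VRD":
        return ["python", "train_rpn_VRD.py"]
    return ["python", "train_rpn.py", "--dataset_option=normal"]

def fnet_command(dataset, use_rpn=True, resume=None, epochs=None):
    """F-Net training command.

    use_rpn=False drops --rpn (end-to-end training from pretrained VGG16).
    Passing resume (e.g. "ckpt") follows the README's resume example,
    which does not pass --rpn.
    """
    cfg = CONFIGS[dataset]
    cmd = ["python", "train_FN.py"]
    if cfg["normal"]:
        cmd.append("--dataset_option=normal")
    cmd += ["--path_opt", cfg["yaml"]]
    if resume is not None:
        cmd += ["--resume", resume]
    elif use_rpn:
        cmd += ["--rpn", cfg["rpn"]]
    if epochs is not None:
        cmd += ["--epochs", str(epochs)]
    return cmd

def run_with_retries(cmd, gpu="0", max_attempts=3):
    """Run cmd on one GPU, restarting it whenever it exits with a non-zero code."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(cmd, env=env).returncode == 0:
            return attempt
        print(f"attempt {attempt} failed; restarting")
    raise RuntimeError(f"{cmd} failed {max_attempts} times")

# Example: run_with_retries(fnet_command("VG-MSDN", resume="ckpt", epochs=30))
```

Keeping the per-dataset settings in one table makes the VRD differences (no --dataset_option=normal, a separate RPN file) explicit rather than scattered across shell history.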
    

Acknowledgement

We thank longcw for his generous release of the PyTorch Implementation of Faster R-CNN.

Reference

If you find our project helpful, your citations are highly appreciated:

@inproceedings{li2018fnet,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Shi, Jianping and Zhang, Chao and Wang, Xiaogang},
title={Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation},
booktitle = {ECCV},
year = {2018}
}

We also have two related papers on scene graph generation and visual relationship detection:

@inproceedings{li2017msdn,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={Scene graph generation from objects, phrases and region captions},
booktitle = {ICCV},
year = {2017}
}

@inproceedings{li2017vip,
author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={ViP-CNN: Visual Phrase Guided Convolutional Neural Network},
booktitle = {CVPR},
year = {2017}
}

License:

The pretrained models and the Factorizable Net technique are released for non-commercial use only.

Contact Yikang Li if you have any questions.