Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Factorizable Net (F-Net)

This is pytorch implementation of our ECCV-2018 paper: Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation. This project is based on our previous work: Multi-level Scene Description Network.


  • Guide for Project Setup
  • Guide for Model Evaluation with pretrained model
  • Guide for Model Training
  • Uploading pretrained model and format-compatible datasets.
  • Update the Model link for VG-DR-Net (We will upload a new model by Aug. 27).
  • Update the Dataset link for VG-DR-Net.
  • A demonstration of our Factorizable Net
  • Migrate to PyTorch 1.0.1
  • Multi-GPU support (beta version): one image per GPU


  • Feb 26, 2019: Now we release our beta [Multi-GPU] version of Factorizable Net. Find the stable version at branch 0.3.1
  • Aug 28, 2018: Bug fix for running the evaluation with "--use_gt_boxes". VG-DR-Net has some self-relations, e.g. A-relation-A. Previously, we assumed there is no such relation. This commit may influence the model performance on Scene Graph Generation.

Project Settings

  1. Install the requirements (you can use pip or Anaconda):

    conda install pip pyyaml sympy h5py cython numpy scipy click
    conda install -c menpo opencv3
    conda install pytorch torchvision cudatoolkit=8.0 -c pytorch 
    pip install easydict
  2. Clone the Factorizable Net repository

    git clone
  3. Build the Cython modules for nms, roi pooling,roi align modules

    cd lib
    make all
    cd ..
  4. Download the three datasets VG-MSDN, VG-DR-Net, VRD to F-Net/data. And extract the folders with tar xzvf ${Dataset}.tgz. We have converted the original annotations to json version.

  5. Download Visual Genome images and VRD images.

  6. Link the image data folder to target folder: ln -s /path/to/images F-Net/data/${Dataset}/images

    • p.s. You can change the default data directory by modifying dir in options/data_xxx.json.
  7. [optional] Download the pretrained RPN for Visual Genome and VRD. Place them into output/.

  8. [optional] Download the pretrained Factorizable Net on VG-MSDN, VG-DR-Net and VG-VRD, and place them to output/trained_models/

Project Organization

There are several subfolders contained:

  • lib: dataset Loader, NMS, ROI-Pooling, evaluation metrics, etc. are listed in the folder.
  • options: configurations for Data, RPN, F-Net and hyperparameters.
  • models: model definitions for RPN, Factorizable and related modules.
  • data: containing VG-DR-Net (svg/), VG-MSDN (visual_genome/) and VRD (VRD/).
  • output: storing the trained model, checkpoints and loggers.

Evaluation with our Pretrained Models

Pretrained models on VG-MSDN, VG-DR-Net and VG-VRD are provided. --evaluate is provided to enable evaluation mode. Additionally, we also provide --use_gt_boxes to fed the ground-truth object bounding boxes instead of RPN proposals.

  • Evaluation on VG-MSDN with pretrained. Scene Graph Generation results: Recall@50: 12.984%, Recall@100: 16.506%.
CUDA_VISIBLE_DEVICES=0 python --evaluate --dataset_option=normal \
	--path_opt options/models/VG-MSDN.yaml \
	--pretrained_model output/trained_models/Model-VG-MSDN.h5
  • Evaluation on VG-VRD with pretrained. : Scene Graph Generation results: Recall@50: 19.453%, Recall@100: 24.640%.
CUDA_VISIBLE_DEVICES=0 python --evaluate \
	--path_opt options/models/VRD.yaml \
	--pretrained_model output/trained_models/Model-VRD.h5
  • Evaluation on VG-DR-Net with pretrained. Scene Graph Generation results: Recall@50: 19.807%, Recall@100: 25.488%.
CUDA_VISIBLE_DEVICES=0 python --evaluate --dataset_option=normal \
	--path_opt options/models/VG-DR-Net.yaml \
	--pretrained_model output/trained_models/Model-VG-DR-Net.h5


  • Training Region Proposal Network (RPN). The shared conv layers are fixed. We also provide pretrained RPN on Visual Genome and VRD.

     # Train RPN for VG-MSDN and VG-DR-Net
     CUDA_VISIBLE_DEVICES=0 python --dataset_option=normal 
     # Train RPN for VRD
  • Training Factorizable Net: detailed training options are included in options/models/.

     # Train F-Net on VG-MSDN:
     CUDA_VISIBLE_DEVICES=0 python --dataset_option=normal \
     	--path_opt options/models/VG-MSDN.yaml --rpn output/RPN.h5
     # Train F-Net on VRD:
     CUDA_VISIBLE_DEVICES=0 python  \
     	--path_opt options/models/VRD.yaml --rpn output/RPN_VRD.h5
     # Train F-Net on VG-DR-Net:
     CUDA_VISIBLE_DEVICES=0 python --dataset_option=normal \
     	--path_opt options/models/VG-DR-Net.yaml --rpn output/RPN.h5

    --rpn xxx.h5 can be ignored in end-to-end training from pretrained VGG16. Sometime, unexpected and confusing errors appear. Ignore it and restart to training.

  • For better results, we usually re-train the model with additional epochs by resuming the training from the checkpoint with --resume ckpt:

     # Resume F-Net training on VG-MSDN:
     CUDA_VISIBLE_DEVICES=0 python --dataset_option=normal \
     	--path_opt options/models/VG-MSDN.yaml --resume ckpt --epochs 30


We thank longcw for his generous release of the PyTorch Implementation of Faster R-CNN.


If you find our project helpful, your citations are highly appreciated:

author={Li, Yikang and Ouyang, Wanli and Bolei, Zhou and Jianping, Shi and Chao, Zhang and Wang, Xiaogang},
title={Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation},
booktitle = {ECCV},
year = {2018}

We also have two papers regarding to scene graph generation / visual relationship detection:

author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={Scene graph generation from objects, phrases and region captions},
booktitle = {ICCV},
year = {2017}

author={Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
title={ViP-CNN: Visual Phrase Guided Convolutional Neural Network},
booktitle = {CVPR},
year = {2017}


The pre-trained models and the Factorizable Network technique are released for uncommercial use.

Contact Yikang LI if you have questions.


Factorizable Net (Multi-GPU version): An Efficient Subgraph-based Framework for Scene Graph Generation






No releases published


No packages published