Author's reference implementation for PG 2019 paper "Interactive Curation of Datasets for Training and Refining Generative Models".




Interactive Curation of Datasets for Training and Refining Generative Models

The main contributors to this repository are Wenjie Ye, Yue Dong, and Pieter Peers.


This repository provides a reference implementation for the PG 2019 paper "Interactive Curation of Datasets for Training and Refining Generative Models".

More information (including a copy of the paper) can be found at


If you use our code, please consider citing:

 @article{ye2019interactive,
  author  = {Ye, Wenjie and Dong, Yue and Peers, Pieter},
  title   = {Interactive Curation of Datasets for Training and Refining Generative Models},
  year    = {2019},
  journal = {Computer Graphics Forum},
  volume  = {38},
  number  = {7},
 }


System requirements

  • A Linux system with a Python TensorFlow-GPU environment. The code was tested with Ubuntu 16.04, CUDA 9.0, TensorFlow-GPU 1.11, and Python 3.5.2, but should also work with a range of other versions.
  • Install any missing Python packages, such as NumPy, OpenCV-Python, and Pillow.
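A quick environment sanity check for the packages above can be sketched as follows; this snippet is illustrative and not part of the repository (cv2 and PIL are the import names of OpenCV-Python and Pillow):

```python
import importlib.util

# Import names for the packages listed above.
required = ["numpy", "cv2", "PIL", "tensorflow"]

# find_spec returns None when a package is not installed.
missing = [mod for mod in required if importlib.util.find_spec(mod) is None]
print("missing packages:", missing or "none")
```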


Prepare the necessary code and data.


  • Modify the "" file.

    Change the variable "outputroot" to a directory to which you want to write output files.

    Change the variable "ffhq_path" to the directory where you put all the FFHQ images (if downloaded).

  • (Conditional) If the pre-compiled "" does not work correctly at runtime, compile it yourself:

    g++ -std=c++11 -fPIC -shared expand.cpp -o expand.so
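The configuration edit described above can be sketched as follows; both paths are placeholders, and the name of the configuration file itself is omitted in this README:

```python
# Illustrative contents of the configuration file described above.
# Both paths are placeholders; point them at your own directories.
outputroot = "/path/to/output"       # all output files are written here
ffhq_path = "/path/to/ffhq/images"   # FFHQ images, only needed if downloaded
```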

Run the system.

Step 1: Start the dataset curation system.

Run "" to start the system. Example:

  python -datatype face -enable_simul 1 -gpuid 0 -experiment_name curation_face


  • datatype: which data source to experiment on. Valid values are "face", "bedroom", "wood", "metal", "stone".
  • enable_simul: whether to label and train/select simultaneously, or to perform labeling, training, and selecting sequentially.
  • gpuid: which GPU card(s) to use. Multiple cards are supported; separate the IDs with commas.
  • experiment_name: a name identifying the experiment; it is used as the name of the output folder.

All arguments are optional and can be omitted.
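A minimal argparse sketch of the command-line interface described above; the defaults shown here are illustrative assumptions, not necessarily the repository's actual defaults:

```python
import argparse

# Sketch of the flags described above; defaults are illustrative assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("-datatype", default="face",
                    choices=["face", "bedroom", "wood", "metal", "stone"])
parser.add_argument("-enable_simul", type=int, default=1,
                    help="1: simultaneous labeling and training/selecting; 0: sequential")
parser.add_argument("-gpuid", default="0",
                    help="comma-separated GPU ids, e.g. '0,1'")
parser.add_argument("-experiment_name", default="curation_face",
                    help="used as the name of the output folder")

args = parser.parse_args(["-datatype", "face", "-gpuid", "0,1"])
gpu_ids = [int(g) for g in args.gpuid.split(",")]
print(args.experiment_name, gpu_ids)  # curation_face [0, 1]
```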

By default, the interactive system runs on port 5001, which can be changed in "".

Wait until the system initialization finishes. At that point, you should see "Running on (Press CTRL+C to quit)" in the console.

Note: the FFHQ data source requires heavy precomputation on the first run, which can take a long time.

Step 2: Interactively label the images.

Use a modern browser to visit "http://#computer#:5001/home", follow the UI to label the images.

"#computer#" should be replaced with the server name or IP address if the system runs on a remote server, or with "localhost" if it runs locally.
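The address substitution can be written out as follows; the host value is a hypothetical example:

```python
# "#computer#" is the host running the system; the port defaults to 5001.
host = "localhost"  # or the remote server's name / IP address
port = 5001
url = f"http://{host}:{port}/home"
print(url)  # http://localhost:5001/home
```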

After several rounds, you can stop labeling and move to the next step. After each round's training, a user-intent classifier model is saved in the output folder.

Step 3: Finetune the original GAN model.

For FFHQ face or bedroom (StyleGAN models), an example for running the finetuning is:

  python -datatype face -classifier_model PATH_TO_CLASSIFIER_MODEL -gpuid 0

For texture models, an example for running the finetuning is:

  python -datatype wood -classifier_model PATH_TO_CLASSIFIER_MODEL -gpuid 0

The "classifier_model" argument is the path to the user-intent classifier model obtained in Step 2; it should have the form ".../model.ckpt".

The system will generate the dataset for finetuning and run the GAN finetuning. After it finishes, you will get the finetuned model and some samples in a folder created under the "outputroot" directory.

Suggestions for using the system on your own data

You will need to modify the code to run the system on a new dataset. The following parts may need changes:

  • Load your own dataset into the system.
  • Use a feature embedding to make the training more efficient. For example, we use FaceNet for face images and VGG for bedroom images. If you work with natural images, you can reuse the VGG embedding implementation; if you do not want an embedding, you can use the implementation for texture images.
  • Use a suitable classifier. If you use a feature embedding, the classifier can be simple, such as a few fully connected (FC) layers. If you do not use an embedding, the classifier should be a suitable ConvNet.
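For instance, the "embedding plus simple classifier" option above can be sketched as a small fully connected forward pass. This toy version uses NumPy with illustrative layer sizes; the repository's actual classifier is a TensorFlow model:

```python
import numpy as np

# Toy sketch: on top of fixed feature embeddings (e.g. FaceNet / VGG features),
# the user-intent classifier can be a small MLP with a sigmoid output.
rng = np.random.default_rng(0)

def mlp_predict(features, w1, b1, w2, b2):
    h = np.maximum(features @ w1 + b1, 0.0)   # one hidden FC layer with ReLU
    logits = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))      # probability the image matches user intent

emb = rng.standard_normal((4, 128))           # 4 images, 128-d embeddings (illustrative)
w1 = rng.standard_normal((128, 32)) * 0.1
b1 = np.zeros(32)
w2 = rng.standard_normal((32, 1)) * 0.1
b2 = np.zeros(1)

probs = mlp_predict(emb, w1, b1, w2, b2)
print(probs.shape)  # (4, 1)
```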


StyleGAN-related code in this repository is provided by NVIDIA under the Creative Commons Attribution-NonCommercial 4.0 International License. We made modifications to the original code.

FaceNet-related code in this repository is provided by davidsandberg under the MIT License.

Part of the texture-GAN-related code is provided by Xiao Li.

VGG-related code is provided by machrisaa.

This repository is provided for non-commercial use only, without any warranty. If you use this repository, you must also agree to the licenses of the code providers listed above.


You can contact Wenjie Ye if you have any problems.


[1] YE W., DONG Y., PEERS P.: Interactive curation of datasets for training and refining generative models. Computer Graphics Forum 38, 7 (2019).

[2] KARRAS T., LAINE S., AILA T.: A style-based generator architecture for generative adversarial networks. In CVPR (2019).

[3] SCHROFF F., KALENICHENKO D., PHILBIN J.: Facenet: A unified embedding for face recognition and clustering. In CVPR (2015), pp. 815–823.

[4] SIMONYAN K., ZISSERMAN A.: Very deep convolutional networks for large-scale image recognition. In ICLR (2015).







