
Second-order Democratic Aggregation

Created by Tsung-Yu Lin, Subhransu Maji and Piotr Koniusz.

Introduction

This repository contains the code for reproducing the results in our ECCV 2018 paper:

@inproceedings{lin2018o2dp,
    Author = {Tsung-Yu Lin and Subhransu Maji and Piotr Koniusz},
    Title = {Second-order Democratic Aggregation},
    Booktitle = {European Conference on Computer Vision (ECCV)},
    Year = {2018}
}

The paper analyzes various feature aggregators in the context of second-order features and proposes γ-democratic pooling, which generalizes sum pooling and democratic aggregation. See the project page and the paper for details. The code was tested on Ubuntu 14.04 with an NVIDIA Titan X GPU and MATLAB R2016a.

Prerequisites

  1. MatConvNet: Our code was developed on MatConvNet version 1.0-beta24.
  2. VLFEAT
  3. bcnn-package: This package includes our implementation of customized layers.

The packages are set up as git submodules. Check them out with the following commands, and follow the instructions on the MatConvNet and VLFEAT project pages to install them (a setup sketch follows the commands).

>> git submodule init
>> git submodule update
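
For reference, here is a minimal MATLAB setup sketch, assuming the submodules are checked out into matconvnet/, vlfeat/, and bcnn-package/ under the repository root; the directory names and the GPU flag are assumptions, so defer to the official installation instructions if your layout differs:

    % One-time MatConvNet compilation (vl_compilenn and vl_setupnn ship with MatConvNet)
    addpath matconvnet/matlab
    vl_compilenn('enableGpu', true);   % drop the flag for a CPU-only build
    vl_setupnn;                        % put MatConvNet on the MATLAB path
    % VLFeat setup (vl_setup ships with VLFeat)
    run vlfeat/toolbox/vl_setup
    % make the customized layers in bcnn-package visible
    addpath bcnn-package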

Datasets

To run the experiments, download the following datasets and edit the model_setup.m file to point to the dataset locations (see the example snippet after the dataset lists). For instance, you can point to the birds dataset directory by setting opts.cubDir = 'data/cub'.

Fine-grained classification datasets: Caltech UCSD Birds (CUB-200-2011), Stanford Cars, and FGVC Aircrafts.

Texture and indoor scene datasets: DTD, FMD, and MIT Indoor.
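
For example, the dataset paths in model_setup.m might look like the following. Only opts.cubDir appears in the instructions above; the other option names are illustrative assumptions, so check model_setup.m for the actual names:

    % dataset locations (edit in model_setup.m to match your machine)
    opts.cubDir      = 'data/cub';         % Caltech UCSD Birds
    opts.carsDir     = 'data/cars';        % Stanford Cars (assumed option name)
    opts.aircraftDir = 'data/aircraft';    % FGVC Aircrafts (assumed option name)
    opts.dtdDir      = 'data/dtd';         % DTD (assumed option name)
    opts.fmdDir      = 'data/fmd';         % FMD (assumed option name)
    opts.mitDir      = 'data/mit_indoor';  % MIT Indoor (assumed option name)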

Pre-trained models

  • ImageNet LSVRC 2012 pre-trained models: The vgg-verydeep-16 and resnet-101 ImageNet pre-trained models are used as our base models. Download them from the MatConvNet pre-trained models page.
  • B-CNN fine-tuned models: We also provide B-CNN models fine-tuned from vgg-verydeep-16, from which we extract the CNN features and aggregate them to construct the image descriptor. Download the models for CUB Birds, FGVC Aircrafts, or Stanford Cars to reproduce the accuracies reported in the paper.

Testing the models

Solving for the coefficients of γ-democratic aggregation involves a Sinkhorn iteration. The hyperparameters of the Sinkhorn iteration are configurable in the entry scripts run_experiments_o2dp.m and run_experiments_sketcho2dp_resnet.m; see the comments in the code for details.
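
For intuition, below is a minimal MATLAB sketch of a damped Sinkhorn-style solver for the γ-democratic weights. It assumes the formulation from the paper, where the weights α are chosen so that α(i) * Σ_j α(j) k(x_i, x_j) is proportional to (Σ_j k(x_i, x_j))^γ, with γ=1 recovering sum pooling (α=1) and γ=0 recovering democratic aggregation; the solver and hyperparameter names in the repository may differ:

    % illustrative damped Sinkhorn-style iteration (not the repo's exact solver)
    % K: n x n kernel matrix between local features, tau: damping factor (e.g. 0.5)
    function alpha = solve_gamma_democratic(K, gamma, nIter, tau)
        n = size(K, 1);
        alpha  = ones(n, 1);                    % alpha = 1 is exact for gamma = 1
        target = max(sum(K, 2), eps) .^ gamma;  % desired row mass, (K*1).^gamma
        for t = 1:nIter
            mass  = alpha .* (K * alpha);       % current row mass alpha_i * (K*alpha)_i
            alpha = alpha .* (target ./ max(mass, eps)) .^ (tau / 2);  % damped update
        end
    end

For second-order features the kernel is k(x_i, x_j) = (x_i' * x_j)^2, and the final image descriptor is the weighted sum of outer products, sum_i alpha(i) * x_i * x_i'.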

  • Second-order γ-democratic aggregation: Point the variable model_path to the location of the model in run_experiments_o2dp.m and run the command run_experiments_o2dp(dataset, gamma, gpuidx) in the MATLAB terminal.

    • For example:
    % gamma is the hyper-parameter gamma for gamma-democratic aggregation
    % gpuidx is the index of gpu on which you run the experiment
    run_experiments_o2dp('mit_indoor', 0.3, 1) 
    • Classification results: Sum and democratic aggregation are obtained by setting the corresponding values of γ. The optimal γ values are indicated in parentheses; in general, γ=0.5 performs reasonably well. For DTD and FMD the numbers are reported on the first split. For the fine-grained recognition datasets (†) the results are obtained with the fine-tuned B-CNN models, while for the texture and indoor scene datasets the ImageNet pre-trained vgg-verydeep-16 model is used.

      Dataset                Sum (γ=1)   Democratic (γ=0)   γ-democratic
      Caltech UCSD Birds †   84.0        84.7               84.9 (0.5)
      Stanford Cars †        90.6        89.7               90.8 (0.5)
      FGVC Aircrafts †       85.7        86.7               86.7 (0.0)
      DTD                    71.2        72.2               72.3 (0.3)
      FMD                    84.6        82.8               84.8 (0.8)
      MIT Indoor             79.5        79.6               80.4 (0.3)
  • Second-order γ-democratic aggregation in sketch space: Point the variable model_path to the location of the model in run_experiments_sketcho2dp_resnet.m and run the command run_experiments_sketcho2dp_resnet(dataset, gamma, d, gpuidx) in the MATLAB terminal.

    • For example:
    % gamma is the hyper-parameter gamma for gamma-democratic aggregation
    % d is the dimension for the sketch space
    % gpuidx is the index of gpu on which you run the experiment
    run_experiments_sketcho2dp_resnet('mit_indoor', 0.5, 8192, 1) 
    • The script aggregates second-order features from the ImageNet pre-trained ResNet in an 8192-dimensional sketch space using the γ-democratic aggregator (a minimal illustration of the sketching step follows these results). With ResNet features the model achieves the following results; for DTD and FMD the accuracy is averaged over 10 splits.

                 DTD          FMD          MIT Indoor
      Accuracy   76.2 ± 0.7   84.3 ± 1.5   84.3
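
To make the sketching step concrete, below is a minimal MATLAB illustration of Tensor Sketch, which approximates the outer product x*x' in a D-dimensional space by combining two count sketches with an FFT; the hashing and normalization used by the layers in bcnn-package may differ, so treat this purely as a sketch:

    % illustrative Tensor Sketch of a second-order feature (not the repo's exact code)
    d = 512; D = 8192;                    % feature and sketch dimensions
    x = randn(d, 1);                      % a single local feature
    % random hashes: indices in 1..D and signs in {-1,+1}, shared across features
    h1 = randi(D, d, 1);  s1 = 2 * randi(2, d, 1) - 3;
    h2 = randi(D, d, 1);  s2 = 2 * randi(2, d, 1) - 3;
    c1 = accumarray(h1, s1 .* x, [D 1]);  % count sketch #1 of x
    c2 = accumarray(h2, s2 .* x, [D 1]);  % count sketch #2 of x
    ts = real(ifft(fft(c1) .* fft(c2)));  % circular convolution ~ sketch of x*x'

Since the inner product <ts_i, ts_j> approximates (x_i' * x_j)^2, the γ-democratic weights can be solved directly in the sketch space.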
