This is the implementation of the CVPR 2019 paper "Large Scale Incremental Learning". If the paper and code help you, we would appreciate a citation of our paper.
@inproceedings{wu2019large,
title={Large Scale Incremental Learning},
author={Wu, Yue and Chen, Yinpeng and Wang, Lijuan and Ye, Yuancheng and Liu, Zicheng and Guo, Yandong and Fu, Yun},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={374--382},
year={2019}
}
In this paper, we propose a new method to address the imbalance issue in incremental learning, which is critical when the number of classes becomes large. First, we validate our hypothesis that the classifier layer (the last fully connected layer) has a strong bias towards the new classes, which have substantially more training data than the old classes. Second, we find that this bias can be effectively corrected by applying a linear model with a small validation set. Our method achieves excellent results on two large datasets with 1,000+ classes (ImageNet ILSVRC 2012 and MS-Celeb-1M), outperforming the state-of-the-art by a large margin (11.1% on ImageNet ILSVRC 2012 and 13.2% on MS-Celeb-1M).
A few words before the code: most of the code and experiments were finished in late 2017 and early 2018. It is hard to recover exactly the same environment used for the experiments. As I remember, the system ran Ubuntu 14, and CUDA and TensorFlow were both earlier versions. I re-installed the system several times last year (2018) because of conflicts when setting up an environment for PyTorch on a system that was originally configured for Caffe and TensorFlow. I also upgraded the system from Ubuntu 14 to Ubuntu 16.
The ResNet implementation comes from the official TensorFlow models at:
https://github.com/tensorflow/models
In the latest repo, the most similar implementation is:
https://github.com/tensorflow/models/blob/master/official/r1/resnet/imagenet_main.py
Unfortunately, we were not able to run our code with the latest tensorflow-2.0 or tensorflow-1.14.
We understand how important it is to reproduce the results of published papers. I found a compatible version, tensorflow-1.5, that can run the code with my current CUDA settings. To avoid disturbing my current software environment, which is mostly for PyTorch, I use virtualenv to set up a separate environment for the experiments. The code has few dependencies and should work in most cases if the TensorFlow version is set up correctly. My current environment is summarized below for reference.
System Information
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
CUDA version: 9.0
CUDNN version: 7.0.5
GPU: TITAN X (Pascal) 12GB
Python version: 3.5.2
The environment is set up using virtualenv.
virtualenv venv
pip install scipy
pip install tensorflow-gpu==1.5
To activate the environment:
source venv/bin/activate
Dataset:
We have mainly cleaned up the code for the ImageNet dataset and leave the other two datasets (CIFAR and MS-Celeb-1M) for future work.
Data preparation:
We have put the essential files in the repo, so all you need to do is download the ImageNet-1000 images.
For the images, we used the ILSVRC2016_CLS-LOC.tar.gz file, which should have remained unchanged since 2012.
The following link describes how to sort the 50K validation images into class folders:
https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset
After downloading and processing, the structure of each data folder (dataImageNet100 and dataImageNet1000) should look like:
|--train
| |--n01440764
| |--n01443537
| |.......
|--train.txt
|--val
| |--n01440764
| |--n01443537
| |.......
|--val.txt
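The layout above can be checked before starting a long training run. The following is a minimal sketch, not part of the repo: the helper name `check_layout` is hypothetical, and it only tests that the four top-level entries exist, not that the class folders or list files have the right contents.

```python
import os


def check_layout(data_dir):
    """Return the expected top-level entries that are missing under data_dir.

    Hypothetical helper (not in the repo): it only checks for train/, val/,
    train.txt and val.txt, as listed in the folder structure above.
    """
    expected_dirs = ["train", "val"]
    expected_files = ["train.txt", "val.txt"]
    missing = []
    for name in expected_dirs:
        if not os.path.isdir(os.path.join(data_dir, name)):
            missing.append(name + "/")
    for name in expected_files:
        if not os.path.isfile(os.path.join(data_dir, name)):
            missing.append(name)
    return missing
```

Calling `check_layout("dataImageNet100")` returns an empty list when all four entries are present.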
Command to run the experiments:
CUDA_VISIBLE_DEVICES=0 python imagenet_main.py 1>log 2>&1
We ran the code once and report the results here. Top-5 accuracy is reported on the ImageNet dataset. On ImageNet-100, the results from this repo in the new environment are similar to those reported in the paper. On ImageNet-1000, we notice some performance drop in the first several increments, but the final results are even better than those reported in the paper. The differences might be caused by different operating systems and software environments.
Training on ImageNet-100 takes around 15 hours. Results are:
| Classes | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|
| In paper (%) | 98.40 | 96.20 | 94.00 | 92.90 | 91.10 | 89.40 | 88.10 | 86.50 | 85.40 | 84.40 |
| This Repo (%) | 98.79 | 96.10 | 95.06 | 93.50 | 90.96 | 89.73 | 89.02 | 87.22 | 85.97 | 84.24 |
| Before Bias Correction (%) | - | 95.70 | 93.06 | 90.75 | 86.60 | 86.06 | 83.60 | 81.25 | 78.28 | 76.32 |
| \beta | 1.0 | 0.5900 | 0.5140 | 0.4742 | 0.4839 | 0.4648 | 0.4389 | 0.4297 | 0.4335 | 0.3941 |
| \gamma | 0.0 | -0.4416 | -0.5401 | -0.4701 | -0.5323 | -0.4672 | -0.4830 | -0.5349 | -0.5804 | -0.4609 |
| Training Samples | 12800 | 14417 | 14600 | 14600 | 14600 | 14441 | 14600 | 14620 | 14331 | 14566 |
| Val Samples | 200 | 400 | 300 | 240 | 250 | 240 | 210 | 160 | 180 | 200 |
| Test Samples | 500 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 | 4500 | 5000 |
Training on ImageNet-1000 takes around 100 hours. Results are:
| Classes | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| In paper (%) | 94.10 | 92.50 | 89.60 | 89.10 | 85.70 | 83.20 | 80.20 | 77.50 | 75.00 | 73.20 |
| This Repo (%) | 93.72 | 91.46 | 88.70 | 86.63 | 84.64 | 83.08 | 81.37 | 79.82 | 78.22 | 76.76 |
| Before Bias Correction (%) | - | 89.64 | 84.50 | 80.84 | 78.07 | 74.89 | 72.66 | 70.38 | 67.93 | 63.34 |
| \beta | 1.0 | 0.7873 | 0.7382 | 0.7053 | 0.6884 | 0.6704 | 0.6609 | 0.6515 | 0.6239 | 0.6334 |
| \gamma | 0.0 | -0.5586 | -0.5759 | -0.5015 | -0.5505 | -0.5137 | -0.4677 | -0.4471 | -0.4118 | -0.4064 |
| Training Samples | 126856 | 144159 | 144505 | 144301 | 143776 | 145238 | 143391 | 144418 | 143277 | 143046 |
| Val Samples | 2000 | 4000 | 3000 | 2400 | 2500 | 2400 | 2100 | 1600 | 1800 | 2000 |
| Test Samples | 5000 | 10000 | 15000 | 20000 | 25000 | 30000 | 35000 | 40000 | 45000 | 50000 |
Results are from one run of the model on ImageNet-100 and ImageNet-1000. Log files are located at
./logs/log-ImageNet100
./logs/log-ImageNet1000
Class Order:
To keep the same class order as iCaRL (https://github.com/srebuffi/iCaRL), we use the same NumPy random seed (1993) to generate the order.
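A seeded class order of this kind can be sketched as follows. The seed 1993 comes from the text above; the function name `class_order` and the choice of `RandomState.shuffle` over an `arange` are assumptions for illustration, so check the repo and iCaRL for the exact calls before relying on the resulting permutation.

```python
import numpy as np


def class_order(num_classes, seed=1993):
    """Shuffle class indices 0..num_classes-1 with a fixed NumPy seed.

    Assumption: the order is produced by shuffling np.arange(num_classes);
    the repo may use a different NumPy call.
    """
    rng = np.random.RandomState(seed)
    order = np.arange(num_classes)
    rng.shuffle(order)  # in-place, deterministic for a fixed seed
    return order
```

Because the seed is fixed, repeated calls such as `class_order(100)` return the same permutation, which is what makes the increments comparable across runs.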
Distilling Loss:
We keep the previous network so that its outputs can be used for the distilling loss.
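As a rough illustration of how the stored network is used, here is a NumPy sketch of a common form of distilling loss: the cross-entropy between temperature-softened outputs of the previous network and of the current network, restricted to the old classes. The function names, the temperature value of 2, and the assumption that old classes occupy the first logit columns are illustrative choices, not taken from this repo.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def distilling_loss(old_logits, new_logits, temperature=2.0):
    """Mean cross-entropy between softened old and new outputs on old classes.

    old_logits: (batch, n_old) outputs of the stored previous network.
    new_logits: (batch, n_old + n_new) outputs of the current network;
                the first n_old columns are assumed to be the old classes.
    """
    n_old = old_logits.shape[1]
    p_old = np.exp(log_softmax(old_logits, temperature))
    log_p_new = log_softmax(new_logits[:, :n_old], temperature)
    return float(-(p_old * log_p_new).sum(axis=1).mean())
```

The loss is smallest when the current network reproduces the previous network's softened predictions on the old classes.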
Bias Correction:
After the Bias Correction parameters (\beta and \gamma) are learned, the corrected classifier is used for the distilling loss in the next incremental training step.
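The correction itself is a two-parameter linear model applied to the new-class logits. The sketch below assumes that \beta is the scale and \gamma is the offset, which is inferred from the first-increment values in the results tables (\beta = 1.0, \gamma = 0.0, that is, no correction). The function name and the assumption that new classes occupy the last logit columns are illustrative.

```python
import numpy as np


def bias_correct(logits, n_old, beta, gamma):
    """Keep old-class logits and map new-class logits to beta * logit + gamma.

    logits: (batch, n_old + n_new) array; the columns from n_old onward are
            assumed to belong to the new classes.
    """
    corrected = np.array(logits, dtype=np.float64, copy=True)
    corrected[:, n_old:] = beta * corrected[:, n_old:] + gamma
    return corrected


# Two old classes followed by two new classes.
logits = np.array([[2.0, 1.0, 3.0, 4.0]])
corrected = bias_correct(logits, n_old=2, beta=0.5, gamma=-0.5)
# New-class logits become 0.5 * 3.0 - 0.5 = 1.0 and 0.5 * 4.0 - 0.5 = 1.5,
# so the prediction moves from a new class (index 3) to an old class (index 0).
```

With \beta below 1 and \gamma negative, as in every later column of the tables, the new-class logits are scaled down and shifted lower, which counteracts the bias towards the new classes.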
Validation Samples from exemplars:
The 10% validation selection applies only to the exemplars (old classes). Validation samples from the new classes are then chosen to match that number.
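One reading of this rule reproduces the "Val Samples" rows of the results tables for every increment after the first. It rests on assumptions that are not stated in this README: a fixed exemplar budget of 2,000 images for ImageNet-100 and 20,000 for ImageNet-1000, split evenly over the old classes, with counts rounded down. The function name is hypothetical.

```python
def val_samples(n_old, n_new, exemplar_budget, val_percent=10):
    """Total validation samples for one increment under the stated assumptions."""
    per_old_class = exemplar_budget // n_old              # exemplars kept per old class
    val_per_class = per_old_class * val_percent // 100    # 10% of them, rounded down
    # New classes contribute the same number of validation samples per class.
    return val_per_class * (n_old + n_new)


# ImageNet-100: 10 new classes per increment, assumed budget of 2,000 exemplars.
imagenet100 = [val_samples(n_old, 10, 2000) for n_old in range(10, 100, 10)]
print(imagenet100)   # [400, 300, 240, 250, 240, 210, 160, 180, 200]

# ImageNet-1000: 100 new classes per increment, assumed budget of 20,000 exemplars.
imagenet1000 = [val_samples(n_old, 100, 20000) for n_old in range(100, 1000, 100)]
print(imagenet1000)  # [4000, 3000, 2400, 2500, 2400, 2100, 1600, 1800, 2000]
```

The match with the tables suggests that the per-class count is what is balanced between old and new classes, but this is arithmetic that fits the reported numbers, not a confirmed description of the repo's logic.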
Awesome-Incremental-Learning: https://github.com/xialeiliu/Awesome-Incremental-Learning
If you find any issue with the code, please contact Yue Wu (@wuyuebupt, Email: yuewu@ece.neu.edu or wuyuebupt@gmail.com)