Mnemonics Training: Multi-Class Incremental Learning without Forgetting


Code will be released soon.

This repository contains the PyTorch implementation for the CVPR 2020 paper "Mnemonics Training: Multi-Class Incremental Learning without Forgetting" by Yaoyao Liu, An-An Liu, Yuting Su, Bernt Schiele, and Qianru Sun.

If you have any questions about this repository or the related paper, feel free to open an issue or send me an email.
Email address: yaoyao.liu (at)


Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off between effectively learning new concepts and avoiding catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep a few exemplars of the previous concepts, but the effectiveness of this approach heavily depends on how representative those exemplars are. This paper proposes a novel and automatic framework we call mnemonics, in which we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimization, i.e., model-level and exemplar-level. We conduct extensive experiments on three MCIL benchmarks, CIFAR-100, ImageNet-Subset, and ImageNet, and show that using mnemonics exemplars surpasses the state-of-the-art by a large margin. Intriguingly, the mnemonics exemplars tend to lie on the boundaries between classes.
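The bilevel idea can be sketched in a few lines of PyTorch. The snippet below is a minimal, self-contained illustration (not the code released with this repository): a toy linear classifier takes one differentiable SGD step on the learnable exemplars (model-level), and the exemplar "pixels" are then updated through that step so the adapted model performs well on held-out data (exemplar-level). All tensors, shapes, and learning rates here are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup: 5 classes, flat 32-dim inputs. The linear model, shapes, and
# learning rates are illustrative stand-ins, not the paper's architecture.
n_classes, dim, n_exemplars = 5, 32, 20
W = torch.randn(n_classes, dim, requires_grad=True)          # model weights
val_x = torch.randn(100, dim)                                # held-out data
val_y = torch.randint(0, n_classes, (100,))

# Exemplars are themselves parameters ("mnemonics"); random data stands in
# for real images here.
exemplars = torch.nn.Parameter(torch.randn(n_exemplars, dim))
exemplar_y = torch.randint(0, n_classes, (n_exemplars,))
ex_opt = torch.optim.SGD([exemplars], lr=0.1)
inner_lr = 0.05

for step in range(100):
    # Model-level (inner) step: one SGD step on the exemplars, kept
    # differentiable with create_graph=True.
    inner_loss = F.cross_entropy(exemplars @ W.t(), exemplar_y)
    (g_W,) = torch.autograd.grad(inner_loss, W, create_graph=True)
    W_fast = W - inner_lr * g_W                              # adapted weights

    # Exemplar-level (outer) step: evaluate the adapted model on held-out
    # data; the gradient flows through W_fast back into the exemplars.
    outer_loss = F.cross_entropy(val_x @ W_fast.t(), val_y)
    ex_opt.zero_grad()
    outer_loss.backward()
    ex_opt.step()
    W.grad = None  # W is frozen in this sketch; discard its gradient
```

In the full method, the inner loop would train the actual network and the outer objective would typically also involve the new-phase data; this sketch only shows how gradients reach the exemplar pixels through the inner update.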

Figure: t-SNE results of three exemplar methods across two phases. The original data of the 5 colored classes occur in the early phase. Within each colored class, deep-colored points are exemplars, and light-colored points show the original data as a reference for the real distribution. Gray crosses represent the other participating classes, one cross per class. We make two main observations. (1) Our approach yields a much clearer separation of the data than random (exemplars sampled randomly in the early phase) or herding (exemplars chosen as the nearest neighbors of the mean sample in the early phase). (2) Our learned exemplars mostly lie on the boundaries between classes.
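For reference, the herding baseline described in the caption can be approximated in a few lines. True herding (as in iCaRL) greedily picks samples so that the running exemplar mean tracks the class mean; the nearest-to-mean simplification below matches the caption's description. The function name and shapes are illustrative, not from this repository.

```python
import torch

def nearest_mean_exemplars(features: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k samples whose features are closest to the class mean.

    A simplified stand-in for herding selection; `features` holds one
    class's feature vectors, shape (N, D).
    """
    mean = features.mean(dim=0, keepdim=True)       # (1, D) class prototype
    dists = (features - mean).norm(dim=1)           # L2 distance to the mean
    return dists.argsort()[:k]                      # k nearest samples

# Usage with stand-in features: 200 samples, 64-dim, keep 20 exemplars.
idx = nearest_mean_exemplars(torch.randn(200, 64), k=20)
```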


Please cite our paper if it is helpful to your work:

@inproceedings{liu2020mnemonics,
  author    = {Liu, Yaoyao and Su, Yuting and Liu, An{-}An and Schiele, Bernt and Sun, Qianru},
  title     = {Mnemonics Training: Multi-Class Incremental Learning without Forgetting},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}