
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation (ICLR'24)

Authors: Wenxuan Zhang, Youssef Mohamed, Bernard Ghanem, Philip Torr, Adel Bibi*, Mohamed Elhoseiny* @ KAUST Vision-CAIR, Oxford TVG (* Equal Advising)

Use this repo to reproduce our results and to customize your own multi-GPU continual learning algorithms.

Overview

(Figure: DietCL overview)

Abstract
  • We propose and study a realistic Continual Learning (CL) setting in which learning algorithms are granted a restricted computational budget per time step while training.

  • We apply this setting to large-scale semi-supervised Continual Learning scenarios with a sparse label rate. Previously proficient CL methods perform very poorly in this challenging setting; overfitting to the sparse labeled data and the insufficient computational budget are the two main culprits behind this poor performance.

  • We propose a simple but highly effective baseline, DietCL, which uses unlabeled and labeled data jointly and carefully allocates the computational budget between the two.

  • We validate our baseline at scale on several datasets, e.g., CLOC, ImageNet10K, and CGLM, under a constrained-budget setup. DietCL outperforms, by a large margin, all existing supervised CL algorithms as well as more recent continual semi-supervised methods. Our extensive analysis and ablations show that DietCL remains stable across a full spectrum of label sparsities, computational budgets, and other ablation settings.
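
To make the constrained-budget idea concrete, below is a minimal sketch of splitting a fixed per-task step budget between unlabeled and labeled data. It is an illustration only: the split ratio, the loss choices, and the loop structure are assumptions, not the actual DietCL implementation.

# Illustrative sketch of per-task budget allocation (not the DietCL implementation).
import torch.nn.functional as F

def run_task(model, labeled_loader, unlabeled_loader, optimizer,
             budget_steps=1000, labeled_fraction=0.3):
    # Spend a fixed number of optimizer steps on this task, split between data types.
    labeled_steps = int(budget_steps * labeled_fraction)
    unlabeled_steps = budget_steps - labeled_steps

    unlabeled_iter = iter(unlabeled_loader)
    for _ in range(unlabeled_steps):
        x = next(unlabeled_iter)
        # Placeholder self-supervised objective (assumed): consistency between two
        # noisy forward passes; DietCL's actual unlabeled objective may differ.
        loss = F.mse_loss(model(x), model(x).detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    labeled_iter = iter(labeled_loader)
    for _ in range(labeled_steps):
        x, y = next(labeled_iter)
        loss = F.cross_entropy(model(x), y)  # supervised loss on the sparse labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()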

Usage

Installation

conda env create -f environment.yml
conda activate dietcl

If you run into a library import problem, follow this issue to fix it.

Reproduce Results

TODO:

  • Hyper-parameters need to be updated.
  • Multi-node training needs to be tested.

ImageNet10K Results

To avoid repeated long pre-processing, and for stable results and faster reads, we suggest pre-processing the dataset once and saving it as folders of symbolic links. Follow these steps to prepare the dataset:

  1. Download the ImageNet-21K dataset from the official ImageNet website. We use the Winter 2021 release, i.e., the processed version of ImageNet-21K built with the script from "ImageNet-21K Pretraining for the Masses".
  2. Run the following scripts to prepare the dataset. We use three separate scripts to build the general ImageNet10K set, the task sequence, and the task labels, for flexible usage; a sketch of the resulting symbolic-link layout is shown after these steps.
python pre_process/get_unique_set.py --root21k /path/to/your/imagenet21k/folder
python pre_process/build_cl_tasks.py 
python pre_process/label_split.py
  3. Run the following command to reproduce the results.
N_GPU=4
python main.py trainer@_global_=diet dataset@_global_=imagenet10k \
n_gpu_per_node=${N_GPU} \
data_root=/path/to/your/imagenet21k/folder
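
The scripts above produce folders of symbolic links for each task. As a rough illustration of how such a symbolic-link split can be materialized (the split-file format and folder names here are assumptions, not the actual output of the pre_process scripts):

# Illustrative sketch only: builds one folder of symbolic links per task from a simple
# split file. The split format and folder names are assumptions.
import os

def materialize_task(split_file, root21k, out_root, task_id):
    # Create out_root/task_<id>/<class>/<image> symlinks that point into the raw ImageNet-21K tree.
    task_dir = os.path.join(out_root, f"task_{task_id}")
    with open(split_file) as f:
        for line in f:
            rel_path, class_name = line.strip().split()  # e.g. "n01440764/img_001.JPEG n01440764" (assumed format)
            class_dir = os.path.join(task_dir, class_name)
            os.makedirs(class_dir, exist_ok=True)
            src = os.path.join(root21k, rel_path)
            dst = os.path.join(class_dir, os.path.basename(rel_path))
            if not os.path.exists(dst):
                os.symlink(src, dst)  # link instead of copy: cheap to build and fast to read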

CGLM or CLOC Results

  1. Download the CGLM dataset using the following script.
bash pre_process/download_cglm.sh

To run on CLOC instead, download the CLOC dataset from the official CLOC repository.

  2. Run the following command to split the dataset, choosing cglm or cloc wherever cglm|cloc appears below.
python pre_process/cglm|cloc.py --root /path/to/your/cglm|cloc/folder
  3. Run the following command to reproduce the results. data_path refers to the path to the split files, and data_root refers to the path to the image files.
N_GPU=4
python main.py trainer@_global_=diet dataset@_global_=cglm|cloc \
n_gpu_per_node=${N_GPU} \
data_root=/path/to/your/cglm|cloc/folder data_path=/path/to/your/cglm|cloc/split/file/folder

Customized Multi-GPU Continual Learning

Other continual learning methods

  1. Supervised continual learning with experience replay. Two sampling strategies are available; a sketch of the difference follows this list.
# Mix the current set and the buffer; sample uniformly from the pooled buffer and current task.
python main.py trainer@_global_=base sampling=uniform  replay_before=True 
# Balanced sampling from the buffer and the current labeled set within each batch.
python main.py trainer@_global_=base sampling=batchmix 
  2. Continual pre-training and fine-tuning: pre-train and fine-tune for each task.
python main.py trainer@_global_=pretrain
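
The difference between the two replay sampling options above can be sketched as follows. This is illustrative only: the function names and the 50/50 balanced split are assumptions, not the repo's configuration.

# Illustrative sketch of the two replay sampling strategies (not the repo's implementation).
import random

def uniform_mix_batch(current_set, buffer, batch_size):
    # sampling=uniform with replay_before=True: pool the buffer and the current task,
    # then sample the batch uniformly from the pooled data.
    pool = list(current_set) + list(buffer)
    return random.sample(pool, batch_size)

def batchmix_batch(current_set, buffer, batch_size):
    # sampling=batchmix: each batch is balanced between the buffer and the
    # current labeled set (a 50/50 split is assumed here).
    half = batch_size // 2
    return random.sample(list(buffer), half) + random.sample(list(current_set), batch_size - half)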

Customize your own continual learning methods

Customize your own multi-GPU continual learning method with this repo.

  1. Write your own dataset and put it in the datasets folder, following this template:
from torch.utils.data import Dataset  # assuming the standard PyTorch Dataset base class

class YourDataset(Dataset):
    def __init__(self, args):
        pass

    def get_new_classes(self, task_id):
        # Needed so the classification head can be adjusted when a task introduces new classes.
        pass

    def get_labeled_set(self, task):
        pass

    def get_eval_set(self, task, per_task_eval):
        # per_task_eval controls whether all tasks are evaluated at once (for efficiency)
        # or each task is evaluated separately.
        pass
  2. Write your own trainer and put it in the trainers folder, following the same pattern (a minimal multi-GPU skeleton is sketched after this list).
  3. Write your own model and put it in the models folder.
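
As a rough starting point for a custom trainer, the skeleton below shows the kind of per-task, multi-GPU loop the dataset template plugs into. The class name, method names, and optimizer choice are assumptions for illustration; mirror the existing trainers in the trainers folder rather than this sketch.

# Minimal multi-GPU trainer skeleton (illustrative only; follow the existing trainers in this repo).
import torch
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

class YourTrainer:
    def __init__(self, model, dataset, args, local_rank):
        self.model = DDP(model.to(local_rank), device_ids=[local_rank])
        self.dataset = dataset
        self.args = args
        self.local_rank = local_rank

    def train_task(self, task_id):
        # Query the classes introduced by this task; expanding the classification
        # head accordingly is omitted from this sketch.
        new_classes = self.dataset.get_new_classes(task_id)

        labeled_set = self.dataset.get_labeled_set(task_id)
        sampler = DistributedSampler(labeled_set)  # shards the task's data across GPUs
        loader = DataLoader(labeled_set, batch_size=self.args.batch_size, sampler=sampler)

        optimizer = torch.optim.AdamW(self.model.parameters(), lr=self.args.lr)
        for epoch in range(self.args.epochs_per_task):
            sampler.set_epoch(epoch)  # reshuffle the shards each epoch
            for x, y in loader:
                x, y = x.to(self.local_rank), y.to(self.local_rank)
                loss = F.cross_entropy(self.model(x), y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()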

CL + DDP Troubleshooting

  • If SLURM is used, please make sure to allocate enough CPU cores and CPU memory.

Acknowledgement

We thank the authors of the repositories we build upon for their great work.

Citation

If you find this work useful, please consider citing our paper:

@inproceedings{zhang2024continual,
  title={Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation},
  author={Zhang, Wenxuan and Mohamed, Youssef and Ghanem, Bernard and Torr, Philip and Bibi, Adel and Elhoseiny, Mohamed},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
