This repository is the official code implementation of our NeurIPS 2023 paper "Towards Free Data Selection with General-Purpose Models". It currently contains only the implementation for the earlier version of the paper. We will update the paper and code to the new version soon.
- [2023-9-21] 🔥Our paper is accepted by NeurIPS 2023!🔥
- [2022-3-20] An updated version of GEAL is released with better efficiency, stability, and performance.
- [2022-2-23] Code for the object detection task is available.
Existing active learning work follows a cumbersome pipeline that repeats time-consuming model training and batch data selection multiple times, separately for each dataset. We challenge this status quo by proposing a novel general and efficient active learning (GEAL) method. Using a publicly available model pre-trained on a large dataset, our method can perform data selection on different datasets with a single-pass inference of the same model.
Our method is hundreds of times more efficient than prior art, while its performance is competitive with or even better than that of methods following the traditional pipeline.
This codebase has been developed with CUDA 11.2, Python 3.7, PyTorch 1.7.1, and torchvision 0.8.2. Please install PyTorch according to the instructions on the official website, then run the following command to install the other necessary modules.
pip install torchextractor==0.3.0 pillow
You also need to install kmeans_pytorch from source. Installing it directly from PyPI gives you the wrong version.
git clone https://github.com/subhadarship/kmeans_pytorch
cd kmeans_pytorch
pip install --editable .
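To confirm that the environment matches the versions listed above, a small check like the following can help. This is only a sketch: the `EXPECTED` table and the helper names `installed_version` and `compare_versions` are illustrative and not part of this repository, and the listed versions are the ones the code was developed with rather than strict requirements.

```python
import sys

# Versions mentioned in this README (assumed to be the tested configuration).
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2", "torchextractor": "0.3.0"}


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        try:
            import pkg_resources  # fallback for Python 3.7
        except ImportError:
            return None
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected, installed):
    """Return (package, expected, found) tuples for every package that differs."""
    mismatches = []
    for name, want in expected.items():
        found = installed.get(name)
        # Ignore local build tags such as "+cu110" when comparing.
        base = found.split("+")[0] if found else None
        if base != want:
            mismatches.append((name, want, found))
    return mismatches


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    found = {name: installed_version(name) for name in EXPECTED}
    for name, want, got in compare_versions(EXPECTED, found):
        print(f"{name}: README lists {want}, found {got}")
```

Running the script prints one line per package whose installed version differs from the one listed here. A mismatch does not necessarily mean the code will fail, only that the setup differs from the one it was developed with.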
Please follow the steps in our instructions for data selection.
- Object Detection: Please follow the steps in our instructions for the object detection downstream task.
- Image Classification: Please follow the steps in our instructions for the image classification downstream task.
- The K-Means module comes from kmeans_pytorch.
- The transformer backbone follows dino.
- The downstream object detection task code comes from mmdetection.
- The downstream image classification task code comes from mmclassification.
We sincerely thank the authors for their excellent work!
If you find our research helpful, please consider citing it as:
@inproceedings{
xie2023towards,
title={Towards Free Data Selection with General-Purpose Models},
author={Xie, Yichen and Ding, Mingyu and Tomizuka, Masayoshi and Zhan, Wei},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=KBXcDAaZE7}
}