On the nonlinear correlation of ML performance between data subpopulations (ICML 2023)

This repo provides the PyTorch source code of our paper:

On the nonlinear correlation of ML performance between data subpopulations
Weixin Liang*, Yining Mao*, Yongchan Kwon*, Xinyu Yang, James Zou
ICML (2023) [arXiv]

Overview

TL;DR: We show that there is a “moon shape” correlation (a parabolic uptrend curve) between the test performance on the majority subpopulation and on the minority subpopulation. This nonlinear correlation holds across model architectures, training settings, datasets, and degrees of imbalance between subpopulations.

Introduction

Subpopulation shift is a major challenge in ML: test data often have a different distribution across subgroups (e.g., different types of users or patients) than the training data. Recent works find a strong linear relationship between ID and OOD performance under dataset reconstruction shifts; in contrast, we empirically show that the two performances are nonlinearly correlated under subpopulation shifts.

What is the moon shape phenomenon?

The “moon shape” phenomenon is a nonlinear correlation (a parabolic uptrend curve) between the test performance on the majority subpopulation and on the minority subpopulation. We decompose a model’s performance into separate evaluations on the majority and minority subpopulations of the OOD test set, and evaluate under two dataset configurations. Top (a–c): datasets with spurious correlations show pronounced nonlinearity; bottom (d–f): datasets without spurious correlations exhibit more subtle nonlinearity.
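
In code, this decomposition amounts to splitting test accuracy by group membership. A minimal sketch (an illustrative helper, not part of this repo; preds, labels, and groups are assumed NumPy arrays):

import numpy as np

def subpopulation_accuracies(preds, labels, groups):
    # groups: 0 marks majority-group samples, 1 marks minority-group samples.
    correct = (preds == labels)
    majority_acc = correct[groups == 0].mean()
    minority_acc = correct[groups == 1].mean()
    return majority_acc, minority_acc

Each trained model contributes one (majority accuracy, minority accuracy) point to the scatter plot; the moon shape is the curve traced by these points.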

Why is it not obvious?

Why the moon shape is not obvious: a mixture of models can fill in the region under the moon shape, so the nonlinear trend traced by individually trained models is not a geometric necessity.
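
The reasoning: randomly routing each test input to one of two models attains any convex combination of their per-subpopulation accuracies, so every point between two achievable points is also achievable. A toy illustration (the accuracy numbers are made up):

import numpy as np

# (majority acc, minority acc) of two hypothetical models.
acc_a = np.array([0.95, 0.40])
acc_b = np.array([0.70, 0.80])

# Routing each input to model A with probability p yields the interpolated point.
for p in np.linspace(0.0, 1.0, 5):
    majority_acc, minority_acc = p * acc_a + (1 - p) * acc_b
    print(f"p={p:.2f}: majority={majority_acc:.3f}, minority={minority_acc:.3f}")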

The Impact of Spurious Correlation on the Moon Shape

Stronger spurious correlation creates a more pronounced nonlinear performance correlation.

See our paper for details!

Get Started

Requirements

Our implementation is built on MXNet and AutoGluon.

  • mxnet >= 1.7.0
  • pytorch >= 1.10.1
  • torchvision >= 0.11.2
  • autogluon
  • gluoncv

Datasets

We implement 5 subpopulation shift datasets across 6 settings (Modified-CIFAR4 has 2 versions).

  • Spurious correlation datasets: MetaShift, Waterbirds, Modified-CIFAR4 V1
  • Rare subpopulation datasets: PACS, OfficeHome, Modified-CIFAR4 V2

Download the data

  • For MetaShift [GoogleDrive], PACS [GoogleDrive], OfficeHome [GoogleDrive], download the data into the corresponding dataset folders in datasets/;
  • For Waterbirds, install WILDS via pip (pip install wilds) and download the data programmatically (see the sketch after this list);
  • For Modified-CIFAR4, the CIFAR10 dataset will first be downloaded via torchvision.
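
A minimal Waterbirds download sketch using the WILDS package (the root_dir value here is an assumption; point it at your dataset folder):

from wilds import get_dataset

# download=True fetches the Waterbirds data into root_dir on first use.
dataset = get_dataset(dataset="waterbirds", root_dir="datasets/waterbirds", download=True)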

Prepare the data

  • To see the dataset samples and prepare the data, run the Jupyter notebook in the corresponding dataset folder in datasets/. For example, the MetaShift preparation code is in datasets/metashift/metashift_prepare.ipynb.
  • In each dataset preparation notebook, you can change ROOT_PATH and EXP_ROOT_PATH in the first code cell:
    • ROOT_PATH: root path of the downloaded dataset
    • EXP_ROOT_PATH: experiment root path, defaulting to experiments/DATASET_NAME
  • The prepared data will be saved in EXP_ROOT_PATH/data in PyTorch ImageFolder format (see the loading sketch after this list):
    • Training data in EXP_ROOT_PATH/data/train;
    • Validation data in EXP_ROOT_PATH/data/majority-val and EXP_ROOT_PATH/data/minority-val.
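
Because the splits follow the ImageFolder layout, they can be loaded directly with torchvision. A sketch (the EXP_ROOT_PATH value and the transform are assumptions):

from torchvision import datasets, transforms

EXP_ROOT_PATH = "experiments/metashift"  # hypothetical example path
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

train_set = datasets.ImageFolder(f"{EXP_ROOT_PATH}/data/train", transform=transform)
majority_val = datasets.ImageFolder(f"{EXP_ROOT_PATH}/data/majority-val", transform=transform)
minority_val = datasets.ImageFolder(f"{EXP_ROOT_PATH}/data/minority-val", transform=transform)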

Training Process

Train 500 different ML models with varying configurations, following the search space of AutoGluon. For each dataset, we sweep 5 model architectures, 5 learning rates, 5 batch sizes, and 4 training durations:

import autogluon.core as ag  # AutoGluon's core search-space API

@ag.args( # 5 models * 5 lr * 5 batch_size * 4 epochs = 500 configurations
    model = ag.space.Categorical(
        'mobilenetv3_small',
        'resnet18_v1b',
        'resnet50_v1',
        'mobilenetv3_large',
        'resnet101_v2',
        ),
    lr = ag.space.Categorical(0.01, 0.005, 0.001, 0.0005, 0.0001),
    batch_size = ag.space.Categorical(8, 16, 32, 64, 128),
    epochs = ag.space.Categorical(1, 5, 10, 25)
    )
def train_fn(args, reporter):  # hypothetical name; main.py defines the actual training function
    ...

Specify the experiment directory, and you can train the models.

For example, if you prepare and save the data in experiments/metashift/data, run:

python main.py --exp-dir experiments/metashift

and you will get the following results in experiments/metashift/result:

  • A table with the evaluation results for each configuration;
  • A 'majority subpopulation accuracy vs. minority subpopulation accuracy' plot corresponding to the table (a sketch for reproducing such a plot follows).
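
A minimal sketch of regenerating the moon-shape plot from a results table (the CSV filename and column names here are assumptions, not necessarily the schema main.py writes):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("experiments/metashift/result/results.csv")  # hypothetical filename
plt.scatter(df["majority_acc"], df["minority_acc"], s=10)     # hypothetical column names
plt.xlabel("Majority subpopulation accuracy")
plt.ylabel("Minority subpopulation accuracy")
plt.savefig("moon_shape.png", dpi=150)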

Reference

If you find this code/work useful in your own research, please consider citing the following:

@inproceedings{liang2022nonlinear,
  title={On the nonlinear correlation of ML performance between data subpopulations},
  author={Liang, Weixin and Mao, Yining and Kwon, Yongchan and Yang, Xinyu and Zou, James},
  booktitle={ICML},
  year={2023}
}
