On the nonlinear correlation of ML performance between data subpopulations (ICML 2023)

This repo provides the PyTorch source code of our paper:

On the nonlinear correlation of ML performance between data subpopulations
Weixin Liang*, Yining Mao*, Yongchan Kwon*, Xinyu Yang, James Zou
ICML (2023) [arXiv]

Overview

TL;DR: We show that there is a “moon shape” correlation (a parabolic uptrend curve) between the test performance on the majority subpopulation and on the minority subpopulation. This nonlinear correlation holds across model architectures, training settings, datasets, and degrees of imbalance between subpopulations.

Introduction

Subpopulation shift is a major challenge in ML: test data often have a different distribution across subgroups (e.g., different types of users or patients) than the training data. Recent works find a strong linear relationship between ID and OOD performance under dataset reconstruction shifts; in contrast, we empirically show that the two performances are nonlinearly correlated under subpopulation shifts.

What is the moon shape phenomenon?

The “moon shape” phenomenon is a nonlinear correlation (a parabolic uptrend curve) between the test performance on the majority subpopulation and on the minority subpopulation. We decompose a model’s performance into separate evaluations on the majority and minority subpopulations of the OOD test set, and evaluate under two dataset configurations. Top (a–c): datasets with spurious correlations show pronounced nonlinearity; bottom (d–f): datasets without spurious correlations exhibit more subtle nonlinearity.
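
In code, this decomposition amounts to splitting test accuracy by group membership. A minimal sketch (an illustrative helper, not part of this repo; preds, labels, and groups are assumed NumPy arrays):

import numpy as np

def subpopulation_accuracies(preds, labels, groups):
    # groups: 0 marks majority-group samples, 1 marks minority-group samples.
    correct = (preds == labels)
    majority_acc = correct[groups == 0].mean()
    minority_acc = correct[groups == 1].mean()
    return majority_acc, minority_acc

Each trained model contributes one (majority accuracy, minority accuracy) point to the scatter plot; the moon shape is the curve traced by these points.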

Why is it not obvious?

Why the moon shape is not obvious: a mixture of models can fill in the region under the moon shape, so the nonlinear trend traced by individually trained models is not a geometric necessity.
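
The reasoning: randomly routing each test input to one of two models attains any convex combination of their per-subpopulation accuracies, so every point between two achievable points is also achievable. A toy illustration (the accuracy numbers are made up):

import numpy as np

# (majority acc, minority acc) of two hypothetical models.
acc_a = np.array([0.95, 0.40])
acc_b = np.array([0.70, 0.80])

# Routing each input to model A with probability p yields the interpolated point.
for p in np.linspace(0.0, 1.0, 5):
    majority_acc, minority_acc = p * acc_a + (1 - p) * acc_b
    print(f"p={p:.2f}: majority={majority_acc:.3f}, minority={minority_acc:.3f}")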

The Impact of Spurious Correlation on the Moon Shape

Stronger spurious correlation creates a more pronounced nonlinear performance correlation.

See our paper for details!

Get Started

Requirements

Our implementation is built on MXNet and AutoGluon.

  • mxnet >= 1.7.0
  • pytorch >= 1.10.1
  • torchvision >= 0.11.2
  • autogluon
  • gluoncv

Datasets

We implement 5 subpopulation shift datasets across 6 settings (Modified-CIFAR4 has 2 versions).

  • Spurious correlation datasets: MetaShift, Waterbirds, Modified-CIFAR4 V1
  • Rare subpopulation datasets: PACS, OfficeHome, Modified-CIFAR4 V2

Download the data

  • For MetaShift [GoogleDrive], PACS [GoogleDrive], OfficeHome [GoogleDrive], download the data into the corresponding dataset folders in datasets/;
  • For Waterbirds, install WILDS via pip (pip install wilds) and download the data programmatically (see the sketch after this list);
  • For Modified-CIFAR4, the CIFAR10 dataset will first be downloaded via torchvision.
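
A minimal Waterbirds download sketch using the WILDS package (the root_dir value here is an assumption; point it at your dataset folder):

from wilds import get_dataset

# download=True fetches the Waterbirds data into root_dir on first use.
dataset = get_dataset(dataset="waterbirds", root_dir="datasets/waterbirds", download=True)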

Prepare the data

  • To see the dataset samples and prepare the data, run the Jupyter notebook in the corresponding dataset folder in datasets/. For example, the MetaShift preparation code is in datasets/metashift/metashift_prepare.ipynb.
  • In each dataset preparation notebook, you can change ROOT_PATH and EXP_ROOT_PATH in the first code cell:
    • ROOT_PATH: root path of the downloaded dataset
    • EXP_ROOT_PATH: experiment root path, defaulting to experiments/DATASET_NAME
  • The prepared data will be saved in EXP_ROOT_PATH/data in PyTorch ImageFolder format (see the loading sketch after this list):
    • Training data in EXP_ROOT_PATH/data/train;
    • Validation data in EXP_ROOT_PATH/data/majority-val and EXP_ROOT_PATH/data/minority-val.
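
Because the splits follow the ImageFolder layout, they can be loaded directly with torchvision. A sketch (the EXP_ROOT_PATH value and the transform are assumptions):

from torchvision import datasets, transforms

EXP_ROOT_PATH = "experiments/metashift"  # hypothetical example path
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

train_set = datasets.ImageFolder(f"{EXP_ROOT_PATH}/data/train", transform=transform)
majority_val = datasets.ImageFolder(f"{EXP_ROOT_PATH}/data/majority-val", transform=transform)
minority_val = datasets.ImageFolder(f"{EXP_ROOT_PATH}/data/minority-val", transform=transform)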

Training Process

Train 500 different ML models with varying configurations, following the search space of AutoGluon. For each dataset, we sweep 5 model architectures, 5 learning rates, 5 batch sizes, and 4 training durations:

import autogluon.core as ag  # AutoGluon's core search-space API

@ag.args( # 5 models * 5 lr * 5 batch_size * 4 epochs = 500 configurations
    model = ag.space.Categorical(
        'mobilenetv3_small',
        'resnet18_v1b',
        'resnet50_v1',
        'mobilenetv3_large',
        'resnet101_v2',
        ),
    lr = ag.space.Categorical(0.01, 0.005, 0.001, 0.0005, 0.0001),
    batch_size = ag.space.Categorical(8, 16, 32, 64, 128),
    epochs = ag.space.Categorical(1, 5, 10, 25)
    )
def train_fn(args, reporter):  # hypothetical name; main.py defines the actual training function
    ...

Specify the experiment directory, and you can train the models.

For example, if you prepare and save the data in experiments/metashift/data, run:

python main.py --exp-dir experiments/metashift

and you will get the following results in experiments/metashift/result:

  • A table with the evaluation results for each configuration;
  • A 'majority subpopulation accuracy vs. minority subpopulation accuracy' plot corresponding to the table (a sketch for reproducing such a plot follows).
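
A minimal sketch of regenerating the moon-shape plot from a results table (the CSV filename and column names here are assumptions, not necessarily the schema main.py writes):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("experiments/metashift/result/results.csv")  # hypothetical filename
plt.scatter(df["majority_acc"], df["minority_acc"], s=10)     # hypothetical column names
plt.xlabel("Majority subpopulation accuracy")
plt.ylabel("Minority subpopulation accuracy")
plt.savefig("moon_shape.png", dpi=150)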

Reference

If you find this code/work useful in your own research, please consider citing the following:

@inproceedings{liang2022nonlinear,
  title={On the nonlinear correlation of ML performance between data subpopulations},
  author={Liang, Weixin and Mao, Yining and Kwon, Yongchan and Yang, Xinyu and Zou, James},
  booktitle={ICML},
  year={2023}
}
