The goals of this project (see the original arXiv paper):
- Develop a systematic framework to measure concentration for arbitrary distributions
- Theoretically, prove that the empirical concentration with respect to a special collection of subsets converges asymptotically to the actual concentration
- Empirically, propose algorithms for measuring the concentration of benchmark image distributions under both the ℓ∞ and ℓ2 distance metrics
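For reference, the concentration function these goals refer to can be written as below. The notation is ours, not the paper's: a metric probability space (X, μ, Δ), a region E and its ε-expansion E_ε, a sample S with empirical measure μ̂_S, and a collection of subsets 𝒢. Treat this as a sketch and see the paper for the precise statements.

```latex
\mathcal{E}_\epsilon = \{\, x \in \mathcal{X} : \exists\, x' \in \mathcal{E},\ \Delta(x, x') \le \epsilon \,\}

h(\mu, \alpha, \epsilon) = \inf_{\mathcal{E} \subseteq \mathcal{X}} \{\, \mu(\mathcal{E}_\epsilon) : \mu(\mathcal{E}) \ge \alpha \,\}

h_{\mathcal{G}}(\hat{\mu}_S, \alpha, \epsilon) = \inf_{\mathcal{E} \in \mathcal{G}} \{\, \hat{\mu}_S(\mathcal{E}_\epsilon) : \hat{\mu}_S(\mathcal{E}) \ge \alpha \,\}
```

The second goal concerns when the empirical quantity on the last line converges to the actual concentration as the sample grows, under the conditions on 𝒢 stated in the paper. In the examples, 𝒢 consists of unions of balls for ℓ2 and complements of unions of hyperrectangles for ℓ∞.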
The code was developed using Python 3 on Anaconda.
- Install PyTorch 0.4.1:
conda update -n base conda && conda install pytorch=0.4.1 torchvision -c pytorch -y
- Install dependencies:
pip install --upgrade pip && pip install scipy scikit-learn numpy torch setproctitle
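Before running the examples, it can help to confirm that every dependency is importable. The snippet below is a small illustrative check, not part of the repository: the module list mirrors the install commands (scikit-learn is imported as `sklearn`), and the function name `missing_modules` is our own.

```python
import importlib.util

# Import names of the packages installed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["numpy", "scipy", "sklearn", "torch", "torchvision", "setproctitle"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names in `names` that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        import torch
        print("All dependencies found; torch version:", torch.__version__)
```

With the conda command above, the reported torch version should be a 0.4.1 build.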
- Example for empirically measuring concentration under the ℓ∞ metric:
- First, precompute the distance to the k-th nearest neighbor for each training example:
python preliminary.py --dataset mnist --metric infinity --k 50
- Next, run the proposed algorithm that finds a robust error region under ℓ∞:
python main_infinity.py --dataset mnist --metric infinity --epsilon 0.3 --q 0.629 --clusters 10
- Example for empirically measuring concentration under the ℓ2 metric:
- First, precompute the nearest-neighbor indices for each training example:
python preliminary.py --dataset cifar --metric euclidean --alpha 0.05
- Next, run the proposed algorithm that finds a robust error region under ℓ2:
python main_euclidean.py --dataset cifar --metric euclidean --epsilon 0.2453 --alpha 0.05 --clusters 5
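The preprocessing step (`preliminary.py`) can be pictured with the brute-force NumPy sketch below. It illustrates the idea only and is not the repository's implementation: the function name `knn_distances` is hypothetical, the metric strings simply mirror the `--metric` values above, and we do not model how `--alpha` is used in the ℓ2 run.

```python
import numpy as np


def knn_distances(X, k, metric="euclidean"):
    """For each row of X, return (distance to its k-th nearest other row,
    indices of its k nearest other rows sorted by distance).

    metric: "euclidean" (l2) or "infinity" (l_inf, i.e. Chebyshev distance).
    Brute force: O(n^2 d) time, O(n d) extra memory per query row.
    """
    X = np.asarray(X, dtype=np.float64).reshape(len(X), -1)  # flatten images to vectors
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < number of examples")
    if metric not in ("euclidean", "infinity"):
        raise ValueError("metric must be 'euclidean' or 'infinity'")

    kth_dist = np.empty(n)
    nn_idx = np.empty((n, k), dtype=np.int64)
    for i in range(n):
        diff = np.abs(X - X[i])
        if metric == "infinity":
            d = diff.max(axis=1)                 # l_inf: largest coordinate gap
        else:
            d = np.sqrt((diff ** 2).sum(axis=1))  # l2 norm
        d[i] = np.inf                             # exclude the example itself
        order = np.argsort(d, kind="stable")[:k]  # ties broken by lower index
        nn_idx[i] = order
        kth_dist[i] = d[order[-1]]
    return kth_dist, nn_idx
```

For full MNIST or CIFAR-10 training sets, a dedicated nearest-neighbor library or batched GPU computation is far faster than this per-example loop, which is kept here only for readability.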
load_data.py: defines the argument parser and data loaders for several benchmark image datasets
preliminary.py: finds the k nearest neighbors of each example in a given training dataset
main_infinity.py: main function for empirically measuring concentration under the ℓ∞ metric, based on the complement of a union of hyperrectangles
main_euclidean.py: main function for empirically measuring concentration under the ℓ2 metric, based on a union of balls
tune_infinity.py: implements the tuning method (grid search over #clusters and binary search over q) for optimal concentration under the ℓ∞ metric
tune_euclidean.py: implements the tuning method (grid search over #clusters) for optimal concentration under the ℓ2 metric
baseline.py: implements the baseline method that heuristically estimates concentration using a linear hyperplane, as proposed in Gilmer et al. (2018)
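As a rough illustration of the quantities these scripts compare, the sketch below computes, on a finite sample, the empirical measure of a candidate region and of its ε-expansion under ℓ2. It does this first for a union of balls (the region family named for `main_euclidean.py`) and then for a single half-space (the shape of the linear-hyperplane baseline). It assumes the centers, radii, and normal vector are already given; how the repository selects them is not modeled, and both function names are hypothetical.

```python
import numpy as np


def ball_region_measures(X, centers, radii, epsilon):
    """Empirical measure of E = union of closed l2 balls B(c_j, r_j), and of E's
    epsilon-expansion. In R^d under l2, the expansion of B(c, r) is B(c, r + epsilon),
    and the expansion of a union is the union of the expansions."""
    X = np.asarray(X, dtype=np.float64).reshape(len(X), -1)
    centers = np.asarray(centers, dtype=np.float64).reshape(len(centers), -1)
    radii = np.asarray(radii, dtype=np.float64)
    in_region = np.zeros(len(X), dtype=bool)
    in_expansion = np.zeros(len(X), dtype=bool)
    for c, r in zip(centers, radii):
        d = np.linalg.norm(X - c, axis=1)  # l2 distance of every sample to this center
        in_region |= d <= r
        in_expansion |= d <= r + epsilon
    return float(in_region.mean()), float(in_expansion.mean())


def halfspace_region_measures(X, w, alpha, epsilon):
    """Half-space E = {x : <w, x> <= b}, with b chosen so that at least ceil(alpha * n)
    samples fall in E. With unit-norm w, its l2 epsilon-expansion is
    {x : <w, x> <= b + epsilon}."""
    X = np.asarray(X, dtype=np.float64).reshape(len(X), -1)
    w = np.asarray(w, dtype=np.float64).ravel()
    w = w / np.linalg.norm(w)  # unit normal, so the threshold shifts by exactly epsilon
    proj = X @ w
    m = int(np.ceil(alpha * len(proj)))
    if not 1 <= m <= len(proj):
        raise ValueError("alpha must lie in (0, 1]")
    b = np.sort(proj)[m - 1]  # m-th smallest projection
    return float(np.mean(proj <= b)), float(np.mean(proj <= b + epsilon))
```

Any region whose empirical measure is at least α gives an upper bound on the empirical concentration through its expansion measure, so for the same sample, α, and ε, the region family with the smaller expansion measure yields the tighter estimate.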