Skip to content

Pareto-optimal data compression for binary classification tasks

License

Notifications You must be signed in to change notification settings

tailintalent/distillation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

distillation

Code for the paper "Pareto-optimal data compression for binary classification tasks" by Max Tegmark & Tailin Wu, 2019, https://arxiv.org/abs/1908.08961.

Data files

All our results can be computed starting with the parameters listed in Figure II, but to save time, you can instead start with the four files below, which each contain two columns (w,F1), where F_1 = P(W<w|Y=1):

  • AnalyticF.csv
  • CIFARF.csv
  • FashionMNISTF.csv
  • MNISTF.csv

After interpolating these curves, eq. (29) gives F_2(w) = P(W<w|Y=2) = 1/2 - F_1(1-w), after which the joint probability distribution P(Z,Y) can be rapidly computed for any binning with boundaries {b_0,b_1,...} using P_ij = F_j(b_i) - F_j(b_{i-1}).

For CIFAR, the conditional probability isn't quite monotonic, so as described in the paper, it's instead optimal to bin "vertically": load the file CIFARQ.csv {f,F1,F2}, interpolate to obtain the two functions F_1(f) and F_ss2(f), simply load and again compute P_ij = F_j(b_i) - F_j(b_{i-1}).

Code

The Matlab code probvided solves the constrained optimization problem of maximizing I(Z,W) subject to the constraint that H<H*, scanning H* over a rather dense grid of values. To compute the Pareto frontier I(H*) from data files such as those above, simply open main.min matlab and click "run".

This obtimization sometimes gets stuck in suboptimal local maxima. The grid points H* for which this happens can be automatically dropped by exploiting the fact that both H and I should be a monotonically increasing functions of H*. For example, here's Mathematica code function doing this:

dropBadLocalMinima2[results_] := Module[{res = {}, lastH = -666}, Do[ If[lastH != results[[i, 2]], lastH = results[[i, 2]]; PrependTo[res, results[[i]]]], {i, Length[results], 1, -1}]; res];

(* When you end up in a bad local minimum, I(X,Z) drops: *)

dropBadLocalMinima1[results_, initialI_Real] := Module[{res = {}, lastI = initialI}, Do[ If[lastI < results[[i, 3]], lastI = results[[i, 3]]; AppendTo[res, results[[i]]]], {i, Length[results]}]; dropBadLocalMinima2[res]];

If you have any questions about this, please email tegmark@mit.edu or tailin@mit.edu. :-)

Citation

If you compare with, build on, or use aspects of the distillation work, please cite the following:

@article{tegmark2019pareto,
  title={Pareto-optimal data compression for binary classification tasks},
  author={Tegmark, Max and Wu, Tailin},
  journal={arXiv preprint arXiv:1908.08961},
  year={2019}
}

About

Pareto-optimal data compression for binary classification tasks

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages