Variable Skipping for Autoregressive Range Density Estimation

This repo contains the code for reproducing the results of the paper "Variable Skipping for Autoregressive Range Density Estimation".

Downloading Datasets

IMPORTANT: This repo includes only the first 100 rows of each dataset. That is enough to sanity-check that the code runs, but to run real experiments you'll need to download the original files and replace the samples in datasets/.
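Before launching a real experiment, it can help to confirm that the samples have actually been replaced. The sketch below is not part of this repo. It assumes the files in datasets/ are CSVs with a single header row, and the function names are hypothetical.

```python
import csv
from pathlib import Path

SAMPLE_ROWS = 100  # the bundled samples contain the first 100 rows


def count_data_rows(path, has_header=True):
    """Count CSV records in `path`, optionally excluding one header row."""
    with open(path, newline="") as f:
        n = sum(1 for _ in csv.reader(f))
    return max(n - 1, 0) if has_header else n


def find_sample_files(dataset_dir="datasets", has_header=True):
    """Return names of CSV files with at most SAMPLE_ROWS data rows."""
    return sorted(
        p.name
        for p in Path(dataset_dir).glob("*.csv")
        if count_data_rows(p, has_header) <= SAMPLE_ROWS
    )


if __name__ == "__main__":
    for name in find_sample_files():
        print(f"{name}: still looks like the bundled 100-row sample")
```

If the script prints nothing, every CSV it found has more than 100 data rows.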

For Dryad-URLs, see:

For Census, see:

For KDD, see:

For DMV-Full, see:

Code Structure

  • datasets/: folder containing the dataset files (only the 100-row samples are checked in).
  • defines the dataset schemas and data loading code.
  • defines the progressive sampling algorithm used for inference.
  • defines the ResMADE model.
  • defines the masked transformer model.
  • defines the code for pattern matching over text.
  • defines random query generation and evaluation.
  • main script used to launch experiments and grid sweeps in a Ray cluster.
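To make the progressive sampling entry above more concrete, here is a minimal sketch of the general technique for estimating the probability of a range query from an autoregressive factorization. It is not the repo's implementation: a small explicit joint table stands in for the learned model, the function name and arguments are hypothetical, and variable skipping itself is not implemented.

```python
import numpy as np


def progressive_sample(joint, ranges, num_samples=2000, seed=0):
    """Estimate P(x_i in ranges[i] for all i) by progressive sampling.

    joint:  ndarray with one axis per column holding the full joint
            distribution (a toy stand-in for a learned autoregressive model).
    ranges: one boolean mask per column marking the values the query allows.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        weight = 1.0
        cond = joint  # joint restricted to the sampled prefix (unnormalized)
        for mask in ranges:
            # p(x_i | sampled prefix): marginalize out all later columns.
            later_axes = tuple(range(1, cond.ndim))
            p = cond.sum(axis=later_axes) if later_axes else cond
            p = p / p.sum()
            in_range = p * mask
            mass = in_range.sum()
            weight *= mass  # probability mass that satisfies this column's filter
            if mass == 0.0:
                break  # no allowed value is reachable from this prefix
            # Sample the next value only from the allowed range.
            value = rng.choice(len(p), p=in_range / mass)
            cond = cond[value]
        total += weight
    return total / num_samples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    joint = rng.random((3, 4, 2))
    joint /= joint.sum()
    ranges = [
        np.array([True, True, False]),
        np.array([False, True, True, True]),
        np.array([True, True]),  # no filter on the last column
    ]
    exact = joint[np.ix_(*ranges)].sum()
    print("estimate:", progressive_sample(joint, ranges), "exact:", exact)
```

Each sample contributes the product of the in-range masses seen along its path, and the average of these products is an unbiased estimate of the query's probability. Note that the unfiltered last column is still sampled here, even though its mass is always 1.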

Running Experiments

To set up a conda environment, run:

conda env create -f environment.yml
source activate varskip

To run training and evaluation with the natural column order, you can use ./ dmv-full, ./ kdd, and ./ census.

To run the full grid sweeps from the paper, use ./ --run dmv-full-final kdd-final census-final. For multi-order training, append -mo (e.g., ./ --run kdd-final-mo).

Results are printed to stdout and also stored in ~/ray_results. To analyze the quantiles of the results, you can use the script.
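As an illustration of what a quantile analysis might compute, the sketch below summarizes a list of per-query errors. It is not the repo's script. The Q-error metric, the flooring of counts at 1, and the function names are assumptions, and reading the values out of ~/ray_results is not shown because the stored format is not described here.

```python
import numpy as np


def q_error(estimate, truth, floor=1.0):
    """Multiplicative error max(est/true, true/est), with both values floored."""
    est = max(float(estimate), floor)
    tru = max(float(truth), floor)
    return max(est / tru, tru / est)


def summarize(errors, quantiles=(50, 95, 99, 100)):
    """Return {quantile: value} for the given per-query errors."""
    errors = np.asarray(errors, dtype=float)
    return {q: float(np.percentile(errors, q)) for q in quantiles}


if __name__ == "__main__":
    # Hypothetical (estimated, true) cardinalities for three queries.
    pairs = [(120, 100), (8, 40), (1000, 950)]
    print(summarize([q_error(e, t) for e, t in pairs]))
```

Reporting the median alongside high quantiles and the maximum shows both typical and worst-case estimation quality.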