Under the folder `toy_example/`, we provide a Jupyter notebook, `jan22_toy_example.ipynb`, that works through the training and evaluation of our autoencoder (as well as other baseline algorithms) on the synthetic1 dataset. We highly recommend that interested readers take a look before diving deep into our code.
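For readers who want the gist before opening the notebook, here is a rough, framework-free sketch of the autoencoder idea it demonstrates: compress each input to a low-dimensional code and train the weights to reconstruct the input. The repository itself uses TensorFlow; every name and dimension below is illustrative, not taken from the actual code.

```python
import numpy as np

# Illustrative only: a tiny linear autoencoder with tied weights,
# trained by plain gradient descent on synthetic data.
rng = np.random.RandomState(0)
n, d, k = 200, 20, 5           # samples, input dim, code dim
X = rng.randn(n, d)

W = 0.1 * rng.randn(d, k)      # encoder matrix; decoder is its transpose

def mse(W):
    Z = X @ W                  # encode: d -> k
    X_hat = Z @ W.T            # decode: k -> d
    return np.mean((X - X_hat) ** 2)

initial = mse(W)
lr = 0.1
for _ in range(500):
    E = X @ W @ W.T - X        # reconstruction residual
    # Gradient of mean squared reconstruction error w.r.t. W
    grad = (2.0 / (n * d)) * (X.T @ E @ W + E.T @ X @ W)
    W -= lr * grad

print("MSE: %.4f -> %.4f" % (initial, mse(W)))
```

The tied-weight, linear version shown here converges to a PCA-like subspace; the notebook's autoencoder is the nonlinear, TensorFlow analogue of this loop.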
The source code contains the following parts:
- Code for each dataset
- Scripts for reproducing our results
- Code and scripts for one of the baselines, `Simple AE + l1-min`; the scripts are under `simpleAE_scripts/`
To reproduce our experimental results, first run `chmod +x scripts/*.sh` to make the scripts executable. After that, run the given scripts:
- The results are stored in a Python dictionary, which is then saved under the folder `ckpts/`. They can be used to reproduce the figures shown in our paper.
- Before running the corresponding script, download `train.csv` from this Kaggle competition and specify its location via `--data_dir`.
- The RCV1 dataset will be fetched automatically via `sklearn`.
- To reproduce the results of one of the baselines, `Simple AE + l1-min`, run the scripts under the folder `simpleAE_scripts/`.
- For high-dimensional vectors, solving `l1-min` using Gurobi takes a long time on a single CPU. To speed this up, we solve `l1-min` in parallel on a multi-core machine. In `rcv1_main.py`, performance evaluation is performed on a small subset of the test samples (while training is still done using the complete training set). After training the autoencoder, we use a multi-core machine and solve `l1-min` in parallel on the complete test set using `rcv1_parallel_l1.py`. Depending on your machine, solving `l1-min` in parallel on the complete test set may still take a long time, so we recommend first running `rcv1_parallel_l1.py` on a small subset (by setting a small value for the parameter `batch` in the Python file).
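The repository solves `l1-min` with Gurobi and parallelizes the decoding across cores. As a self-contained sketch of that decoding step (not the repository's actual code), the problem min ||x||_1 s.t. Ax = y can be cast as a linear program; below it is solved with SciPy's `linprog`, with the test vectors fanned out over a thread pool. All function names here are illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.optimize import linprog

def l1_min(A, y):
    """Solve min ||x||_1 subject to A x = y as a linear program.

    Split variables z = [x, t] and minimize sum(t) subject to
    -t <= x <= t (two inequality blocks) and A x = y.
    """
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub = np.block([[ np.eye(n), -np.eye(n)],    #  x - t <= 0
                     [-np.eye(n), -np.eye(n)]])   # -x - t <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])       # A x = y (t unconstrained)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

def l1_min_batch(A, Y, workers=4):
    """Decode many measurement vectors concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda y: l1_min(A, y), Y))

# Tiny demo: decode a 2-sparse vector from 10 random measurements.
rng = np.random.RandomState(0)
m, n = 10, 20
A = rng.randn(m, n)
x0 = np.zeros(n)
x0[3], x0[11] = 1.5, -2.0
xs = l1_min_batch(A, [A @ x0])
```

The actual scripts use a Gurobi model per sample instead of `linprog`, and processes instead of threads, but the decomposition is the same: each test vector is an independent `l1-min` instance, so they parallelize trivially.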
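The results dictionaries saved under `ckpts/` are ordinary pickled Python objects; a minimal sketch of writing and reloading one for plotting (the filename and keys below are made up for illustration, not the repository's actual format):

```python
import os
import pickle

# Hypothetical results dictionary, saved the way the scripts might save it.
results = {"dataset": "synthetic1", "test_error": [0.31, 0.12, 0.05]}

os.makedirs("ckpts", exist_ok=True)
path = os.path.join("ckpts", "synthetic1_results.pkl")
with open(path, "wb") as f:
    pickle.dump(results, f)

# Later, reload the dictionary to reproduce the figures.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```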
Here is our software environment:
- Python 2.7.12
- numpy 1.13.3
- sklearn 0.19.1
- scipy 1.0.0
- joblib 0.10.0
- TensorFlow r1.4
- Gurobi 7.5.1