Experiments used in "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning"
omegafragger and yaringal: Grid search for hyper-parameter selection (#2)
* Restructured experiments on UCI datasets -

1. Changed folder structure to remove duplicate code.
2. Added cross-validation and grid-search code for getting optimal hyperparameters: dropout_rate and tau.
3. Modified net.py file to allow passing dropout_rate and tau as parameters to the network.

* Removed .DS_Store files

* Further refinements to code -

1. Created single experiment.py file which can be parameterised with number of hidden layers and epoch multiplier
2. Updated readme file (more updates to come)
3. Deleted previous result directories and modified tau values.

* Modified readme to include updated command line parameters

* Minor bug fixes

* Bug fix

* Even more bug fixes

* Adding experiment results for bostonHousing dataset

* Adding experiment results for concrete dataset

* Made minor changes to experiment.py file and added a shell script to run all experiments.

* Minor temporary changes

* Added code for averaging results from ensembles during both training and cross validation

* Removed ensemble prediction during grid-search to save time.

* Removed dependency on Theano. Now we can use the TensorFlow backend

* Added code for performing multi-round validation before choosing best validation parameters.

* Removed verbose outputs for grid search

* Added experimental results

* Added the results obtained on all datasets without using ensembling

* Removed unnecessary files and commented the experiment files

* Removed YearPredictionMSD dataset

* Updated readme file

* Cleaned up code and renamed files to better names.

* Removed ensemble code and experiments

* Simplified code in experiment.py

* Removed commented unnecessary code in net.py.

* Update readme.md
Latest commit 6eb4497 Aug 9, 2018


This is the code used for the uncertainty experiments in the paper "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning" (2015), with a few adaptations following recent (2018) feedback from the community (many thanks to @capybaralet for spotting some bugs, and to @omegafragger for restructuring the code). This code is based on the code by José Miguel Hernández-Lobato used for his paper "Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks". The datasets supplied here are taken from the UCI machine learning repository. Take note of the data splits used in these experiments (they are identical to the ones used in Hernández-Lobato's code): because the datasets are small, if you split the data yourself you will most likely get results that differ from, and are not comparable to, the ones reported here.

Update (2018): We replaced the Bayesian optimisation implementation (which was used to find hyperparameters) with a grid search over the hyperparameters. This follows feedback from @capybaralet, who spotted test-set contamination: the hyperparameters were tuned once and shared across all splits, so some train-set points used for tuning reappeared as test-set points in later splits. The new implementation iterates over the 20 splits, and for each train-test split it creates a new train-validation split to tune the hyperparameters. These hyperparameters are discarded between different train-test splits.
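To make the contamination concrete, here is a minimal, hypothetical sketch (not code from this repo) that draws 20 random train-test splits of the same index set and counts, for each later split, how many of its test points were training points of split 0. The function name `contamination_overlap`, the 10% test fraction and the seed are assumptions for illustration; 506 is the number of rows in the Boston Housing data.

```python
import numpy as np


def contamination_overlap(n_points, n_splits=20, test_fraction=0.1, seed=0):
    """For each split j > 0, count test points of split j that were training points of split 0.

    Hypothetical illustration: every split is a random permutation of the same
    n_points indices (the 10% test fraction is an assumption).
    """
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * n_points))
    splits = []
    for _ in range(n_splits):
        perm = rng.permutation(n_points)
        splits.append((perm[n_test:], perm[:n_test]))  # (train indices, test indices)

    # Within a single split, train and test are disjoint by construction, so
    # tuning on a split's own training data never touches that split's test set.
    train_0, test_0 = splits[0]
    assert not set(train_0.tolist()) & set(test_0.tolist())

    # Tuning once on split 0 and reusing the result everywhere: these points leak.
    tuned_on = set(train_0.tolist())
    return [len(tuned_on & set(test.tolist())) for _, test in splits[1:]]


overlaps = contamination_overlap(506)
print(overlaps)
```

With random splits, most of each later test set overlaps split 0's training data, which is why the updated code tunes inside every split instead.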

Below we report the new results using grid search (new, with code in this updated repo) vs. results obtained from a re-run of the original code used in the paper, which used Bayesian optimisation (paper, code in previous commit). Note that the paper numbers reported here differ slightly from those in the previous commit, due to differences in package versions and hardware compared with 3 years ago. Further note the improved results in new on some datasets (mostly LL), due to proper grid search (these are cases where BayesOpt failed). The other results agree with paper within standard error. If you used the code from the previous commits, we advise you to evaluate your method again following the streamlined implementation here.

The experiments were run with Theano 0.8.2 and Keras 2.2.0. The baseline experiment (paper) simply ran the previous "10x epochs one layer" code (can be found here) with the new versions of Theano and Keras. The new code (new) also uses 10x training epochs and one hidden layer, and trains models on the same 20 randomly generated train-test splits of the data. Each training set is further divided into an 80-20 train-validation split to find the best hyperparameters (dropout rate and tau) through grid search. Finally, a network is trained on the whole training set using the best hyperparameters and is then evaluated on the test set. To run an experiment:
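The per-split procedure described above can be sketched as follows. This is a minimal outline under stated assumptions, not the repo's experiment.py: `select_hypers`, `run_split` and `fit_and_score` are hypothetical names, where `fit_and_score` stands in for "train a dropout network with dropout rate `p` and precision `tau`, then score it on held-out data", returning a value where higher is better (for example validation log likelihood; which metric the repo selects on is not stated here).

```python
import itertools

import numpy as np


def select_hypers(X_train, y_train, dropout_rates, taus, fit_and_score,
                  val_fraction=0.2, seed=0):
    """Grid-search (dropout rate, tau) on an 80-20 split of ONE training set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X_train))
    n_val = int(round(val_fraction * len(X_train)))
    val_idx, fit_idx = perm[:n_val], perm[n_val:]

    best_hypers, best_score = None, -np.inf
    for p, tau in itertools.product(dropout_rates, taus):
        score = fit_and_score(X_train[fit_idx], y_train[fit_idx],
                              X_train[val_idx], y_train[val_idx], p, tau)
        if score > best_score:  # higher is better, e.g. validation LL
            best_hypers, best_score = (p, tau), score
    return best_hypers


def run_split(X, y, train_idx, test_idx, dropout_rates, taus, fit_and_score):
    """Tune on this split's training data only, retrain on all of it, test once."""
    p, tau = select_hypers(X[train_idx], y[train_idx], dropout_rates, taus,
                           fit_and_score)
    test_score = fit_and_score(X[train_idx], y[train_idx],
                               X[test_idx], y[test_idx], p, tau)
    return test_score, (p, tau)  # hypers returned for logging only, never reused
```

Calling `run_split` once per train-test split (with placeholder grids such as `[0.005, 0.01, 0.05, 0.1]` for the dropout rate) means `select_hypers` starts from scratch every time, so no hyperparameters carry over from one split to the next.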

THEANO_FLAGS='allow_gc=False,device=gpu,floatX=float32' python experiment.py --dir <UCI Dataset directory> --epochx <Epoch multiplier> --hidden <number of hidden layers>
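To run several datasets in one go, a small Python driver can assemble the same command for each dataset directory. This is a hypothetical sketch: `build_command` and `run_all` are illustrative names, the directory names used below (`bostonHousing`, `concrete`) are taken from the commit messages and may not match the actual layout, and the repo's own shell script for running all experiments may work differently. With `dry_run=True` it only prints the commands.

```python
import os
import subprocess
import sys


def build_command(dataset_dir, epochx=10, hidden=1, python=sys.executable):
    """Return (argv, env) for one experiment.py run, mirroring the command above."""
    argv = [python, "experiment.py", "--dir", dataset_dir,
            "--epochx", str(epochx), "--hidden", str(hidden)]
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    env["THEANO_FLAGS"] = "allow_gc=False,device=gpu,floatX=float32"
    return argv, env


def run_all(dataset_dirs, dry_run=True, **kwargs):
    """Run (or just print) one experiment per dataset directory."""
    for d in dataset_dirs:
        argv, env = build_command(d, **kwargs)
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, env=env, check=True)  # stop on the first failure


run_all(["bostonHousing", "concrete"])
```

The defaults (`epochx=10`, `hidden=1`) match the "10x epochs, one hidden layer" setting used for the results below.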

A summary of the results is reported below (lower RMSE is better, higher test log likelihood (LL) is better; note the ±X reported is standard error and not standard deviation).

| Dataset | BayesOpt RMSE (paper) | Grid Search RMSE (new) | BayesOpt LL (paper) | Grid Search LL (new) |
|---|---|---|---|---|
| Boston Housing | 2.83 ± 0.17 | 2.90 ± 0.18 | -2.40 ± 0.04 | -2.40 ± 0.04 |
| Concrete Strength | 4.93 ± 0.14 | 4.82 ± 0.16 | -2.97 ± 0.02 | -2.93 ± 0.02 |
| Energy Efficiency | 1.08 ± 0.03 | 0.54 ± 0.06 | -1.72 ± 0.01 | -1.21 ± 0.01 |
| Kin8nm | 0.09 ± 0.00 | 0.08 ± 0.00 | 0.97 ± 0.00 | 1.14 ± 0.01 |
| Naval Propulsion | 0.00 ± 0.00 | 0.00 ± 0.00 | 3.91 ± 0.01 | 4.45 ± 0.00 |
| Power Plant | 4.00 ± 0.04 | 4.01 ± 0.04 | -2.79 ± 0.01 | -2.80 ± 0.01 |
| Protein Structure | 4.27 ± 0.01 | 4.27 ± 0.02 | -2.87 ± 0.00 | -2.87 ± 0.00 |
| Wine Quality Red | 0.61 ± 0.01 | 0.62 ± 0.01 | -0.92 ± 0.01 | -0.93 ± 0.01 |
| Yacht Hydrodynamics | 0.70 ± 0.05 | 0.67 ± 0.05 | -1.38 ± 0.01 | -1.25 ± 0.01 |
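For reference, here is a minimal sketch of how numbers like these can be computed with MC dropout. It assumes `samples` holds T stochastic forward passes (dropout kept on at test time) and that the predictive distribution is a uniform mixture of T Gaussians with precision `tau`; each table entry would then be the mean over the 20 splits ± the standard error. The function names are hypothetical, and the standard error here uses the sample standard deviation (`ddof=1`); the repo's exact conventions may differ.

```python
import numpy as np


def mc_dropout_metrics(y_test, samples, tau):
    """RMSE of the MC mean and mean test log likelihood for one split.

    samples: array of shape (T, N), predictions from T dropout forward passes.
    LL per point: log( (1/T) * sum_t N(y; samples[t], 1/tau) ).
    """
    T = samples.shape[0]
    mc_mean = samples.mean(axis=0)
    rmse = np.sqrt(np.mean((y_test - mc_mean) ** 2))
    # log-sum-exp over the T samples for numerical stability
    ll = (np.logaddexp.reduce(-0.5 * tau * (y_test[None, :] - samples) ** 2, axis=0)
          - np.log(T) - 0.5 * np.log(2 * np.pi) + 0.5 * np.log(tau))
    return rmse, ll.mean()


def mean_and_stderr(values):
    """Mean and standard error (not standard deviation) across splits."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))


# Stand-in data: 3 test points, 50 noisy "dropout" predictions each.
rng = np.random.default_rng(0)
y = np.array([1.0, 2.0, 3.0])
samples = y + 0.1 * rng.standard_normal((50, 3))
print(mc_dropout_metrics(y, samples, tau=10.0))
```

Reporting the standard error rather than the standard deviation shrinks the ± term by a factor of sqrt(20) for 20 splits, which is why the intervals in the table are narrow.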