truell20/grammarVAE

 
 


Grammar Variational Autoencoder

This repository contains training and sampling code for the paper: Grammar Variational Autoencoder.

Requirements

Install the dependencies (CPU version) with pip install -r requirements.txt

For GPU compatibility, replace the fourth line in requirements.txt with: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
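
The swap can also be scripted. The sketch below is a minimal example, assuming requirements.txt has at least four lines and that the fourth is the TensorFlow entry, as stated above. The helper name `use_gpu_tensorflow` is hypothetical and not part of this repository.

```python
# GPU wheel URL quoted in the Requirements section above.
GPU_WHEEL = (
    "https://storage.googleapis.com/tensorflow/linux/gpu/"
    "tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl"
)


def use_gpu_tensorflow(path="requirements.txt", line_number=4):
    """Replace line `line_number` (1-indexed) of `path` with the GPU wheel URL.

    Hypothetical helper: it trusts that the given line is the TensorFlow
    entry and does not inspect its contents.
    """
    with open(path) as handle:
        lines = handle.read().splitlines()
    if len(lines) < line_number:
        raise ValueError("%s has fewer than %d lines" % (path, line_number))
    lines[line_number - 1] = GPU_WHEEL
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

Run `use_gpu_tensorflow()` from the repository root before `pip install -r requirements.txt`. Keep a copy of the original file if you may need the CPU version again.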

Creating datasets

Molecules

To create the molecule datasets, call:

  • python make_zinc_dataset_grammar.py
  • python make_zinc_dataset_str.py

Equations

The equation dataset can be downloaded here: grammar, string

Training

Molecules

To train the molecule models, call:

  • python train_zinc.py # train the grammar model
  • python train_zinc.py --latent_dim=2 --epochs=50 # train a model with a 2D latent space for 50 epochs
  • python train_zinc_str.py # train the string (character) model

Equations

To train the equation models, call:

  • python train_eq.py # train the grammar model
  • python train_eq.py --latent_dim=2 --epochs=50 # train a model with a 2D latent space for 50 epochs
  • python train_eq_str.py # train the string (character) model
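
The commands above pass `--latent_dim` and `--epochs` to train_zinc.py and train_eq.py. As a rough illustration only, the sketch below shows how two such flags could be parsed with `argparse`. The default values and the function name `parse_args` are assumptions made for the example; check the training scripts for the values they actually use.

```python
import argparse

# Placeholder defaults chosen for illustration; the training scripts
# define their own values.
DEFAULT_LATENT_DIM = 56
DEFAULT_EPOCHS = 100


def parse_args(argv=None):
    """Parse the two flags used in the training commands (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Hypothetical training CLI")
    parser.add_argument("--latent_dim", type=int, default=DEFAULT_LATENT_DIM,
                        help="dimensionality of the latent space")
    parser.add_argument("--epochs", type=int, default=DEFAULT_EPOCHS,
                        help="number of training epochs")
    return parser.parse_args(argv)
```

For example, `parse_args(["--latent_dim=2", "--epochs=50"])` yields a namespace with `latent_dim` set to 2 and `epochs` set to 50, matching the second command in each list.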

Sampling

Molecules

The file molecule_vae.py can be used to encode and decode SMILES strings. For a demo, run:

  • python encode_decode_zinc.py

Equations

The analogous file equation_vae.py can encode and decode equation strings. Run:

  • python encode_decode_eq.py

Bayesian optimization

The Bayesian optimization experiments use sparse Gaussian processes implemented in Theano.

We use a modified version of Theano with a few add-ons, e.g. to compute the log determinant of a positive definite matrix in a numerically stable manner. The modified version of Theano can be installed by going to the folder Theano-master and typing

  • python setup.py install
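
For context on the log-determinant add-on mentioned above: one common numerically stable approach uses the Cholesky factor, because for a positive definite matrix A = L Lᵀ we have log det A = 2 Σᵢ log Lᵢᵢ, which avoids forming det A directly (it can underflow or overflow). The pure-Python sketch below illustrates that identity. It is not the code from the modified Theano, whose implementation is not shown in this README, and the function names are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite
    matrix given as a list of rows."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low


def logdet_pd(a):
    """log(det(a)) computed as 2 * sum(log(diag(L))) with a = L L^T."""
    low = cholesky(a)
    return 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))
```

With NumPy, the same factor is available from `np.linalg.cholesky`. The benefit shows on a matrix such as 0.001 times the 200 x 200 identity: its determinant underflows to zero in double precision, while the sum of logs stays finite.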

The experiments with molecules require the rdkit library, which can be installed as described in http://www.rdkit.org/docs/Install.html.

The Bayesian optimization experiments can be replicated as follows:

1 - Generate the latent representations of molecules and equations. For this, go to the folders

molecule_optimization/latent_features_and_targets_grammar/

molecule_optimization/latent_features_and_targets_character/

equation_optimization/latent_features_and_targets_grammar/

equation_optimization/latent_features_and_targets_character/

and type

  • python generate_latent_features_and_targets.py

2 - Go to the folders

molecule_optimization/simulation1/grammar/

molecule_optimization/simulation1/character/

equation_optimization/simulation1/grammar/

equation_optimization/simulation1/character/

and type

  • nohup python run_bo.py &

Repeat this step for all the simulation folders (simulation2, ..., simulation10). For speed, it is recommended to run these in parallel on a computer cluster.
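
On a single multi-core machine, the per-folder launches in this step can be scripted instead of typed by hand. The sketch below assumes that the folder layout listed above repeats for simulation1 through simulation10 and that run_bo.py takes no arguments. The helper names and the log file name are hypothetical; on a cluster you would instead submit one job per folder through your scheduler.

```python
import os
import subprocess

TASKS = ("molecule_optimization", "equation_optimization")
MODELS = ("grammar", "character")


def simulation_dirs(n_simulations=10):
    """Return every <task>/simulation<i>/<model> folder for i = 1..n_simulations."""
    return [
        os.path.join(task, "simulation%d" % i, model)
        for task in TASKS
        for i in range(1, n_simulations + 1)
        for model in MODELS
    ]


def launch_all(root=".", n_simulations=10):
    """Start run_bo.py in each folder without waiting; return the processes.

    Output goes to run_bo.log in each folder (a name picked for this sketch).
    """
    procs = []
    for rel in simulation_dirs(n_simulations):
        cwd = os.path.join(root, rel)
        log = open(os.path.join(cwd, "run_bo.log"), "a")
        procs.append(subprocess.Popen(
            ["python", "run_bo.py"],
            cwd=cwd, stdout=log, stderr=subprocess.STDOUT))
    return procs
```

`launch_all()` starts all 40 runs at once, which is likely too many for one machine; launch a slice of `simulation_dirs()` at a time, and call `wait()` on each returned process before extracting the results.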

3 - Extract the results by going to the folders

molecule_optimization/

equation_optimization/

and typing

  • python get_final_results.py
  • ./get_average_test_RMSE_LL.sh


Code for the "Grammar Variational Autoencoder" https://arxiv.org/abs/1703.01925
