
GuacaMol Baselines

A series of baseline model implementations for the guacamol benchmark for generative chemistry.
A more in-depth explanation of the benchmarks and of the scores achieved by these baselines can be found in our paper.

Dependencies

To install all dependencies:

conda install rdkit -c rdkit
pip install -r requirements.txt

Dataset

Some baselines require the guacamol dataset. To download it, run:

bash fetch_guacamol_dataset.sh

Random Sampler

Dummy baseline that always returns random molecules from the guacamol training set.

To execute the goal-directed generation benchmarks:

python -m random_smiles_sampler.goal_directed_generation

To execute the distribution learning benchmarks:

python -m random_smiles_sampler.distribution_learning
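For reference, a minimal sketch of what such a sampler can look like against the guacamol interface (the DistributionMatchingGenerator class and its generate method come from the guacamol package; the rest is illustrative and not necessarily how this repository wires it up):

import random
from typing import List

from guacamol.distribution_matching_generator import DistributionMatchingGenerator


class RandomSmilesSampler(DistributionMatchingGenerator):
    """Baseline generator: draws SMILES uniformly at random from a training set."""

    def __init__(self, training_smiles: List[str]) -> None:
        self.training_smiles = training_smiles

    def generate(self, number_samples: int) -> List[str]:
        # Sampling with replacement: no novelty, purely a reference point.
        return random.choices(self.training_smiles, k=number_samples)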

Best from ChEMBL

Dummy baseline that simply returns the molecules from the guacamol training set that best satisfy the score of a goal-directed benchmark.
There is no model and no training; its only purpose is to establish a lower bound on the benchmark scores.

To execute the goal-directed generation benchmarks:

python -m best_from_chembl.goal_directed_generation

No distribution learning benchmark available.
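A minimal sketch of the idea, assuming the GoalDirectedGenerator and ScoringFunction interfaces from the guacamol package (class names and details here are illustrative, not necessarily those used in this repository):

from typing import List, Optional

from guacamol.goal_directed_generator import GoalDirectedGenerator
from guacamol.scoring_function import ScoringFunction


class BestFromTrainingSet(GoalDirectedGenerator):
    """Scores every training-set molecule and returns the highest-scoring ones."""

    def __init__(self, training_smiles: List[str]) -> None:
        self.training_smiles = training_smiles

    def generate_optimized_molecules(self, scoring_function: ScoringFunction,
                                     number_molecules: int,
                                     starting_population: Optional[List[str]] = None) -> List[str]:
        # No model and no optimization: just rank the existing molecules.
        scores = scoring_function.score_list(self.training_smiles)
        ranked = sorted(zip(scores, self.training_smiles), reverse=True)
        return [smiles for _, smiles in ranked[:number_molecules]]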

SMILES GA

Genetic algorithm on SMILES as described in: https://www.journal.csj.jp/doi/10.1246/cl.180665

Implementation adapted from: https://github.com/tsudalab/ChemGE

To execute the goal-directed generation benchmarks:

python -m smiles_ga.goal_directed_generation

No distribution learning benchmark available.
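The real implementation mutates grammar-derived genes rather than raw strings (see the paper and the ChemGE repository); the sketch below only illustrates the overall sample-score-select loop, with a deliberately naive character-level mutation and RDKit used to discard invalid SMILES. All names are illustrative, not the repository's API:

import random
from typing import Callable, List

from rdkit import Chem

# Crude token set for illustration only; real SMILES GAs work on a grammar.
SMILES_ALPHABET = list('CNOFcnos()=#123')


def naive_mutate(smiles: str) -> str:
    # Replace one random character; most results are invalid and get filtered out below.
    pos = random.randrange(len(smiles))
    return smiles[:pos] + random.choice(SMILES_ALPHABET) + smiles[pos + 1:]


def ga_step(population: List[str], score: Callable[[str], float],
            population_size: int, n_mutations: int) -> List[str]:
    # Propose mutated offspring and keep only chemically valid SMILES.
    offspring = []
    for _ in range(n_mutations):
        child = naive_mutate(random.choice(population))
        if Chem.MolFromSmiles(child) is not None:
            offspring.append(child)
    # Survival of the fittest: rank parents and offspring together by score.
    ranked = sorted(set(population + offspring), key=score, reverse=True)
    return ranked[:population_size]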

Graph GA

Genetic algorithm on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751

Implementation adapted from: https://github.com/jensengroup/GB-GA

To execute the goal-directed generation benchmarks:

python -m graph_ga.goal_directed_generation

No distribution learning benchmark available.
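The actual crossover and mutation operators act on RDKit molecule graphs (see the GB-GA paper and repository); as a hedged illustration of what a graph-level edit looks like, here is a trivial "attach one carbon atom" mutation using RDKit's editable molecules. This is not one of GB-GA's operators, just an example of the mechanics:

import random
from typing import Optional

from rdkit import Chem


def add_carbon(smiles: str) -> Optional[str]:
    """Attach a single carbon atom to a random atom of the molecule graph."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None or mol.GetNumAtoms() == 0:
        return None
    rw = Chem.RWMol(mol)
    anchor = random.randrange(rw.GetNumAtoms())
    new_idx = rw.AddAtom(Chem.Atom(6))  # atomic number 6 = carbon
    rw.AddBond(anchor, new_idx, Chem.BondType.SINGLE)
    try:
        Chem.SanitizeMol(rw)  # rejects chemically invalid edits (e.g. valence violations)
    except Exception:
        return None
    return Chem.MolToSmiles(rw)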

Graph MCTS

Monte Carlo Tree Search on molecule graphs as described in: https://doi.org/10.26434/chemrxiv.7240751

Implementation adapted from: https://github.com/jensengroup/GB-GB

To execute the goal-directed generation benchmarks:

python -m graph_mcts.goal_directed_generation

To execute the distribution learning benchmarks:

python -m graph_mcts.distribution_learning

To re-generate the distribution statistics as pickle files:

python -m graph_mcts.analyze_dataset

SMILES LSTM Hill Climbing

Long short-term memory (LSTM) on SMILES as described in: https://arxiv.org/abs/1701.01329

This implementation optimizes using a hill-climbing algorithm.

Implementation by BenevolentAI

A pre-trained model is provided in: smiles_lstm/pretrained_model

To execute the goal-directed generation benchmarks:

python -m smiles_lstm_hc.goal_directed_generation

To execute the distribution learning benchmark:

python -m smiles_lstm_hc.distribution_learning

To train a model from scratch:

python -m smiles_lstm_hc.train_smiles_lstm_model
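In hill climbing, the model repeatedly samples SMILES, scores them with the benchmark's scoring function, and fine-tunes on the highest-scoring samples so that the next round is biased towards better molecules. A schematic of that loop, written against hypothetical sample/fine-tune callables rather than this repository's actual classes:

from typing import Callable, List


def hill_climb(sample: Callable[[int], List[str]],
               fine_tune: Callable[[List[str]], None],
               score: Callable[[str], float],
               n_rounds: int = 20, n_samples: int = 1024, keep_top: int = 128) -> List[str]:
    """Iteratively bias a generative model towards high-scoring SMILES."""
    best: List[str] = []
    for _ in range(n_rounds):
        candidates = sample(n_samples)                     # draw SMILES from the LSTM
        ranked = sorted(candidates, key=score, reverse=True)
        best = ranked[:keep_top]                           # keep only the winners
        fine_tune(best)                                    # one maximum-likelihood step on them
    return best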

SMILES LSTM PPO

Long short-term memory (LSTM) on SMILES as described in: https://arxiv.org/abs/1701.01329

This implementation optimizes using the proximal policy optimization (PPO) algorithm.

Implementation by BenevolentAI

A pre-trained model is provided in: smiles_lstm/pretrained_model

To execute the goal-directed generation benchmarks:

python -m smiles_lstm_ppo.goal_directed_generation
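For context, PPO maximizes the standard clipped surrogate objective (Schulman et al., 2017; this is the general formulation, not a detail specific to this repository), where r_t(theta) is the probability ratio between the new and old policy and A_t the advantage estimate:

L^{CLIP}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big]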
