zengtsysu/TeroGen

TeroGen

Implementation of the terpenoid generation model used in "Bio-inspired Chemical Space Exploration of Terpenoids".

Introduction


Source code used to create, train, and sample from the bio-inspired terpenoid generation model described in "Bio-inspired Chemical Space Exploration of Terpenoids".

The model is divided into two relatively independent parts: a Reactor, which runs metadynamics simulations to explore the reaction space of given carbocations, and a Decorator, which predicts the decorating sites and groups for a given skeleton.

Setup


The Reactor requires xtb. Some Decorator scripts require Spark 2.4 (and therefore Java 8). The Reactor was tested on Linux with CPUs, and the Decorator was tested on a Tesla V100 GPU. We recommend installing TeroGen in a virtual environment to avoid dependency conflicts.

conda create -n terogen python==3.6.13
conda activate terogen
sh install.sh
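Before running anything, it can help to confirm that the external tools named above are reachable. The following is a minimal Python sketch; it only checks that `xtb` and `java` are on `PATH` and does not verify the Java or Spark versions. The helper `check_prerequisites` is hypothetical and not part of TeroGen.

```python
import shutil
import sys
from typing import Dict, Optional

# Hypothetical helper, not part of TeroGen: executables named in the setup
# notes (xtb for the Reactor, java for the Spark-based Decorator scripts).
REQUIRED_TOOLS = ("xtb", "java")


def check_prerequisites() -> Dict[str, Optional[str]]:
    """Map each required executable to its resolved path, or None if absent.

    Only presence on PATH is checked; versions (e.g. Java 8) are not.
    """
    return {tool: shutil.which(tool) for tool in REQUIRED_TOOLS}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for tool, path in check_prerequisites().items():
        print("{:>5}: {}".format(tool, path or "NOT FOUND"))
```

Run it inside the activated `terogen` environment so that the reported Python version is the one TeroGen will use.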

General Usage


Carbocation Reactor (./reactor)

Any molecule can be used as the initial structure for the metadynamics simulations; for terpenoid generation, the isoprenoid carbocations were used.

reactor.sh: Runs metadynamics simulations from a specified initial structure with the given parameters. The output is a reactant-product list with energetic properties in TSV format.

deprotonation.py: Quenches the carbocations by exhaustive deprotonation and likewise outputs a reactant-product list in TSV format.
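To illustrate what exhaustive deprotonation enumerates, here is a toy Python sketch on a hand-built carbon graph rather than on real molecules. It assumes E1-style quenching only: a proton is removed from each carbon directly bonded to the cationic center that still carries a hydrogen, forming a new C=C bond. Charge delocalization (e.g. allylic cations), symmetry-equivalent duplicates, and the actual TSV schema written by deprotonation.py are not modeled; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def enumerate_deprotonations(
    neighbors: Dict[int, List[int]],
    h_count: Dict[int, int],
    cation: int,
) -> List[Tuple[int, int]]:
    """Enumerate beta-deprotonations of a carbocation (toy model).

    neighbors: carbon adjacency list of the skeleton.
    h_count:   number of hydrogens on each carbon.
    cation:    index of the positively charged carbon.

    Returns one (cation, beta) pair per possible product; the pair is the new
    C=C double bond formed when a proton leaves carbon `beta`.
    """
    products = []
    for beta in sorted(neighbors[cation]):
        if h_count.get(beta, 0) > 0:  # a proton must be available to remove
            products.append((cation, beta))
    return products


# tert-Amyl cation, (CH3)2C(+)-CH2-CH3: carbon 0 carries the charge.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
h_count = {0: 0, 1: 3, 2: 3, 3: 2, 4: 3}

for bond in enumerate_deprotonations(neighbors, h_count, cation=0):
    print("new C=C bond between carbons", bond)
```

Carbons 1 and 2 are equivalent, so (0, 1) and (0, 2) yield the same alkene (2-methylbut-1-ene), while (0, 3) yields 2-methylbut-2-ene. A real implementation would deduplicate products, for example by comparing canonical SMILES.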

Running the demo included in the script takes about 10 hours on 12 CPUs (Xeon E5-2609, 1.70 GHz).

Skeleton Decoration (./decorator)

Skeleton decoration consists of two steps: site prediction and group prediction. First, the decorating sites are predicted with a Transformer model trained using OpenNMT and PyTorch; then, the R-groups are predicted with the RNN-based model proposed by Arús-Pous et al.

Site prediction:

First download the data and unzip it under terogen/Decorator/site_prediction. The included checkpoints can be used for prediction directly, and the dataset can be used to train and test the model.

skeleton_extraction.py: Extracts the carbon skeleton from a terpenoid structure.

site_prediction.sh: Trains and tests the site prediction model.
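As a rough picture of what carbon-skeleton extraction involves, the toy Python sketch below works on a hand-built molecular graph: it keeps the carbon atoms and the C-C bonds between them, and records which carbons lost a non-carbon neighbor. Treating those carbons as decorating sites is an assumption about how site labels could be derived, not a description of skeleton_extraction.py; whether the real script normalizes bond orders or stereochemistry is not stated here, so bond orders are simply kept. The function name and graph format are hypothetical.

```python
from typing import Dict, List, Tuple

Bond = Tuple[int, int, int]  # (atom_i, atom_j, bond_order)


def extract_carbon_skeleton(
    atoms: Dict[int, str], bonds: List[Bond]
) -> Tuple[List[int], List[Bond], List[int]]:
    """Split a toy molecular graph into its carbon skeleton and decorating sites.

    Returns (skeleton_atoms, skeleton_bonds, sites), where `sites` are carbons
    that were bonded to at least one non-carbon heavy atom.
    """
    carbons = {idx for idx, element in atoms.items() if element == "C"}
    skeleton_bonds = [b for b in bonds if b[0] in carbons and b[1] in carbons]
    sites = set()
    for i, j, _order in bonds:
        if i in carbons and j not in carbons:
            sites.add(i)
        elif j in carbons and i not in carbons:
            sites.add(j)
    return sorted(carbons), skeleton_bonds, sorted(sites)


# Propan-2-ol as a toy input: C0-C1(-O3)-C2 (hydrogens are implicit).
atoms = {0: "C", 1: "C", 2: "C", 3: "O"}
bonds = [(0, 1, 1), (1, 2, 1), (1, 3, 1)]
print(extract_carbon_skeleton(atoms, bonds))
# ([0, 1, 2], [(0, 1, 1), (1, 2, 1)], [1])
```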

Group prediction:

This model is analogous to the scaffold decorator proposed in "SMILES-based deep generative scaffold decorator for de-novo drug design".
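In a scaffold-decorator setup, attachment points on a scaffold can be written as `[*]` in SMILES, with each R-group carrying its own `[*]` that marks where it attaches. The naive string-level join below, assuming that convention, is only a sketch of how a skeleton and predicted R-groups fit together, not the original implementation; a robust version would join fragments at the molecule level (for example with a cheminformatics toolkit such as RDKit). It is only valid when each `[*]` in the scaffold is a branch `([*])` or the last token of the string and each decoration begins with `[*]`. The function name is hypothetical.

```python
from typing import List

ATTACHMENT = "[*]"


def attach_decorations(scaffold: str, decorations: List[str]) -> str:
    """Naively join R-group SMILES onto a scaffold's [*] attachment points.

    Assumes every [*] in the scaffold is a branch "([*])" or the final token,
    and every decoration starts with [*] (its attachment atom follows it).
    Decorations are consumed left to right, matching [*] order in the scaffold.
    """
    if scaffold.count(ATTACHMENT) != len(decorations):
        raise ValueError("number of decorations must match attachment points")
    result = scaffold
    for decoration in decorations:
        if not decoration.startswith(ATTACHMENT):
            raise ValueError("decoration must start with [*]: " + decoration)
        # Drop the decoration's own [*] and splice it in at the next free site.
        result = result.replace(ATTACHMENT, decoration[len(ATTACHMENT):], 1)
    return result


print(attach_decorations("CC1CCC([*])CC1[*]", ["[*]O", "[*]C(C)C"]))
# CC1CCC(O)CC1C(C)C
```

The positional restriction matters: a `[*]` at the start of a SMILES string would bond the scaffold to the decoration's last main-chain atom instead of its attachment atom, which is why string splicing is only a toy here.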

First download the data and unzip it under terogen/Decorator/group_decoration. The included checkpoints can be used for prediction directly, and the dataset can be used to train and test the model.

group_prediction.sh: Trains and tests the group prediction model.
