
DISAE

MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization

This is the repository to replicate the experiments for fine-tuning the classifier with pretrained ALBERT from the DISAE paper.

----------- INSTRUCTIONS -----------

1. Install Prerequisites

  • Python 3.7
  • PyTorch
  • RDKit
  • Transformers (Hugging Face, version 2.3.0)
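
A possible way to set up the environment (a sketch; conda is assumed for RDKit, and only the Transformers version is pinned by this README):

conda create -n disae python=3.7
conda activate disae
conda install -c conda-forge rdkit
pip install torch transformers==2.3.0

The environment name "disae" is illustrative; any name works.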

2. Clone this repository

3. Download Data

All data can be downloaded here; put it under this repository, i.e. in the same directory as finetuning_train.py.

There will be four subdirectories in the data folder; the expected layout is sketched after the list below.


  • activity: gives you the train/dev/test split based on protein similarity at a bitscore threshold of 0.035
  • albertdata: gives you the pretrained ALBERT model. ALBERT is pretrained on distilled triplets of the whole Pfam
  • Integrated: gives chemicals collected from several databases
  • protein: gives you the mapping from UniProt ID to triplet form
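
After downloading, the layout should look roughly like this (a sketch based on the subdirectories listed above; the files inside each folder are not shown):

<repository root>/
├── finetuning_train.py
└── data/
    ├── activity/
    ├── albertdata/
    ├── Integrated/
    └── protein/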

4. Run Finetuning

To run the ALBERT model (default: ALBERT frozen transformer):

python finetuning_train.py --protein_embedding_type="albert"

To try other freezing options, change "frozen_list" to choose the modules to be frozen; a sketch of how this works is shown below.
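
A minimal sketch of module freezing, assuming standard PyTorch and the Hugging Face ALBERT classes; the prefixes in frozen_list below are illustrative, not the actual identifiers used in finetuning_train.py:

import torch
from transformers import AlbertConfig, AlbertModel

# Build a default ALBERT just for illustration; the real script loads
# the pretrained weights from data/albertdata instead.
model = AlbertModel(AlbertConfig())

# Hypothetical frozen_list: parameter-name prefixes to keep fixed.
frozen_list = ["embeddings", "encoder"]

for name, param in model.named_parameters():
    if any(name.startswith(prefix) for prefix in frozen_list):
        param.requires_grad = False  # excluded from gradient updates

# Only the remaining trainable parameters are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)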

To run the LSTM model:

python finetuning_train.py --protein_embedding_type="lstm"

[Figure: distilled-and-architecture]
