Benchmarking compound activity prediction for real-world drug discovery applications

tiantz17/CARA

CARA

Benchmarking Compound Activity Prediction for Real-World Drug Discovery Applications

1. Introduction

CARA is a benchmark of compound activity prediction and evaluation for real-world applications.

1.1 Features of CARA

  • Real-world data: CARA contains large-scale, high-quality, and real-world compound activity data measured by wet-lab experiments, which were collected from the ChEMBL database.
  • Assay-level organization: CARA organizes the compound activities into assays (slightly different from ChEMBL assays), i.e., one assay, one target, one measurement type, many compounds.
  • Representative targets: CARA selects representative protein targets for testing, reducing the influence of the long-tailed distribution of protein exposure.
  • Distinguished tasks: CARA treats compound activities from different stages of drug discovery, i.e., virtual screening (VS) and lead optimization (LO), separately.
  • Diverse scenarios: CARA provides the learning scenarios of both zero-shot (ZS) and few-shot (FS).
  • Regression objective: CARA adopts a regression task without defining a threshold for positive and negative samples.
  • Assay-level evaluation: CARA evaluates compound activity prediction models at the assay level, avoiding bulk evaluation bias.
  • Specific metrics: CARA evaluates VS and LO tasks with different metrics according to their distinct goals in practice.
  • Success rates: CARA defines success rates based on assay-level evaluations to give a direct view of performance.
  • Informative leaderboard: CARA provides the performance comparison of selected state-of-the-art methods for compound activity prediction.

1.2 Tasks

CARA defines six tasks from two task types and three target types. The two task types are virtual screening (VS) and lead optimization (LO). The VS task focuses on screening hit compounds for a specific target from a compound library with diverse scaffolds. The LO task aims to optimize a given compound into compounds with better activities.

The three target types are All, Kinase, and G-protein coupled receptor (GPCR).

As a result, the six tasks are VS-All, VS-Kinase, VS-GPCR, LO-All, LO-Kinase, and LO-GPCR.

We suggest using VS-All and LO-All for performance evaluation and comparison (see our manuscript for more details).

1.3 Train-test splitting schemes

Train-test splitting is conducted at the assay level, i.e., assays are divided into training assays and test assays. We also make sure that there is no data leakage.

For the VS task, we use a new-protein splitting scheme, such that the protein targets in the test assays were not seen during training.

For the LO task, we use a new-assay splitting scheme, such that the congeneric compounds in the test assays were not seen during training.
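
The two schemes can be sanity-checked with a small overlap test. The following is a minimal sketch, not the benchmark's own code: the record layout (`assay_id`, `proteins`, `compounds`) and the helper `shared_values` are hypothetical, and a plain compound-overlap test is only a rough stand-in for "congeneric compounds not seen during training".

```python
def shared_values(train_assays, test_assays, key):
    """Return the values stored under `key` that occur in both splits."""
    train_values = set()
    for assay in train_assays:
        train_values.update(assay[key])
    test_values = set()
    for assay in test_assays:
        test_values.update(assay[key])
    return train_values & test_values


# Hypothetical toy records; real CARA assays are not stored in this form.
train = [
    {"assay_id": "A1", "proteins": {"P1"}, "compounds": {"c1", "c2"}},
    {"assay_id": "A2", "proteins": {"P2"}, "compounds": {"c3"}},
]
test = [{"assay_id": "A3", "proteins": {"P3"}, "compounds": {"c4", "c5"}}]

# New-protein check (VS): no test protein should appear in training.
vs_leak = shared_values(train, test, "proteins")
# New-assay check (LO), approximated here by exact compound overlap.
lo_leak = shared_values(train, test, "compounds")
print(vs_leak, lo_leak)
```

An empty set from either check means that no protein (or compound) is shared between the two splits in the toy data.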

1.4 Few-shot scenario

For the FS scenario, the samples in each test assay are further split into support samples and query samples. You can use the support samples for training or fine-tuning; the query samples are then used for evaluation.
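
As an illustration of the support/query division, here is a minimal sketch that assumes a uniformly random choice of support samples. The function `split_support_query`, the `seed` argument, and the random selection are assumptions for illustration; the default of 50 mirrors the K_SHOT and SHOT examples given in the meta-train and fine-tune sections.

```python
import random


def split_support_query(samples, shot=50, seed=0):
    """Split one test assay's samples into support and query subsets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_support = min(shot, len(samples))  # small assays leave no query samples
    support = [samples[i] for i in indices[:n_support]]
    query = [samples[i] for i in indices[n_support:]]
    return support, query


# Toy assay with 120 samples, represented here by integer ids.
samples = list(range(120))
support, query = split_support_query(samples, shot=50)
print(len(support), len(query))
```

The support subset is what a model may be trained or fine-tuned on; metrics are computed on the query subset only.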

1.5 Evaluation metrics

For the VS task, we care more about the accuracy of the top-ranked compounds, so we mainly use enrichment factors.

  • EF@1%: Enrichment factor at top 1%. The hit compounds are defined as those with top 1% highest activities.

  • EF@5%: Enrichment factor at top 5%. The hit compounds are defined as those with top 5% highest activities.

  • SR@1%: Success rate at top 1%. The hit compounds are defined as those with top 1% highest activities. Success: at least one hit compound is ranked in the top 1% of the list by predicted scores.

  • SR@5%: Success rate at top 5%. The hit compounds are defined as those with top 5% highest activities. Success: at least one hit compound is ranked in the top 5% of the list by predicted scores.
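
The VS metrics above can be sketched per assay as follows. This is a minimal illustration, not the repository's evaluation code: the function names, the rounding of the top-k% cutoff (rounded to the nearest integer, at least one compound), and the tie handling are assumptions, and the success rate is taken here as the percentage of assays counted as a success.

```python
import numpy as np


def enrichment_factor(y_true, y_pred, top=0.01):
    """EF at the given fraction for one assay (higher activity = better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n_total = len(y_true)
    n_top = max(1, int(round(n_total * top)))  # assumed cutoff rounding
    hits = np.argsort(-y_true, kind="stable")[:n_top]    # true top-k% compounds
    picked = np.argsort(-y_pred, kind="stable")[:n_top]  # predicted top-k%
    n_found = len(np.intersect1d(hits, picked))
    # Hit fraction among the picked compounds over hit fraction in the assay.
    return (n_found / n_top) / (len(hits) / n_total)


def success_rate(assays, top=0.01):
    """Percentage of assays with at least one hit in the predicted top-k%."""
    successes = [enrichment_factor(t, p, top) > 0 for t, p in assays]
    return 100.0 * sum(successes) / len(successes)


truth = list(range(100))
perfect = truth                   # ranks every compound correctly
reversed_scores = truth[::-1]     # ranks every compound in reverse
ef_perfect = enrichment_factor(truth, perfect, top=0.01)
sr = success_rate([(truth, perfect), (truth, reversed_scores)], top=0.01)
```

With hits defined as the top k% by measured activity, the largest possible EF@k% is 1/k, i.e., 100 for EF@1% and 20 for EF@5%.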

For the LO task, we need the overall ranking of the compounds, so we use correlation coefficients.

  • PCC: Pearson's correlation coefficient.

  • SCC: Spearman's correlation coefficient.

  • SR@0.5: Success rate at a PCC threshold of 0.5. Success: PCC > 0.5 for the assay.
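
A minimal sketch of the LO metrics, using `scipy.stats.pearsonr` and `scipy.stats.spearmanr` on each assay. The helper names are hypothetical, and treating SR@0.5 as the percentage of assays with PCC > 0.5 is an assumption based on the assay-level evaluation described above.

```python
from scipy.stats import pearsonr, spearmanr


def correlation_metrics(y_true, y_pred):
    """Return (PCC, SCC) for the compounds of a single assay."""
    pcc = pearsonr(y_true, y_pred)[0]
    scc = spearmanr(y_true, y_pred)[0]
    return pcc, scc


def success_rate_pcc(assays, threshold=0.5):
    """Percentage of assays whose PCC exceeds the threshold."""
    n_success = sum(correlation_metrics(t, p)[0] > threshold for t, p in assays)
    return 100.0 * n_success / len(assays)


# Two toy assays: one predicted well, one predicted in reverse order.
assays = [
    ([1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 2.9, 4.2]),
    ([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]),
]
pcc_good, scc_good = correlation_metrics(*assays[0])
sr_half = success_rate_pcc(assays)
```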

1.6 Statistics

| Task | VS-All | VS-Kinase | VS-GPCR | LO-All | LO-Kinase | LO-GPCR |
| --- | --- | --- | --- | --- | --- | --- |
| #Assays | 12,029 | 2,733 | 2,256 | 81,187 | 11,276 | 22,917 |
| #Proteins | 2,242 | 434 | 268 | 4,456 | 487 | 579 |
| #Compounds | 317,855 | 25,943 | 41,352 | 625,099 | 111,279 | 161,263 |
| #Samples | 1,237,256 | 84,605 | 70,179 | 1,187,136 | 200,800 | 321,904 |
| #Training assays | 9,408 | 1,459 | 1,584 | 81,033 | 11,220 | 22,872 |
| #Test assays | 100 | 58 | 18 | 100 | 54 | 43 |

2. Code for Training and Evaluation

We provide the code for model training and performance evaluation based on CARA. Install the packages listed below to run the code. Installation may take from a few minutes to several hours, and training may take from several hours to a few days, depending on the task, dataset size, and method.

2.1 Requirements

python=3.7.11
pandas=1.3.5
numpy=1.21.5
scipy=1.7.3
json=2.0.9
scikit-learn=1.0.2
pytorch=1.12.1
torch-geometric=2.2.0
rdkit=2020.09.1.0
gensim=3.8.3
networkx=2.6.3
subword_nmt=0.3.8
codecs

2.2 General train, pre-train

python -u runTrain.py --model [MODEL] --dataset_params task:[TASK],subset:[SUBSET] --info [INFO] --gpu [GPU]
  • MODEL: model name, e.g., DeepConvDTI
  • TASK: task name, e.g., VS_All
  • SUBSET: subset of data, e.g., train, support
  • INFO: label for the training run, e.g., fastTrain; a label beginning with 'fast' prints less output to speed up training
  • GPU: gpu number, e.g., 0
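
To make the placeholder substitution concrete, the sketch below assembles the training command from the example values listed above. The helpers `format_dataset_params` and `build_train_command` are hypothetical and not part of the repository; they only build the argument list and do not run anything.

```python
def format_dataset_params(**params):
    """Join key/value pairs into the key:value,key:value form shown above."""
    return ",".join(f"{key}:{value}" for key, value in params.items())


def build_train_command(model, task, subset, info, gpu=0):
    """Return the runTrain.py invocation as an argument list."""
    return [
        "python", "-u", "runTrain.py",
        "--model", model,
        "--dataset_params", format_dataset_params(task=task, subset=subset),
        "--info", info,
        "--gpu", str(gpu),
    ]


# Example values taken from the parameter list above.
command = build_train_command("DeepConvDTI", "VS_All", "train", "fastTrain")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root if you prefer launching runs from a script.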

2.3 Meta-train

python -u runTrain.py --model [MODEL]Meta --dataset_params task:[TASK],subset:[SUBSET],n_way:[N_WAY],k_shot:[K_SHOT] --info [INFO] --gpu [GPU]
  • N_WAY: number of assays per batch, e.g., 1
  • K_SHOT: number of support samples per assay, e.g., 50

2.4 General test

python -u runTest.py --model [MODEL] --model_path [MODEL_PATH] --dataset_params task:[TASK],subset:[SUBSET] --info [INFO] --gpu [GPU]
  • MODEL: model name, e.g., DeepConvDTI
  • MODEL_PATH: folder containing the trained models
  • TASK: task name, e.g., VS_All
  • SUBSET: subset of data, e.g., test, query
  • INFO: label for the test run, e.g., test
  • GPU: gpu number, e.g., 0

2.5 Fine-tune, meta-test

python -u runTest.py --model [MODEL]Meta --model_path [MODEL_PATH] --dataset_params task:[TASK],subset:finetune,step:[STEP],shot:[SHOT] --info [INFO] --gpu [GPU]
  • STEP: number of steps for fine-tuning, e.g., 10000
  • SHOT: number of support samples per assay for fine-tuning, e.g., 50
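
As a worked substitution of the fine-tuning placeholders, the sketch below fills in the example values from the lists above. The helper `build_finetune_command` is hypothetical, and both the model folder `path/to/models` and the label `finetune` are placeholders chosen for illustration.

```python
def build_finetune_command(model, model_path, task, step, shot, info, gpu=0):
    """Return the runTest.py fine-tuning invocation as an argument list."""
    dataset_params = f"task:{task},subset:finetune,step:{step},shot:{shot}"
    return [
        "python", "-u", "runTest.py",
        "--model", model + "Meta",  # meta variants carry the 'Meta' suffix
        "--model_path", model_path,
        "--dataset_params", dataset_params,
        "--info", info,
        "--gpu", str(gpu),
    ]


finetune_command = build_finetune_command(
    model="DeepConvDTI",
    model_path="path/to/models",  # placeholder folder, replace with your own
    task="VS_All",
    step=10000,
    shot=50,
    info="finetune",              # placeholder label
)
print(" ".join(finetune_command))
```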

3. Leaderboard

3.1 Leaderboard of the VS-All task under the ZS scenario

| Method | EF@1% | SR@1% (%) | EF@5% | SR@5% (%) |
| --- | --- | --- | --- | --- |
| DeepConvDTI | 9.48 ± 1.22 | 39.40 ± 2.73 | 3.22 ± 0.24 | 81.60 ± 2.87 |
| DeepDTA | 8.76 ± 1.56 | 36.00 ± 3.52 | 3.37 ± 0.43 | 83.40 ± 2.87 |
| DeepCPI | 7.73 ± 0.34 | 31.80 ± 1.94 | 2.95 ± 0.22 | 78.60 ± 2.65 |
| MONN | 7.08 ± 0.64 | 33.00 ± 2.68 | 2.70 ± 0.47 | 76.00 ± 4.15 |
| Tsubaki | 6.09 ± 1.30 | 30.60 ± 2.80 | 2.53 ± 0.14 | 79.20 ± 2.86 |
| TransformerCPI | 5.60 ± 0.65 | 28.20 ± 3.06 | 2.46 ± 0.29 | 78.00 ± 2.53 |
| MolTrans | 5.61 ± 0.90 | 29.60 ± 2.80 | 2.20 ± 0.13 | 74.00 ± 1.79 |
| GraphDTA | 4.70 ± 0.88 | 24.40 ± 1.96 | 1.88 ± 0.21 | 70.80 ± 4.07 |

3.2 Leaderboard of the LO-All task under the ZS scenario

| Method | SCC | PCC | SR@0.5 (%) |
| --- | --- | --- | --- |
| DeepConvDTI | 0.30 ± 0.01 | 0.31 ± 0.01 | 26.60 ± 2.15 |
| DeepDTA | 0.28 ± 0.01 | 0.30 ± 0.01 | 22.40 ± 1.36 |
| DeepCPI | 0.24 ± 0.01 | 0.25 ± 0.01 | 16.00 ± 0.63 |
| MONN | 0.25 ± 0.01 | 0.27 ± 0.01 | 15.40 ± 2.24 |
| Tsubaki | 0.19 ± 0.02 | 0.19 ± 0.01 | 9.40 ± 1.62 |
| TransformerCPI | 0.19 ± 0.01 | 0.19 ± 0.02 | 8.00 ± 2.90 |
| MolTrans | 0.20 ± 0.01 | 0.20 ± 0.02 | 12.20 ± 1.47 |
| GraphDTA | 0.22 ± 0.01 | 0.24 ± 0.01 | 15.20 ± 2.04 |

3.3 Leaderboard of the VS-All task under the FS scenario

| Strategy | Method | EF@1% | SR@1% (%) |
| --- | --- | --- | --- |
| Pre-training | DeepCPI | 7.96 ± 0.82 | 29.80 ± 1.47 |
| Pre-training | DeepDTA | 9.17 ± 1.53 | 34.20 ± 4.26 |
| Pre-training | DeepConvDTI | 9.14 ± 1.28 | 35.40 ± 2.87 |
| QSAR | RF | 7.83 ± 0.49 | 28.40 ± 0.80 |
| QSAR | GBT | 7.90 ± 0.52 | 29.60 ± 1.02 |
| QSAR | SVM | 6.29 ± 0.00 | 28.00 ± 0.00 |
| QSAR | DNN | 8.82 ± 0.39 | 27.60 ± 0.80 |
| Pre-training and fine-tuning | DeepCPI | 9.92 ± 0.86 | 32.80 ± 1.17 |
| Pre-training and fine-tuning | DeepDTA | 13.02 ± 3.26 | 41.20 ± 5.19 |
| Pre-training and fine-tuning | DeepConvDTI | 11.17 ± 1.94 | 36.80 ± 2.64 |
| Meta-learning | DeepCPI-c | 11.85 ± 2.20 | 36.60 ± 2.73 |
| Meta-learning | MTDNN | 12.76 ± 0.37 | 41.60 ± 1.20 |
| Meta-learning | DeepConvDTI-c | 15.67 ± 1.66 | 44.60 ± 4.13 |
| Multi-task learning | MTDNN | 4.20 ± 1.58 | 25.20 ± 4.75 |
| Multi-task learning | DeepConvDTI-c | 15.95 ± 1.77 | 45.20 ± 1.72 |
| Re-training | DeepCPI | 13.29 ± 1.34 | 41.20 ± 2.93 |
| Re-training | DeepDTA | 11.81 ± 2.08 | 39.00 ± 3.41 |
| Re-training | DeepConvDTI | 13.28 ± 0.59 | 42.80 ± 1.72 |

3.4 Leaderboard of the LO-All task under the FS scenario

| Strategy | Method | PCC | SR@0.5 (%) |
| --- | --- | --- | --- |
| Pre-training | DeepCPI | 0.26 ± 0.01 | 16.60 ± 1.02 |
| Pre-training | DeepDTA | 0.30 ± 0.01 | 24.00 ± 1.67 |
| Pre-training | DeepConvDTI | 0.32 ± 0.01 | 28.60 ± 1.85 |
| QSAR | RF | 0.55 ± 0.00 | 58.20 ± 1.72 |
| QSAR | GBT | 0.54 ± 0.00 | 57.80 ± 1.17 |
| QSAR | SVM | 0.57 ± 0.00 | 65.00 ± 0.00 |
| QSAR | DNN | 0.54 ± 0.00 | 61.20 ± 1.72 |
| Pre-training and fine-tuning | DeepCPI | 0.33 ± 0.01 | 18.80 ± 0.40 |
| Pre-training and fine-tuning | DeepDTA | 0.49 ± 0.01 | 51.00 ± 1.79 |
| Pre-training and fine-tuning | DeepConvDTI | 0.39 ± 0.01 | 32.40 ± 1.36 |
| Meta-learning | DeepCPI-c | 0.42 ± 0.02 | 37.40 ± 3.44 |
| Meta-learning | MTDNN | 0.46 ± 0.04 | 42.26 ± 6.92 |
| Meta-learning | DeepConvDTI-c | 0.55 ± 0.01 | 62.80 ± 4.07 |
| Multi-task learning | MTDNN | 0.48 ± 0.00 | 48.20 ± 1.60 |
| Multi-task learning | DeepConvDTI-c | 0.58 ± 0.00 | 65.40 ± 2.06 |
| Re-training | DeepCPI | 0.44 ± 0.00 | 43.89 ± 2.91 |
| Re-training | DeepDTA | 0.46 ± 0.01 | 45.80 ± 1.72 |
| Re-training | DeepConvDTI | 0.50 ± 0.01 | 54.40 ± 1.96 |
