Benchmarking Compound Activity Prediction for Real-World Drug Discovery Applications
CARA is a benchmark for compound activity prediction and evaluation in real-world drug discovery applications.
- Real-world data: CARA contains large-scale, high-quality, real-world compound activity data measured by wet-lab experiments and collected from the ChEMBL database.
- Assay-level organization: CARA organizes compound activities into assays (slightly different from ChEMBL assays), i.e., one assay, one target, one measurement type, many compounds.
- Representative targets: CARA selects representative protein targets for testing, reducing the influence of the long-tailed distribution of protein exposure.
- Distinguished tasks: CARA considers compound activities from different stages of drug discovery, i.e., virtual screening (VS) and lead optimization (LO), separately.
- Diverse scenarios: CARA provides both zero-shot (ZS) and few-shot (FS) learning scenarios.
- Regression objective: CARA adopts a regression task without defining a threshold for positive and negative samples.
- Assay-level evaluation: CARA evaluates compound activity prediction models at the assay level, preventing bulk-evaluation bias.
- Specific metrics: CARA evaluates the VS and LO tasks with different metrics according to their distinct goals in practice.
- Success rates: CARA defines success rates based on assay-level evaluations to provide a direct understanding of performance.
- Informative leaderboard: CARA provides a performance comparison of selected state-of-the-art compound activity prediction methods.
CARA defines six tasks with two task types and three target types. The two task types are virtual screening (VS) and lead optimization (LO). The VS task focuses on screening hit compounds for a specific target from a compound library with diverse scaffolds. The LO task aims to optimize a compound into derivatives with better activities.
The three target types are All, Kinase, and G-protein coupled receptor (GPCR).
As a result, the six tasks are VS-All, VS-Kinase, VS-GPCR, LO-All, LO-Kinase, and LO-GPCR.
We suggest using VS-All and LO-All for performance evaluation and comparison (see our manuscript for more details).
The train-test split is conducted at the assay level, i.e., each assay is assigned entirely to either the training set or the test set, and we make sure there is no data leakage between them.
For the VS task, we use a new-protein splitting scheme, such that the protein targets in the test assays were not seen during training.
For the LO task, we use a new-assay splitting scheme, such that the congeneric compounds in the test assays were not seen during training.
For the FS scenario, the samples in each test assay are further split into support samples and query samples: the support samples can be used for training or fine-tuning, and the query samples are used for evaluation (a minimal sketch of this split follows).
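As an illustration only (the released CARA splits are fixed in the dataset files, so this is not how the official support/query sets are produced), a random support/query split of one test assay might look like:

```python
import numpy as np

def split_support_query(samples, k_shot=50, seed=0):
    """Randomly hold out k_shot support samples from one test assay;
    the remaining samples form the query set used for evaluation.
    `samples` is a hypothetical list of (compound, activity) pairs."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    support = [samples[i] for i in idx[:k_shot]]
    query = [samples[i] for i in idx[k_shot:]]
    return support, query
```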
For the VS task, we care more about the accuracy of the top-ranked compounds, so we mainly use enrichment factors (a minimal computation sketch follows the list).
- EF@1%: Enrichment factor at the top 1%. The hit compounds are defined as those with the top 1% highest activities.
- EF@5%: Enrichment factor at the top 5%. The hit compounds are defined as those with the top 5% highest activities.
- SR@1%: Success rate at the top 1%. The hit compounds are defined as those with the top 1% highest activities. Success: at least one hit compound is ranked in the top 1% of the list by predicted scores.
- SR@5%: Success rate at the top 5%. The hit compounds are defined as those with the top 5% highest activities. Success: at least one hit compound is ranked in the top 5% of the list by predicted scores.
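A minimal sketch of these VS metrics (not the official evaluation code; it assumes higher y_true means higher measured activity and higher y_pred means a better predicted score, and tie handling may differ):

```python
import numpy as np

def _top_indices(values, top):
    # indices of the top `top` fraction (at least one) by descending value
    k = max(1, int(np.ceil(len(values) * top)))
    return set(np.argsort(values)[::-1][:k]), k

def enrichment_factor(y_true, y_pred, top=0.01):
    hits, k = _top_indices(np.asarray(y_true), top)      # hits: top k% by activity
    selected, _ = _top_indices(np.asarray(y_pred), top)  # predicted top k%
    expected = len(hits) * k / len(y_true)               # hits expected at random
    return len(hits & selected) / expected

def is_success(y_true, y_pred, top=0.01):
    hits, _ = _top_indices(np.asarray(y_true), top)
    selected, _ = _top_indices(np.asarray(y_pred), top)
    return len(hits & selected) >= 1                     # at least one hit recovered
```

SR@1% and SR@5% are then the percentages of test assays for which is_success returns True.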
For the LO task, we need the overall rankings of the compounds, so we use correlation coefficients (a minimal computation sketch follows the list).
- PCC: Pearson's correlation coefficient.
- SCC: Spearman's correlation coefficient.
- SR@0.5: Success rate over assays. Success: PCC > 0.5.
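A minimal sketch of the per-assay LO metrics using scipy (again, not the official evaluation code):

```python
from scipy.stats import pearsonr, spearmanr

def lo_metrics(y_true, y_pred):
    """Per-assay LO metrics; SR@0.5 is then the percentage of test
    assays whose PCC exceeds 0.5."""
    pcc, _ = pearsonr(y_true, y_pred)
    scc, _ = spearmanr(y_true, y_pred)
    return {"PCC": pcc, "SCC": scc, "success": pcc > 0.5}
```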
Dataset statistics of the six tasks:
Task | VS-All | VS-Kinase | VS-GPCR | LO-All | LO-Kinase | LO-GPCR |
---|---|---|---|---|---|---|
#Assays | 12,029 | 2,733 | 2,256 | 81,187 | 11,276 | 22,917 |
#Proteins | 2,242 | 434 | 268 | 4,456 | 487 | 579 |
#Compounds | 317,855 | 25,943 | 41,352 | 625,099 | 111,279 | 161,263 |
#Samples | 1,237,256 | 84,605 | 70,179 | 1,187,136 | 200,800 | 321,904 |
#Training assays | 9,408 | 1,459 | 1,584 | 81,033 | 11,220 | 22,872 |
#Test assays | 100 | 58 | 18 | 100 | 54 | 43 |
We provide the code for model training and performance evaluation based on CARA. Install the following packages to run the code; installation may take from a few minutes to several hours. Training may take from several hours to a few days, depending on the task, dataset size, and method.
python=3.7.11
pandas=1.3.5
numpy=1.21.5
scipy=1.7.3
json=2.0.9
scikit-learn=1.0.2
pytorch=1.12.1
torch-geometric=2.2.0
rdkit=2020.09.1.0
gensim=3.8.3
networkx=2.6.3
subword_nmt=0.3.8
codecs
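For reference, one possible way to set up the environment with conda and pip (an assumption on our side, not the project's official setup script; json and codecs ship with Python itself, and pytorch, torch-geometric, and rdkit typically require their own channels or wheels):

```bash
conda create -n cara python=3.7.11
conda activate cara
pip install pandas==1.3.5 numpy==1.21.5 scipy==1.7.3 scikit-learn==1.0.2 \
    gensim==3.8.3 networkx==2.6.3 subword-nmt==0.3.8
# install pytorch 1.12.1, torch-geometric 2.2.0, and rdkit 2020.09 per their docs
```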
python -u runTrain.py --model [MODEL] --dataset_params task:[TASK],subset:[SUBSET] --info [INFO] --gpu [GPU]
- MODEL: model name, e.g., DeepConvDTI
- TASK: task name, e.g., VS_All
- SUBSET: subset of the data, e.g., train, support
- INFO: a tag for the training run, e.g., fastTrain; a tag beginning with 'fast' prints less output to speed up training
- GPU: GPU number, e.g., 0
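For example, to train DeepConvDTI on the VS-All training set with the example values above:
python -u runTrain.py --model DeepConvDTI --dataset_params task:VS_All,subset:train --info fastTrain --gpu 0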
python -u runTrain.py --model [MODEL]Meta --dataset_params task:[TASK],subset:[SUBSET],n_way:[N_WAY],k_shot:[K_SHOT] --info [INFO] --gpu [GPU]
- N_WAY: number of assays per batch, e.g., 1
- K_SHOT: number of support samples per assay, e.g., 50
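For example (the subset value here is illustrative):
python -u runTrain.py --model DeepConvDTIMeta --dataset_params task:VS_All,subset:train,n_way:1,k_shot:50 --info fastTrain --gpu 0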
python -u runTest.py --model [MODEL] --model_path [MODEL_PATH] --dataset_params task:[TASK],subset:[SUBSET] --info [INFO] --gpu [GPU]
- MODEL: model name, e.g., DeepConvDTI
- MODEL_PATH: folder containing the trained models
- TASK: task name, e.g., VS_All
- SUBSET: subset of the data, e.g., test, query
- INFO: a tag for the test run, e.g., test
- GPU: GPU number, e.g., 0
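For example, to evaluate a trained DeepConvDTI model on the VS-All test assays:
python -u runTest.py --model DeepConvDTI --model_path [MODEL_PATH] --dataset_params task:VS_All,subset:test --info test --gpu 0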
python -u runTest.py --model [MODEL]Meta --model_path [MODEL_PATH] --dataset_params task:[TASK],subset:finetune,step:[STEP],shot:[SHOT] --info [INFO] --gpu [GPU]
- STEP: number of fine-tuning steps, e.g., 10000
- SHOT: number of support samples per assay used for fine-tuning, e.g., 50
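For example, to fine-tune a meta-trained model for 10,000 steps with 50 support samples per assay and then evaluate it:
python -u runTest.py --model DeepConvDTIMeta --model_path [MODEL_PATH] --dataset_params task:VS_All,subset:finetune,step:10000,shot:50 --info test --gpu 0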
Leaderboard on the VS-All task (zero-shot scenario):
Method | EF@1% | SR@1% (%) | EF@5% | SR@5% (%) |
---|---|---|---|---|
DeepConvDTI | 9.48 ± 1.22 | 39.40 ± 2.73 | 3.22 ± 0.24 | 81.60 ± 2.87 |
DeepDTA | 8.76 ± 1.56 | 36.00 ± 3.52 | 3.37 ± 0.43 | 83.40 ± 2.87 |
DeepCPI | 7.73 ± 0.34 | 31.80 ± 1.94 | 2.95 ± 0.22 | 78.60 ± 2.65 |
MONN | 7.08 ± 0.64 | 33.00 ± 2.68 | 2.70 ± 0.47 | 76.00 ± 4.15 |
Tsubaki | 6.09 ± 1.30 | 30.60 ± 2.80 | 2.53 ± 0.14 | 79.20 ± 2.86 |
TransformerCPI | 5.60 ± 0.65 | 28.20 ± 3.06 | 2.46 ± 0.29 | 78.00 ± 2.53 |
MolTrans | 5.61 ± 0.90 | 29.60 ± 2.80 | 2.20 ± 0.13 | 74.00 ± 1.79 |
GraphDTA | 4.70 ± 0.88 | 24.40 ± 1.96 | 1.88 ± 0.21 | 70.80 ± 4.07 |
Leaderboard on the LO-All task (zero-shot scenario):
Method | SCC | PCC | SR@0.5 (%) |
---|---|---|---|
DeepConvDTI | 0.30 ± 0.01 | 0.31 ± 0.01 | 26.60 ± 2.15 |
DeepDTA | 0.28 ± 0.01 | 0.30 ± 0.01 | 22.40 ± 1.36 |
DeepCPI | 0.24 ± 0.01 | 0.25 ± 0.01 | 16.00 ± 0.63 |
MONN | 0.25 ± 0.01 | 0.27 ± 0.01 | 15.40 ± 2.24 |
Tsubaki | 0.19 ± 0.02 | 0.19 ± 0.01 | 9.40 ± 1.62 |
TransformerCPI | 0.19 ± 0.01 | 0.19 ± 0.02 | 8.00 ± 2.90 |
MolTrans | 0.20 ± 0.01 | 0.20 ± 0.02 | 12.20 ± 1.47 |
GraphDTA | 0.22 ± 0.01 | 0.24 ± 0.01 | 15.20 ± 2.04 |
Leaderboard on the VS-All task (few-shot scenario), comparing learning strategies:
Strategy | Method | EF@1% | SR@1% (%) |
---|---|---|---|
Pre-training | DeepCPI | 7.96 ± 0.82 | 29.80 ± 1.47 |
Pre-training | DeepDTA | 9.17 ± 1.53 | 34.20 ± 4.26 |
Pre-training | DeepConvDTI | 9.14 ± 1.28 | 35.40 ± 2.87 |
QSAR | RF | 7.83 ± 0.49 | 28.40 ± 0.80 |
QSAR | GBT | 7.90 ± 0.52 | 29.60 ± 1.02 |
QSAR | SVM | 6.29 ± 0.00 | 28.00 ± 0.00 |
QSAR | DNN | 8.82 ± 0.39 | 27.60 ± 0.80 |
Pre-training and fine-tuning | DeepCPI | 9.92 ± 0.86 | 32.80 ± 1.17 |
Pre-training and fine-tuning | DeepDTA | 13.02 ± 3.26 | 41.20 ± 5.19 |
Pre-training and fine-tuning | DeepConvDTI | 11.17 ± 1.94 | 36.80 ± 2.64 |
Meta-learning | DeepCPI-c | 11.85 ± 2.20 | 36.60 ± 2.73 |
Meta-learning | MTDNN | 12.76 ± 0.37 | 41.60 ± 1.20 |
Meta-learning | DeepConvDTI-c | 15.67 ± 1.66 | 44.60 ± 4.13 |
Multi-task learning | MTDNN | 4.20 ± 1.58 | 25.20 ± 4.75 |
Multi-task learning | DeepConvDTI-c | 15.95 ± 1.77 | 45.20 ± 1.72 |
Re-training | DeepCPI | 13.29 ± 1.34 | 41.20 ± 2.93 |
Re-training | DeepDTA | 11.81 ± 2.08 | 39.00 ± 3.41 |
Re-training | DeepConvDTI | 13.28 ± 0.59 | 42.80 ± 1.72 |
Leaderboard on the LO-All task (few-shot scenario), comparing learning strategies:
Strategy | Method | PCC | SR@0.5 (%) |
---|---|---|---|
Pre-training | DeepCPI | 0.26 ± 0.01 | 16.60 ± 1.02 |
Pre-training | DeepDTA | 0.30 ± 0.01 | 24.00 ± 1.67 |
Pre-training | DeepConvDTI | 0.32 ± 0.01 | 28.60 ± 1.85 |
QSAR | RF | 0.55 ± 0.00 | 58.20 ± 1.72 |
QSAR | GBT | 0.54 ± 0.00 | 57.80 ± 1.17 |
QSAR | SVM | 0.57 ± 0.00 | 65.00 ± 0.00 |
QSAR | DNN | 0.54 ± 0.00 | 61.20 ± 1.72 |
Pre-training and fine-tuning | DeepCPI | 0.33 ± 0.01 | 18.80 ± 0.40 |
Pre-training and fine-tuning | DeepDTA | 0.49 ± 0.01 | 51.00 ± 1.79 |
Pre-training and fine-tuning | DeepConvDTI | 0.39 ± 0.01 | 32.40 ± 1.36 |
Meta-learning | DeepCPI-c | 0.42 ± 0.02 | 37.40 ± 3.44 |
Meta-learning | MTDNN | 0.46 ± 0.04 | 42.26 ± 6.92 |
Meta-learning | DeepConvDTI-c | 0.55 ± 0.01 | 62.80 ± 4.07 |
Multi-task learning | MTDNN | 0.48 ± 0.00 | 48.20 ± 1.60 |
Multi-task learning | DeepConvDTI-c | 0.58 ± 0.00 | 65.40 ± 2.06 |
Re-training | DeepCPI | 0.44 ± 0.00 | 43.89 ± 2.91 |
Re-training | DeepDTA | 0.46 ± 0.01 | 45.80 ± 1.72 |
Re-training | DeepConvDTI | 0.50 ± 0.01 | 54.40 ± 1.96 |