# Plotting the precision-recall (PR) curve

Please cite our *Nature Protocols* paper, which features this Jupyter notebook: 

Tran-Nguyen, V. K., Junaid, M., Simeon, S. & Ballester, P. J. A practical guide to machine-learning scoring for structure-based virtual screening. *Nat. Protoc.* **18**, 3460–3511 (2023).

This Jupyter notebook plots the precision-recall (PR) curve from a hit list produced by a virtual screening method or a scoring function.

### Step 1: Importing the required Python dependencies

In [None]:
from sklearn import metrics
from sklearn.metrics import precision_recall_curve
import pandas as pd 
import matplotlib.pyplot as plt

### Step 2: Reading the CSV hit list

If the CSV hit list was produced by a generic scoring function (Smina, IFP, CNN-Score, RF-Score-VS), read the `Score` column:

In [None]:
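# Replace the placeholder string with the path to your CSV hit list
df = pd.read_csv('Provide_the_pathway_to_your_csv_hit_list', sep = ',')
real_class = df['Real_Class']  # true class label of each molecule
score = df['Score']            # score assigned by the generic scoring function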

If the CSV hit list was produced by a target-specific machine-learning scoring function (RF, XGB, SVM, ANN, DNN), read the `Active_Prob` column:

In [None]:
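# Replace the placeholder string with the path to your CSV hit list
df = pd.read_csv('Provide_the_pathway_to_your_csv_hit_list', sep = ',')
real_class = df['Real_Class']  # true class label of each molecule
score = df['Active_Prob']      # predicted probability of being active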
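
Before plotting, it can help to confirm that the hit list contains the columns read above. The following is a minimal sketch rather than part of the protocol: it builds a toy hit list in memory instead of reading a file, and the `Ligand` column, the molecule names, the score values, and the `Inactive` label are assumptions made for illustration. Only `Real_Class`, `Score`, `Active_Prob`, and the `Active` label come from this notebook.

```python
import io

import pandas as pd

# Hypothetical toy hit list (illustrative values only); in practice the data
# comes from your own CSV file. 'Ligand' and 'Inactive' are assumed names.
toy_csv = io.StringIO(
    "Ligand,Real_Class,Score\n"
    "mol_1,Active,9.1\n"
    "mol_2,Inactive,7.4\n"
    "mol_3,Active,6.8\n"
    "mol_4,Inactive,5.2\n"
)
df = pd.read_csv(toy_csv, sep=',')

# Pick whichever score column is present in the hit list
score_column = 'Active_Prob' if 'Active_Prob' in df.columns else 'Score'

# Fail early with a clear message if a required column is missing
missing = {'Real_Class', score_column} - set(df.columns)
if missing:
    raise KeyError(f"Missing column(s) in hit list: {sorted(missing)}")

real_class = df['Real_Class']
score = df[score_column]

# Count the molecules labelled with the positive class used in Step 3
n_actives = int((real_class == 'Active').sum())
print(f"{len(df)} molecules, {n_actives} labelled 'Active', scores from '{score_column}'")
```

A hit list with no `Active` labels would make the PR curve meaningless, so a count of zero here is worth investigating before moving on.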

### Step 3: Plotting the PR curve 

In [None]:
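# Compute precision and recall at every score threshold, with "Active" as the positive class.
# precision_recall_curve treats higher scores as stronger evidence for the positive class.
precision, recall, thresholds = precision_recall_curve(real_class, score, pos_label = "Active")
# Plot precision (y-axis) against recall (x-axis)
fig, ax = plt.subplots(dpi=300, figsize=(7,5))
ax.plot(recall, precision, color='purple')
ax.set_ylabel('Precision')
ax.set_xlabel('Recall')
plt.show()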
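
The curve can also be summarized as a single number. The sketch below is an optional addition, not a step of the protocol: it computes the area under the PR curve with the trapezoidal rule via `metrics.auc`, and compares it with the fraction of actives in the hit list, which is the precision expected from a random ranking. The labels and scores are hypothetical toy values, and the names `pr_auc` and `baseline` are introduced here for illustration.

```python
import numpy as np
from sklearn import metrics
from sklearn.metrics import precision_recall_curve

# Hypothetical toy data; in the notebook, real_class and score come from Step 2
real_class = np.array(["Active", "Inactive", "Active", "Inactive", "Inactive", "Inactive"])
score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])

precision, recall, thresholds = precision_recall_curve(real_class, score, pos_label="Active")

# Area under the PR curve, integrated over recall with the trapezoidal rule
pr_auc = metrics.auc(recall, precision)

# Fraction of actives: the precision a random ranking would be expected to reach
baseline = float(np.mean(real_class == "Active"))

print(f"PR AUC: {pr_auc:.3f} (random baseline: {baseline:.3f})")
```

Because the baseline depends on the proportion of actives, PR AUC values are best compared between methods evaluated on the same hit list.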