# Lab 1: Ligand-based Virtual Screening 

In this first lab, we explore virtual screening. Using datasets of 2D molecules, we will develop predictive models of inhibitory activity against the human kinase EGFR (Epidermal Growth Factor Receptor). Building on concepts from the lectures on molecular representation, scoring, and Graph Neural Networks (GNNs) for chemistry, we will use `PyTorch`, `PyG`, `Scikit-learn`, and other libraries to build both GNN models and classical Random Forest models on molecular fingerprints. The ultimate objective is to screen a small commercial library and select 100 promising molecules for further experimental investigation.

<img src="figures/fphar-09-01275-g001.jpg" width="500"/>

<p style="color:silver">Figure from https://doi.org/10.3389/fphar.2018.01275</p>



## Content
- [Introduction](#intro)
- [Data acquisition and preparation](#data)
- [Machine learning - QSAR modeling](#ml)
- [Virtual screening on commercial library](#vs)
- [Compound prioritization](#prio)
- [Discussion](#disc)


To run this tutorial, please ensure all dependencies below are installed. 

- `datamol`
- `molfeat`
- `splito`
- `scikit-learn`
- `pytorch`
- `pyG`
<!-- - Optional: `chembl_webresource_client` -->

You can install these dependencies with:
```shell
conda env create -f env.yml
```
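Once the environment is created, a quick check along the following lines can confirm that every dependency is importable. This is a minimal sketch, not part of the lab material: the helper `missing_packages` is a hypothetical name introduced here, and the mapping assumes the usual import names (`sklearn` for `scikit-learn`, `torch` for `pytorch`, `torch_geometric` for `pyG`).

```python
from importlib.util import find_spec

# Package names from the list above mapped to their assumed Python import names.
REQUIRED = {
    "datamol": "datamol",
    "molfeat": "molfeat",
    "splito": "splito",
    "scikit-learn": "sklearn",
    "pytorch": "torch",
    "pyG": "torch_geometric",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be located."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies found.")
```

If any package is reported as missing, re-creating or activating the conda environment from `env.yml` is the first thing to try.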

## Target of interest: EGFR (Epidermal Growth Factor Receptor)

The protein encoded by the EGFR gene is a transmembrane glycoprotein and a member of the protein **kinase** superfamily. It is a cell-surface receptor for members of the epidermal growth factor family: ligand binding induces receptor dimerization and tyrosine autophosphorylation, leading to cell proliferation.

EGFR is a transmembrane protein that is frequently over-expressed and aberrantly activated in non-small cell lung cancer (NSCLC) patients. Activating EGFR mutations in NSCLC were described for the first time in 2004, and mutations in this gene are associated with lung cancer in particular.

### Types of EGFR inhibition
- competitive 
- covalent
- allosteric

**Example of a first-generation EGFR inhibitor**\
<img src="figures/EGFR_ATP.png" width="500"/>


### Targeted compound library

In this tutorial, we will focus on identifying potential inhibitors that target the ATP-binding pocket of kinases. We will perform virtual screening against 24 000 compounds from the [Hinge Binders Library](https://enamine.net/compound-libraries/targeted-libraries/kinase-library/hinge-binders-library), a commercial library specifically designed for discovering novel kinase ATP pocket binders. By using this targeted library, we aim to identify promising candidates efficiently and focus our experimental resources on the most likely leads.

<img src="figures/KINASE_HINGE_RDL_1.png" width="500"/>
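The final selection step of a screen like this amounts to ranking the library by predicted activity and keeping the top of the list. The sketch below illustrates that idea in plain Python. The function `select_top_n`, the compound IDs, and the scores are all hypothetical placeholders; in the lab the scores would come from the trained models.

```python
def select_top_n(scores, n=100):
    """Return the n compound IDs with the highest predicted score, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [compound_id for compound_id, _ in ranked[:n]]


# Hypothetical predicted activity scores for four library compounds.
predicted = {"cmpd_A": 0.91, "cmpd_B": 0.12, "cmpd_C": 0.67, "cmpd_D": 0.88}

print(select_top_n(predicted, n=2))  # ['cmpd_A', 'cmpd_D']
```

Ranking purely by score is only one possible prioritization strategy; chemical diversity or other filters could be layered on top of it.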


# Questions?