# Filter fragments for drug likeness

## Aim of this notebook

This notebook filters the fragment library for drug likeness in two steps:
* The first filter step checks whether the fragments fulfill the Rule of Three (Ro3) ([Drug Discovery Today, 2003, 8(19):876-877](https://www.sciencedirect.com/science/article/abs/pii/S1359644603028319?via%3Dihub)). 
* The second filter calculates the Quantitative Estimate of Druglikeness (QED) ([Nat Chem. 2012 Jan 24; 4(2): 90–98](https://www.nature.com/articles/nchem.1243)), reflecting the molecular properties of the fragments.

## Table of contents
1. Load fragment library
2. Apply pre-filters
3. Filter for Rule of Three (Ro3)
4. Filter for Quantitative Estimate of Druglikeness (QED)
5. Analyze accepted/rejected fragments
    - 5.1. Count number of fragments that are accepted by the filter(s)
    - 5.2. Histogram of QED values

## Imports and preprocessing

In [1]:
from pathlib import Path

import pandas as pd
from rdkit.Chem import PandasTools

from kinfraglib import utils
from kinfraglib import filters

ImportError: cannot import name 'building_blocks' from partially initialized module 'kinfraglib.filters' (most likely due to a circular import) (/home/nona/masterthesis/KinFragLib/kinfraglib/filters/__init__.py)

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# Needed to display ROMol images in DataFrames
PandasTools.RenderImagesInAllDataFrames(images=True)

### Define global paths

In [None]:
# Path to data
HERE = Path().resolve()
PATH_DATA = HERE / "../../data"

## 1. Load fragment library

The fragment library is stored as a dictionary, with the individual subpockets as keys.

In [None]:
fragment_library_original = utils.read_fragment_library(PATH_DATA / "fragment_library")

In [None]:
fragment_library_original.keys()

In [None]:
pd.concat(fragment_library_original).reset_index(drop=True).shape

## 2. Apply pre-filters
Pre-filters include 
- removing fragments in pool X
- removing duplicates
- removing fragments without dummy atoms (unfragmented ligands)
- removing fragments only connecting to pool X
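
The four pre-filter steps listed above can be illustrated with a small, library-independent sketch. This is not the implementation of `filters.prefilters.pre_filters`: the record layout (`smiles`, `connections`), the use of `*` to mark dummy atoms, and the function name `pre_filter_sketch` are assumptions made only for this example.

```python
def pre_filter_sketch(library, excluded_pool="X"):
    """Toy version of the pre-filters on a dict of fragment records.

    Assumed (hypothetical) layout: {subpocket: [{"smiles": str, "connections": [str]}]}.
    """
    filtered = {}
    for subpocket, fragments in library.items():
        # 1. Remove all fragments in pool X
        if subpocket == excluded_pool:
            continue
        seen = set()
        kept = []
        for fragment in fragments:
            smiles = fragment["smiles"]
            connections = fragment["connections"]
            # 2. Remove duplicates (here: identical SMILES within a subpocket)
            if smiles in seen:
                continue
            # 3. Remove fragments without dummy atoms (unfragmented ligands)
            if "*" not in smiles:
                continue
            # 4. Remove fragments that only connect to pool X
            if connections and all(c == excluded_pool for c in connections):
                continue
            seen.add(smiles)
            kept.append(fragment)
        filtered[subpocket] = kept
    return filtered


# Made-up toy data, not taken from the fragment library
toy_library = {
    "AP": [
        {"smiles": "*c1ccncc1", "connections": ["FP"]},
        {"smiles": "*c1ccncc1", "connections": ["FP"]},  # duplicate
        {"smiles": "c1ccccc1", "connections": []},  # no dummy atom
        {"smiles": "*c1ccccc1", "connections": ["X"]},  # only connects to X
    ],
    "X": [{"smiles": "*C", "connections": ["AP"]}],
}
toy_filtered = pre_filter_sketch(toy_library)
print({subpocket: len(frags) for subpocket, frags in toy_filtered.items()})
```

Only one of the four toy fragments in `AP` survives, and pool `X` is dropped entirely.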

In [None]:
fragment_library = filters.prefilters.pre_filters(fragment_library_original)

In [None]:
fragment_library["AP"].head()

Count the number of fragments in the original fragment library and after pre-filtering.

In [None]:
num_fragments = pd.concat(
    [
        filters.analysis.count_fragments(fragment_library_original, "original"),
        filters.analysis.count_fragments(fragment_library, "pre_filtered"),
    ],
    axis=1,
)
# DataFrame.append was removed in pandas 2.0; add the totals row with pd.concat
pd.concat([num_fragments, num_fragments.sum().rename("Total").to_frame().T])

## 3. Filter for Rule of Three (Ro3)

The Rule of Three (Ro3) ([Drug Discovery Today, 2003, 8(19):876-877](https://www.sciencedirect.com/science/article/abs/pii/S1359644603028319?via%3Dihub)) is adapted from the Rule of Five (Ro5) ([J Pharmacol Toxicol Methods, 2000, 44(1): 235-249](https://www.sciencedirect.com/science/article/abs/pii/S1056871900001076?via%3Dihub)) and checks whether small molecules make good lead compounds.
It considers the following molecular properties:
- molecular weight (MW) <= 300 Da
- calculated octanol-water partition coefficient (logP) <= 3
- number of hydrogen bond acceptors (HBA) <= 3
- number of hydrogen bond donors (HBD) <= 3
- number of rotatable bonds (NROT) <= 3
- polar surface area (PSA) <= 60 Å²
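
As a rough illustration of how such threshold rules can be evaluated, the sketch below checks precomputed property values against upper bounds. It is not the code behind `filters.drug_likeness.get_ro3_frags`: the dictionary keys, the function names, the `min_fulfilled` parameter, and the example values are hypothetical, and the logP <= 3 criterion is taken from the published Ro3.

```python
# Upper bounds (inclusive) per property; the key names are hypothetical labels
RO3_THRESHOLDS = {
    "mw": 300.0,  # molecular weight
    "logp": 3.0,  # logP, part of the published Ro3
    "hba": 3,  # hydrogen bond acceptors
    "hbd": 3,  # hydrogen bond donors
    "nrot": 3,  # rotatable bonds
    "psa": 60.0,  # polar surface area
}


def count_ro3_fulfilled(properties, thresholds=RO3_THRESHOLDS):
    """Return how many criteria the given property values satisfy."""
    return sum(properties[name] <= limit for name, limit in thresholds.items())


def fulfills_ro3(properties, min_fulfilled=None, thresholds=RO3_THRESHOLDS):
    """Return True if at least `min_fulfilled` criteria hold (default: all)."""
    if min_fulfilled is None:
        min_fulfilled = len(thresholds)
    return count_ro3_fulfilled(properties, thresholds) >= min_fulfilled


# Made-up property values for two imaginary fragments
small = {"mw": 121.1, "logp": 0.9, "hba": 2, "hbd": 1, "nrot": 1, "psa": 38.9}
large = {"mw": 342.4, "logp": 3.6, "hba": 5, "hbd": 2, "nrot": 6, "psa": 85.0}

print(fulfills_ro3(small))
print(fulfills_ro3(large))
print(count_ro3_fulfilled(large))
```

Counting fulfilled criteria separately from the final decision makes it easy to relax the rule, for example by accepting fragments that miss a single criterion.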

In [None]:
filters.drug_likeness.get_ro3_frags?

In [None]:
fragment_library = filters.drug_likeness.get_ro3_frags(fragment_library)

Inspect an individual subpocket, including the new column `bool_ro3`, which indicates whether a fragment fulfills the Ro3.

In [None]:
fragment_library["AP"].head()

Count the number of pre-filtered fragments and the number of fragments that are accepted and rejected by the Rule of Three filter.

In [None]:
num_fragments_ro3 = pd.concat(
    [
        filters.analysis.count_fragments(fragment_library, "pre_filtered"),
        filters.analysis.count_accepted_rejected(
            fragment_library, "bool_ro3", "ro3"
        ),
    ],
    axis=1,
)
# DataFrame.append was removed in pandas 2.0; add the totals row with pd.concat
pd.concat([num_fragments_ro3, num_fragments_ro3.sum().rename("Total").to_frame().T])

## 4. Filter for Quantitative Estimate of Druglikeness (QED)

The Quantitative Estimate of Druglikeness (QED) ([Nat Chem. 2012 Jan 24; 4(2): 90–98](https://www.nature.com/articles/nchem.1243)) reflects the underlying distribution of the following molecular properties:

* molecular weight
* octanol-water partition coefficient
* number of hydrogen bond donors and acceptors
* polar surface area
* number of rotatable bonds
* number of aromatic rings
* number of structural alerts

Each property is mapped to a value between 0 and 1 by a desirability function, and the QED is calculated as the (optionally weighted) geometric mean of these desirability values.
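
The step from per-property desirabilities to a single score can be sketched as a weighted geometric mean, which is how the QED publication combines them. The sketch below covers only that final combination: the fitted desirability functions and the published weights are not reproduced, and the function name and example values are placeholders.

```python
import math


def weighted_geometric_mean(desirabilities, weights=None):
    """Combine desirability values in (0, 1] into a single score.

    Computes exp(sum(w_i * ln(d_i)) / sum(w_i)); equal weights by default.
    """
    if weights is None:
        weights = [1.0] * len(desirabilities)
    if len(desirabilities) != len(weights):
        raise ValueError("desirabilities and weights must have the same length")
    if any(d <= 0 for d in desirabilities):
        raise ValueError("desirabilities must be strictly positive")
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / sum(weights))


# Placeholder desirabilities for eight properties (made-up numbers)
example = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.75, 1.0]
score = weighted_geometric_mean(example)
print(round(score, 3))

# A fragment is accepted if its score exceeds the chosen cutoff
cutoff = 0.5  # illustrative value only
print(score > cutoff)
```

Because the mean is geometric, a single very low desirability pulls the overall score down more strongly than it would in an arithmetic mean.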

In [None]:
filters.drug_likeness.get_qed?

In [None]:
fragment_library = filters.drug_likeness.get_qed(fragment_library, cutoff_val=0.42)

Inspect an individual subpocket, including the new columns `qed` (the calculated QED value) and `bool_qed` (whether the fragment passes the QED threshold).

In [None]:
fragment_library["AP"].head()

Count the number of pre-filtered fragments and the number of fragments that are accepted and rejected by the QED filter.

In [None]:
num_fragments_qed = pd.concat(
    [
        filters.analysis.count_fragments(fragment_library, "pre_filtered"),
        filters.analysis.count_accepted_rejected(
            fragment_library, "bool_qed", "qed"
        ),
    ],
    axis=1,
)
# DataFrame.append was removed in pandas 2.0; add the totals row with pd.concat
pd.concat([num_fragments_qed, num_fragments_qed.sum().rename("Total").to_frame().T])

## 5. Analyze accepted/rejected fragments

- 5.1. Count number of fragments that are accepted by the filter(s)
- 5.2. Histogram of QED values

### 5.1. Count number of fragments that are accepted by the filter(s)

In [None]:
fragment_library = filters.analysis.number_of_accepted(
    fragment_library, columns=["bool_ro3", "bool_qed"], min_accepted=2
)

In [None]:
filters.analysis.accepted_num_filters(
    fragment_library,
    ["bool_qed", "bool_ro3"],
    filtername="drug likeness filters",
    max_num_accepted=2,
)
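
The idea behind combining several boolean filter columns can be sketched without the library: count, per fragment, how many filters accepted it, and keep the fragment if that count reaches a minimum. This is an illustration on plain dictionaries, not the implementation of `filters.analysis.number_of_accepted`; the function names and the output keys `n_accepted` and `keep` are hypothetical.

```python
from collections import Counter


def flag_fragments(fragments, columns, min_accepted):
    """Add the number of passed filters and a keep/discard flag to each record."""
    flagged = []
    for fragment in fragments:
        n_accepted = sum(1 for column in columns if fragment[column])
        flagged.append(
            {**fragment, "n_accepted": n_accepted, "keep": n_accepted >= min_accepted}
        )
    return flagged


def tally_accepted(flagged):
    """Count how many fragments passed 0, 1, 2, ... filters."""
    return Counter(fragment["n_accepted"] for fragment in flagged)


# Made-up filter results for three imaginary fragments
toy_fragments = [
    {"id": "a", "bool_ro3": True, "bool_qed": True},
    {"id": "b", "bool_ro3": True, "bool_qed": False},
    {"id": "c", "bool_ro3": False, "bool_qed": False},
]
flagged = flag_fragments(toy_fragments, ["bool_ro3", "bool_qed"], min_accepted=2)
print([fragment["id"] for fragment in flagged if fragment["keep"]])
print(dict(tally_accepted(flagged)))
```

With `min_accepted` equal to the number of columns, a fragment is kept only if it passes every filter; lowering it turns the combination into an "at least k of n" rule.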

### 5.2. Histogram of QED values
Create a histogram for each subpocket showing the QED values and the chosen threshold.

In [None]:
filters.plots.make_hists(
    fragment_library, "qed", "QED", plot_stats=False, cutoff=0.42
)