# Combinatorial library vs. KLIFS

## Aim of this notebook

In this notebook, we compare the combinatorial library to the original KLIFS ligands, i.e. the ligands from which the fragment library originates.

## Table of contents

1. Combinatorial library data
2. Exact matches
3. Substructure matches

## 1. Combinatorial library data

The combinatorial library comes as a large JSON file. To access and filter ligands efficiently (in terms of time and memory), we use the `ijson` library:

> Ijson is an iterative JSON parser with standard Python iterator interfaces.

https://pypi.org/project/ijson/

For a detailed description of the information linked to the combinatorial library, please take a look at the notebook `notebooks/4_1_combinatorial_library_properties.ipynb`.
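As a minimal sketch of this streaming access pattern: `ijson.items(f, 'item')` yields the elements of a top-level JSON array one by one, so records can be filtered without loading the whole file. The helper name `iter_records_with` is hypothetical, and the `json` fallback is only there so that the sketch also runs where `ijson` is not installed (it is not memory-efficient).

```python
import json

try:
    import ijson  # third-party streaming JSON parser
except ImportError:  # assumption: fallback only so the sketch runs without ijson
    ijson = None


def iter_records_with(path_json, key):
    """Yield records whose list-valued field `key` is non-empty (hypothetical helper)."""
    with open(path_json, "rb") as f:
        if ijson is not None:
            # 'item' addresses each element of the top-level JSON array
            records = ijson.items(f, "item")
        else:
            # not streaming: reads the complete file into memory
            records = json.load(f)
        for record in records:
            if record.get(key):
                yield record
```

Because the function is a generator, the file stays open only while records are being consumed, e.g. `list(iter_records_with(PATH_COMBINATORIAL_LIBRARY, 'original_exact'))`.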

## 2. Exact matches

Get the recombined ligands in the combinatorial library that have exact matches among the original KLIFS ligands.

In [12]:
import ijson
import pandas as pd


def original_ligands_exact_matches(path_json):
    """Return recombined ligands with at least one exact match in the original ligands."""

    # stream objects from the JSON file; the DataFrame is built while the file is open
    with open(path_json, 'rb') as f:
        objects = ijson.items(f, 'item')

        # keep only objects with a non-empty list of exact matches
        objects_filtered = (o for o in objects if o['original_exact'] != [])
        objects_filtered_df = pd.DataFrame(objects_filtered)

    # sanity check: exact matches must also show up as substructure matches
    if not objects_filtered_df.empty and (objects_filtered_df.original_substructure.apply(len) == 0).any():
        raise ValueError('One or more ligands have an exact match but no substructure match.')

    print(f'Number of exact matches in original ligands: {objects_filtered_df.shape[0]}')

    return objects_filtered_df

In [13]:
original_ligands_exact_matches(PATH_COMBINATORIAL_LIBRARY)

Number of exact matches in original ligands: 4


| | bond_ids | fragment_ids | hba | hbd | mwt | logp | n_atoms | chembl_exact | chembl_most_similar | original_exact | original_substructure | inchi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [[FP_7, GA_6], [SE_14, FP_8], [SE_13, AP_7]] | [FP_1, SE_2, GA_1, AP_2] | 1 | 1 | 1 | 1 | 33 | [] | [CHEMBL500021, 0.34] | [1, 8] | [1, 8] | InChI=1S/C23H28N8OS/c1-15-13-20(29-28-15)25-19... |
| 1 | [[SE_13, AP_16], [FP_6, AP_17]] | [AP_4, SE_4, FP_2] | 1 | 1 | 1 | 1 | 35 | [] | [CHEMBL265282, 0.32] | [6] | [6] | InChI=1S/C26H28N6O3/c1-30-12-14-31(15-13-30)20... |
| 2 | [[FP_7, GA_6], [SE_14, FP_8], [SE_13, AP_7]] | [FP_1, SE_2, GA_1, AP_2] | 1 | 1 | 1 | 1 | 33 | [] | [CHEMBL500021, 0.34] | [1, 8] | [1, 8] | InChI=1S/C23H28N8OS/c1-15-13-20(29-28-15)25-19... |
| 3 | [[SE_13, AP_16], [FP_6, AP_17]] | [AP_4, SE_4, FP_2] | 1 | 1 | 1 | 1 | 35 | [] | [CHEMBL265282, 0.32] | [6] | [6] | InChI=1S/C26H28N6O3/c1-30-12-14-31(15-13-30)20... |
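The sanity check in the function above only tests that the substructure list is non-empty. A stricter variant can be sketched per record, assuming that both lists hold identifiers of original ligands, so that every exact match should reappear among the substructure matches. The helper name `exact_matches_without_substructure` is hypothetical, and the records are plain dictionaries that mirror the two columns of the output above.

```python
def exact_matches_without_substructure(records):
    """Return records whose exact-match IDs are not all among the substructure-match IDs."""
    return [
        r for r in records
        if not set(r["original_exact"]) <= set(r["original_substructure"])
    ]


# column values taken from the output above
records = [
    {"original_exact": [1, 8], "original_substructure": [1, 8]},
    {"original_exact": [6], "original_substructure": [6]},
]

# an empty result means the data is consistent under the stated assumption
violations = exact_matches_without_substructure(records)
print(f"Number of inconsistent records: {len(violations)}")
```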


## 3. Substructure matches

Get the recombined ligands in the combinatorial library that have substructure matches among the original KLIFS ligands.

In [14]:
def original_ligands_substructure_matches(path_json):
    """Return recombined ligands with at least one substructure match in the original ligands."""

    # stream objects from the JSON file; the DataFrame is built while the file is open
    with open(path_json, 'rb') as f:
        objects = ijson.items(f, 'item')

        # keep only objects with a non-empty list of substructure matches
        objects_filtered = (o for o in objects if o['original_substructure'] != [])
        objects_filtered_df = pd.DataFrame(objects_filtered)

    print(f'Number of ligands with substructures in original ligands: {objects_filtered_df.shape[0]}')

    return objects_filtered_df

In [15]:
original_ligands_substructure_matches(PATH_COMBINATORIAL_LIBRARY)

Number of ligands with substructures in original ligands: 6


| | bond_ids | fragment_ids | hba | hbd | mwt | logp | n_atoms | chembl_exact | chembl_most_similar | original_exact | original_substructure | inchi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [[SE_16, AP_9]] | [SE_3, AP_3] | 1 | 1 | 1 | 1 | 24 | [] | [CHEMBL405398, 0.33] | [] | [9] | InChI=1S/C15H20N6O2S/c1-11-17-13(19-15-16-9-12... |
| 1 | [[FP_7, GA_6], [SE_14, FP_8], [SE_13, AP_7]] | [FP_1, SE_2, GA_1, AP_2] | 1 | 1 | 1 | 1 | 33 | [] | [CHEMBL500021, 0.34] | [1, 8] | [1, 8] | InChI=1S/C23H28N8OS/c1-15-13-20(29-28-15)25-19... |
| 2 | [[SE_13, AP_16], [FP_6, AP_17]] | [AP_4, SE_4, FP_2] | 1 | 1 | 1 | 1 | 35 | [] | [CHEMBL265282, 0.32] | [6] | [6] | InChI=1S/C26H28N6O3/c1-30-12-14-31(15-13-30)20... |
| 3 | [[SE_16, AP_9]] | [SE_3, AP_3] | 1 | 1 | 1 | 1 | 24 | [] | [CHEMBL405398, 0.33] | [] | [9] | InChI=1S/C15H20N6O2S/c1-11-17-13(19-15-16-9-12... |
| 4 | [[FP_7, GA_6], [SE_14, FP_8], [SE_13, AP_7]] | [FP_1, SE_2, GA_1, AP_2] | 1 | 1 | 1 | 1 | 33 | [] | [CHEMBL500021, 0.34] | [1, 8] | [1, 8] | InChI=1S/C23H28N8OS/c1-15-13-20(29-28-15)25-19... |
| 5 | [[SE_13, AP_16], [FP_6, AP_17]] | [AP_4, SE_4, FP_2] | 1 | 1 | 1 | 1 | 35 | [] | [CHEMBL265282, 0.32] | [6] | [6] | InChI=1S/C26H28N6O3/c1-30-12-14-31(15-13-30)20... |
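Comparing both result sets shows which recombined ligands are substructures of an original ligand without being identical to one. The following is a minimal sketch in plain Python, using the `fragment_ids`, `original_exact` and `original_substructure` values of the six rows shown above. The helper name `split_by_match_type` is hypothetical and not part of the notebook.

```python
def split_by_match_type(records):
    """Split records into exact matches and substructure-only matches (hypothetical helper)."""
    exact, substructure_only = [], []
    for r in records:
        if r["original_exact"]:
            exact.append(r)
        elif r["original_substructure"]:
            substructure_only.append(r)
    return exact, substructure_only


# column values taken from the six rows of the output above
rows = [
    {"fragment_ids": ["SE_3", "AP_3"], "original_exact": [], "original_substructure": [9]},
    {"fragment_ids": ["FP_1", "SE_2", "GA_1", "AP_2"], "original_exact": [1, 8], "original_substructure": [1, 8]},
    {"fragment_ids": ["AP_4", "SE_4", "FP_2"], "original_exact": [6], "original_substructure": [6]},
    {"fragment_ids": ["SE_3", "AP_3"], "original_exact": [], "original_substructure": [9]},
    {"fragment_ids": ["FP_1", "SE_2", "GA_1", "AP_2"], "original_exact": [1, 8], "original_substructure": [1, 8]},
    {"fragment_ids": ["AP_4", "SE_4", "FP_2"], "original_exact": [6], "original_substructure": [6]},
]

exact, substructure_only = split_by_match_type(rows)
print(f"Exact matches: {len(exact)}")                      # 4
print(f"Substructure-only matches: {len(substructure_only)}")  # 2
```

In this output, the substructure-only ligands are the two rows built from the fragments `SE_3` and `AP_3`.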
