# Finding potential fragment merges with the Fragment Network

This notebook illustrates how to use the Fragment Network to find purchasable molecules that
combine parts of two fragments.

For information about the Fragment Network see:
* [doi:10.1021/acs.jmedchem.7b00809](https://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00809)
* https://fragnet.informaticsmatters.com/

The assumption is that you have two fragment screening hits that partly overlap in space.
You want to identify purchasable molecules that might offer the *best of both worlds* by forming
the interactions made by both fragments.

We use the term "synthon" to mean a fragment of a molecule in the context of the Fragment Network.
We use this term rather than fragment to avoid confusion with the use of that name in fragment screening. 

The approach is to:
* Identify the synthons of fragment B that you want to graft onto fragment A 
* Start from fragment A and optionally remove one synthon (mols1)
* For those molecules optionally add one synthon of any type (mols2)
* For each synthon from fragment B add it to each of mols2 (mols3)

Where we say "remove a synthon" we mean find molecules in the Fragment Network that are direct children of the source molecule.
Where we say "add a synthon" we mean find molecules in the Fragment Network that are direct parents of the source molecule.

The result is a set of molecules derived from fragment A, possibly missing one synthon, having one synthon from fragment B added and possibly having one extra synthon of any type added.
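
The procedure above can be sketched on a toy in-memory graph. Everything in this sketch is illustrative: the node names, the `CHILDREN` mapping and the synthon labels are invented, and the function names are hypothetical. In the notebook itself the traversal is done by Cypher queries against the Neo4j database.

```python
# Toy graph: each parent maps to a list of (child, synthon removed on that edge).
# Node names and synthon labels are invented for illustration only.
CHILDREN = {
    'A':     [('A-x', 'x')],    # fragment A can lose synthon x
    'A+y':   [('A', 'y')],      # A with an extra synthon y
    'A-x+y': [('A-x', 'y')],
    'A+b':   [('A', 'b')],      # A with synthon b (taken from fragment B)
    'A-x+b': [('A-x', 'b')],
    'A+y+b': [('A+y', 'b')],
    'A+c':   [('A', 'c')],
}

def direct_children(node):
    """Remove a synthon: follow edges from parent to child."""
    return {child for child, _ in CHILDREN.get(node, [])}

def direct_parents(node, synthon=None):
    """Add a synthon: follow edges from child to parent, optionally of one type."""
    return {parent
            for parent, edges in CHILDREN.items()
            for child, s in edges
            if child == node and (synthon is None or s == synthon)}

def merge_candidates(fragment_a, synthon_b):
    # mols1: fragment A itself, or A with one synthon removed
    mols1 = {fragment_a} | direct_children(fragment_a)
    # mols2: mols1, or any of them with one synthon of any type added
    mols2 = set(mols1)
    for m in mols1:
        mols2 |= direct_parents(m)
    # mols3: each of mols2 with the fragment B synthon added
    mols3 = set()
    for m in mols2:
        mols3 |= direct_parents(m, synthon_b)
    return mols3

print(sorted(merge_candidates('A', 'b')))
```

In this toy graph the candidates for synthon `b` are `A+b`, `A+y+b` and `A-x+b`: fragment A with `b` added directly, with an extra synthon as well, or with one synthon removed first.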

Only a small subset of these will be able to adopt favourable conformations that are compatible with forming the relevant interactions made by fragments A and B. This has to be checked using subsequent 3D techniques.

## Usage

To use this you must have a Fragment Network Neo4j database running on your machine.

Typically you do this with one of our [test containers](https://github.com/InformaticsMatters/docker-fragnet-test)
or by using `kubectl port-forward ...` to access a real database running in K8S.

In [6]:
from neo4j import GraphDatabase

In [7]:
# prompt for the database password
import getpass
try:
    password
except NameError:
    password = getpass.getpass()

In [8]:
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", password))

In [9]:
# simple example of finding a molecule using its SMILES
def find_molecule_node(tx, smiles):
    """Return the first F2 node with this SMILES, or None if there is no match."""
    for record in tx.run('MATCH (m:F2 {smiles: $smiles}) RETURN m', smiles=smiles):
        return record['m']
    return None
    
with driver.session() as session:
    mol = session.read_transaction(find_molecule_node, 'CN1CCN(C(=O)Cc2c[nH]c3ncccc23)CC1')
    print(mol)

<Node id=176921298 labels=frozenset({'CanSmi', 'Mol', 'F2', 'V_REAL'}) properties={'osmiles': 'CC1CCC(C(O)CC2CCC3CCCCC23)CC1', 'chac': 15, 'hac': 19, 'smiles': 'CN1CCN(C(=O)Cc2c[nH]c3ncccc23)CC1', 'cmpd_ids': ['REAL:Z1400780201']}>


## Parameters

In [10]:
# until we have proper test data we use the same molecule for both fragments
fragment_a = 'CN1CCN(C(=O)Cc2c[nH]c3ncccc23)CC1'
fragment_b = 'CN1CCN(C(=O)Cc2c[nH]c3ncccc23)CC1'

## Child synthons
Methods to find all child synthons of a molecule. The edges have a 'label' property that looks like this:

`RING|[Xe]c1nncs1|[103Xe]C1CCCC1|RING|O=C(Cc1c[nH]c2ncccc12)N1CCN(C[Xe])CC1|OC(CC1CCC2CCCCC12)C1CCC(C[103Xe])CC1`

This has six tokens.
The second token describes what was added/removed.
The fifth token describes what it was added/removed to/from.
Both use a Xe atom to mark the attachment site.
Hence those two tokens can be considered potential *synthons*.
They have to be filtered to remove molecules with 2 components and those with 2 attachment sites, as those are
not able to make simple adducts.
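
As a minimal sketch of the filtering described above, the snippet below splits the example label and keeps only the tokens that qualify as synthons. The helper name `candidate_synthons` is hypothetical and used here for illustration only; the rule it applies (a single component and exactly one `[Xe]`) is the one stated in the text.

```python
# Example edge label copied from the text above.
LABEL = ('RING|[Xe]c1nncs1|[103Xe]C1CCCC1|RING|'
         'O=C(Cc1c[nH]c2ncccc12)N1CCN(C[Xe])CC1|'
         'OC(CC1CCC2CCCCC12)C1CCC(C[103Xe])CC1')

def candidate_synthons(label):
    """Hypothetical helper: return the tokens of one edge label usable as synthons."""
    tokens = label.split('|')
    # The second and fifth tokens (indices 1 and 4) carry the Xe-marked SMILES.
    candidates = (tokens[1], tokens[4])
    # Keep single-component SMILES (no '.') with exactly one attachment site.
    return [s for s in candidates if '.' not in s and s.count('[Xe]') == 1]

print(len(LABEL.split('|')))      # number of tokens in the label
print(candidate_synthons(LABEL))  # both tokens qualify for this label
```

A token such as `[Xe]C.[Xe]N` (two components) or `C([Xe])[Xe]` (two attachment sites) would be rejected by the same test.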

In [11]:
def add_required_synthons(labels, synthon):
    """Only add synthons with a single attachment point and a single component"""
    if '.' not in synthon and synthon.count('[Xe]') == 1:
        labels.add(synthon)

def find_synthons(tx, smiles):
    """Query for all child fragments (recursive).
    Extract the label property of each edge and collect a set of SMILES that match our needs.
    """
    labels = set()
    for record in tx.run('MATCH (fa:F2 {smiles: $smiles})-[e:FRAG*]->(f:F2) RETURN e', smiles=smiles):
        edges = record['e']
        for edge in edges:
            s = edge['label']
            tokens = s.split('|')
            add_required_synthons(labels, tokens[1])
            add_required_synthons(labels, tokens[4])
    return labels

In [12]:
# Generate synthons of fragment B
with driver.session() as session:
    synthons = session.read_transaction(find_synthons, fragment_b)
    print('Found', len(synthons), 'synthons')
    for s in synthons:
        print(s)

Found 11 synthons
[Xe]c1c[nH]c2ncccc12
CN1CCN([Xe])CC1
O=C(C[Xe])N1CCNCC1
O=C([Xe])Cc1c[nH]c2ncccc12
[Xe]N1CCNCC1
O=C(Cc1c[nH]c2ncccc12)N1CCN([Xe])CC1
CC(=O)N1CCN([Xe])CC1
CC(=O)[Xe]
CN1CCN(C(=O)C[Xe])CC1
O=CC[Xe]
C[Xe]


## Expand fragment A

In [13]:
def find_expansions(tx, smiles, synthon):
    """Expand the molecules with this SMILES using this synthon"""
    expansions = set()
    for record in tx.run("MATCH (fa:F2 {smiles: $smiles})-[e1:FRAG*0..1]->(i1:F2)"
                         "<-[e2:FRAG*0..1]-(i2)<-[e3:FRAG]-(c:Mol) "
                         "WHERE split(e3.label, '|')[1] = $synthon RETURN c", 
                         smiles=smiles, synthon=synthon):
        node = record['c']
        expansions.add(node['smiles'])
    return expansions
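
The Cypher pattern in `find_expansions` maps onto the steps listed in the introduction. The sketch below restates the same query piece by piece and exercises the function with a stand-in transaction object so it can be tried without a database. `EXPANSION_QUERY` and `FakeTx` are hypothetical names introduced here, and the SMILES it returns are placeholders, not real query results.

```python
# The query from find_expansions, split so each step of the approach is visible.
EXPANSION_QUERY = (
    "MATCH (fa:F2 {smiles: $smiles})"    # start at fragment A
    "-[e1:FRAG*0..1]->(i1:F2)"           # optionally remove one synthon (mols1)
    "<-[e2:FRAG*0..1]-(i2)"              # optionally add one synthon of any type (mols2)
    "<-[e3:FRAG]-(c:Mol) "               # add exactly one more synthon (mols3)
    "WHERE split(e3.label, '|')[1] = $synthon "  # which must be the fragment B synthon
    "RETURN c"
)

def find_expansions(tx, smiles, synthon):
    """Same logic as the notebook function, using the query constant above."""
    return {record['c']['smiles']
            for record in tx.run(EXPANSION_QUERY, smiles=smiles, synthon=synthon)}

class FakeTx:
    """Hypothetical stand-in for a Neo4j transaction; returns canned records."""
    def __init__(self, records):
        self.records = records
        self.calls = []

    def run(self, query, **params):
        # Record the call so the parameters can be inspected afterwards.
        self.calls.append((query, params))
        return iter(self.records)

# Placeholder records, including a duplicate to show that the set de-duplicates.
tx = FakeTx([{'c': {'smiles': 'CCO'}},
             {'c': {'smiles': 'CCN'}},
             {'c': {'smiles': 'CCO'}}])
result = find_expansions(tx, 'CC', 'O[Xe]')
print(sorted(result))
```

Because both `*0..1` steps are optional, the query also covers the simplest case where the fragment B synthon is added directly to fragment A.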

In [14]:
# do the expansions

with driver.session() as session:
    count = 0
    expanded_synthons = 0
    # for each synthon
    for synthon in synthons:
        # do the expansion of fragment_a with that synthon
        expansions = session.read_transaction(find_expansions, fragment_a, synthon)
        print('Found', len(expansions), 'expansions for', synthon)
        count += len(expansions)
        if expansions:
            expanded_synthons += 1
        for e in expansions:
            print(e)
    print(count, 'total expansions from', expanded_synthons, 'out of', len(synthons), 'synthons')

Found 23 expansions for [Xe]c1c[nH]c2ncccc12
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2ncccn2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CCOCC2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CCC2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2ccccn2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2csnn2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2ccccc2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2cscn2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2nncs2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2nccs2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2nc(=O)[nH][nH]2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CCCC2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CC2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2cnns2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2ccns2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2cccnc2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CCCCO2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CCCOC2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2ccsc2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CCOC2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2ccon2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2cncs2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(Cc2ccco2)CC1
O=C(Cc1c[nH]c2ncccc12)N1CCN(CC2CCCCC2)C

In [15]:
driver.close()