# FFF Workshop

## A7: Scaffolds

### Outline

- Load an elaboration series
- Murcko Scaffolds
- Searching by similarity

## Load HIPPO

In [None]:
import hippo
animal = hippo.HIPPO(
    "A71EV2A_demo",
    "../data/A71EV2A.sqlite",
)

## Load an elaboration series

Load an example elaboration series around "ASAP-0031281-001":

In [None]:
animal.load_sdf(
    target="A71EV2A",
    path="../data/ASAP-0031281-001_elabs.sdf",
)

Loading the SDF has inserted several compounds clustered around the experimental hit `A2846a`:

In [None]:
# the crystallographic hit pose
hit_pose = animal.poses["A2846a"]
hit_pose.draw()

# the associated 2d compound
scaffold = hit_pose.compound
scaffold.alias = "ASAP-0031281-001"
scaffold.draw()

The [Compound.elabs](https://hippo-docs.winokan.com/en/latest/compounds.html#hippo.compound.Compound.elabs) property returns a compound's elaborations, but only once the scaffold/superstructure relationships have been defined. Unless the compounds were added with [HIPPO.add_syndirella_routes](https://hippo-docs.winokan.com/en/latest/animal.html#hippo.animal.HIPPO.add_syndirella_routes) or [HIPPO.add_syndirella_elabs](https://hippo-docs.winokan.com/en/latest/animal.html#hippo.animal.HIPPO.add_syndirella_elabs), these relationships are not yet in the database:

In [None]:
elabs = scaffold.elabs
print(elabs)

Before inserting the relationships, note that superstructures can also be found without writing any entries to the database, using [query_substructure](https://hippo-docs.winokan.com/en/latest/db.html#hippo.db.Database.query_substructure):

In [None]:
elabs = animal.db.query_substructure(scaffold.smiles)
print(elabs)

N.B. `query_substructure` can also be used with SMARTS patterns by setting `smarts=True`.

To compute all exact sub/superstructure relationships and add them to the database use [calculate_all_scaffolds](https://hippo-docs.winokan.com/en/latest/db.html#hippo.db.Database.calculate_all_scaffolds):

In [None]:
animal.db.calculate_all_scaffolds()

Now `scaffold.elabs` should give us the same result without needing to query the molecule binaries, which can be slow in a large database:

In [None]:
elabs = scaffold.elabs
print(elabs)
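Why precomputing the relationships pays off can be sketched with a toy model. In the hypothetical sketch below each "molecule" is a set of feature labels and a subset test stands in for a real graph substructure match; `is_substructure` and `calculate_all_relationships` are illustrative names, not HIPPO functions. The expensive check runs once over every pair, after which finding a scaffold's elaborations is a plain dictionary read, which is the same trade-off as storing the relationships in the database.

```python
from collections import defaultdict


def is_substructure(small: frozenset, big: frozenset) -> bool:
    # Hypothetical stand-in for an expensive graph substructure match:
    # "big" is a strict superstructure if it has every feature of "small" plus more.
    return small < big


def calculate_all_relationships(compounds: dict) -> dict:
    # One-off pass over every ordered pair (O(n^2) checks), analogous to
    # writing all scaffold -> elaboration relationships to the database.
    elabs = defaultdict(set)
    for scaffold_id, scaffold in compounds.items():
        for other_id, other in compounds.items():
            if scaffold_id != other_id and is_substructure(scaffold, other):
                elabs[scaffold_id].add(other_id)
    return dict(elabs)


# Hypothetical toy compounds, each described by a set of feature labels
compounds = {
    "C1": frozenset({"ring_a", "amide"}),
    "C2": frozenset({"ring_a", "amide", "methyl"}),
    "C3": frozenset({"ring_a", "amide", "chloro"}),
    "C4": frozenset({"ring_b"}),
}

table = calculate_all_relationships(compounds)

# Later lookups are cheap dictionary reads instead of fresh searches
print(sorted(table.get("C1", set())))  # ['C2', 'C3']
print(sorted(table.get("C4", set())))  # []
```

The cost moves to a single up-front pass, which is worthwhile when the same relationships are looked up many times.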

Now you can inspect the elaborations; the 2D drawings should highlight the changed atoms:

In [None]:
elabs.interactive()

Notice that several other compounds could also be considered scaffolds; our original is `C385`.

## Murcko Scaffolds

In the FFF sense, a scaffold is any compound around which we have generated Syndirella elaborations; in chemistry and cheminformatics more broadly, the term has a more general meaning.

Another scaffold representation is the Murcko framework proposed by Bemis and Murcko. This method dissects a molecule more systematically into four parts: ring systems, linkers, side chains, and the Murcko framework, which is the union of the ring systems and linkers.
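The pruning idea behind the Bemis-Murcko framework can be sketched in plain Python: repeatedly strip terminal atoms (side chains) until only ring systems and the linkers between them remain. This is a simplified illustration on a hypothetical adjacency-list representation with implicit hydrogens; it ignores bond orders, so it may differ from RDKit in details such as the treatment of exocyclic double-bonded atoms.

```python
def murcko_framework(adjacency: dict) -> set:
    """Return the atom indices left after pruning all side chains."""
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    # Start from terminal atoms (degree <= 1); ring atoms never qualify
    terminal = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in graph:  # already removed via another path
            continue
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            # A neighbour left with at most one bond is now terminal too
            if len(graph[nbr]) <= 1:
                terminal.append(nbr)
    return set(graph)


def from_bonds(bonds):
    # Build an undirected adjacency list from (atom, atom) pairs
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def ring(start, size):
    # Bonds closing a ring of `size` atoms numbered from `start`
    return [(start + i, start + (i + 1) % size) for i in range(size)]


# Six-membered ring (atoms 0-5) carrying a methyl (atom 6)
toluene_like = from_bonds(ring(0, 6) + [(0, 6)])
# Two rings joined by a one-atom linker (atom 6), with an ethyl (13-14) on ring B
linked_rings = from_bonds(ring(0, 6) + [(0, 6), (6, 7)] + ring(7, 6) + [(10, 13), (13, 14)])
# Acyclic chain: no rings, so no framework survives
butane = from_bonds([(0, 1), (1, 2), (2, 3)])

print(sorted(murcko_framework(toluene_like)))  # [0, 1, 2, 3, 4, 5]
print(sorted(set(range(15)) - murcko_framework(linked_rings)))  # [13, 14]
print(murcko_framework(butane))  # set()
```

Note that the linker atom survives pruning because it sits between two ring systems, while the methyl and ethyl side chains are removed.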

RDKit provides convenient functions for calculating Murcko scaffolds, e.g. starting with `C385`:

In [None]:
from rdkit.Chem import MolFromSmiles, MolToSmiles
from rdkit.Chem.Scaffolds.MurckoScaffold import (
    MurckoScaffoldSmiles,
    MakeScaffoldGeneric,
)

compound = animal.C385
compound.draw(scaffolds=False)

murcko_smiles = MurckoScaffoldSmiles(compound.smiles)
murcko_mol = MolFromSmiles(murcko_smiles)
murcko_mol

In this case only a methyl group has been removed.

The scaffold can further be generalised:

In [None]:
generic_mol = MakeScaffoldGeneric(murcko_mol)
generic_mol
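The effect of generalising can be illustrated without RDKit. As a hedged sketch on a hypothetical atom/bond representation (the `Bond` class and `make_generic` are made up here, mirroring the idea behind `MakeScaffoldGeneric`), every element becomes carbon and every bond becomes single, so pyridine and benzene collapse to the same topology and share one generic key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    begin: int
    end: int
    order: float  # 1 single, 2 double, 1.5 aromatic (hypothetical encoding)


def make_generic(elements: list, bonds: list) -> tuple:
    # Keep only connectivity: all atoms -> carbon, all bonds -> single
    generic_elements = ["C"] * len(elements)
    generic_bonds = [Bond(b.begin, b.end, 1) for b in bonds]
    return generic_elements, generic_bonds


# Pyridine: aromatic six-membered ring with one nitrogen
pyridine_elements = ["N", "C", "C", "C", "C", "C"]
pyridine_bonds = [Bond(i, (i + 1) % 6, 1.5) for i in range(6)]

# Benzene: same ring topology, all carbon
benzene_elements = ["C"] * 6
benzene_bonds = [Bond(i, (i + 1) % 6, 1.5) for i in range(6)]

print((pyridine_elements, pyridine_bonds) == (benzene_elements, benzene_bonds))  # False
print(make_generic(pyridine_elements, pyridine_bonds) == make_generic(benzene_elements, benzene_bonds))  # True
```

The generic form therefore groups heteroatom and bond-order variants of the same ring/linker topology under a single scaffold.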

These can then be used for queries:

In [None]:
murcko_elabs = animal.db.query_substructure(MolToSmiles(murcko_mol))
murcko_elabs

This is our original 72 elaborations plus `C385` itself:

In [None]:
(murcko_elabs - elabs).ids
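Why the Murcko query returns a superset can be seen in a toy model: every compound that contains `C385` also contains the smaller Murcko pattern, so querying with the pattern can only add matches. The library, feature labels, and `toy_query` helper below are hypothetical, and the subset test stands in for a real substructure match.

```python
# Hypothetical toy library: compounds described by sets of feature labels
toy_library = {
    "C385": frozenset({"core", "methyl"}),
    "E1": frozenset({"core", "methyl", "amide"}),
    "E2": frozenset({"core", "methyl", "chloro"}),
    "X1": frozenset({"other_core"}),
}


def toy_query(pattern: frozenset) -> set:
    # Stand-in for a substructure query: a compound matches
    # if it contains every feature of the pattern.
    return {cid for cid, features in toy_library.items() if pattern <= features}


# Strict superstructures of C385 (the compound itself excluded)
toy_elabs = toy_query(toy_library["C385"]) - {"C385"}
# Query with the smaller pattern left after stripping the methyl
toy_murcko_elabs = toy_query(frozenset({"core"}))

print(toy_elabs <= toy_murcko_elabs)         # True
print(sorted(toy_murcko_elabs - toy_elabs))  # ['C385']
```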

If you are interested in Murcko scaffolds for your project the best thing to do is to calculate them for all the compounds in the database with [calculate_all_murcko_scaffolds](https://hippo-docs.winokan.com/en/latest/db.html#hippo.db.Database.calculate_all_murcko_scaffolds):

In [None]:
animal.db.calculate_all_murcko_scaffolds();

Now our generic Murcko scaffold should appear in the compound table:

In [None]:
generic = animal.compounds(smiles=MolToSmiles(generic_mol))
generic

Along with a bunch of elaborations:

In [None]:
generic.elabs

The scaffolds created in bulk are also tagged:

In [None]:
animal.compounds(tag="MurckoScaffold")

In [None]:
animal.compounds(tag="GenericMurckoScaffold")

You might want to explore how the fragments that were soaked and bound to A71EV2A cluster by ring system (e.g. by Murcko scaffold). In principle, all the tools you need have been introduced in the notebooks so far.
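A minimal sketch of the clustering step, assuming you already have a scaffold key (for example a Murcko scaffold SMILES) for each fragment: group the fragment names by key and sort the groups by size. The fragment names and SMILES are hypothetical placeholders, not A71EV2A data, and `cluster_by_scaffold` is not a HIPPO method.

```python
from collections import defaultdict


def cluster_by_scaffold(scaffold_of: dict) -> dict:
    # Group compound names that share the same scaffold key
    clusters = defaultdict(list)
    for name, key in scaffold_of.items():
        clusters[key].append(name)
    # Largest clusters first (sorted() is stable, so ties keep insertion order)
    return dict(sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True))


# Hypothetical fragment -> scaffold SMILES mapping
scaffold_of = {
    "frag_1": "c1ccccc1",
    "frag_2": "c1ccccc1",
    "frag_3": "c1ccncc1",
    "frag_4": "c1ccccc1",
}

clusters = cluster_by_scaffold(scaffold_of)
for key, members in clusters.items():
    print(key, members)
# c1ccccc1 ['frag_1', 'frag_2', 'frag_4']
# c1ccncc1 ['frag_3']
```

Swapping the key for a generic Murcko scaffold would merge the two clusters above, since the benzene and pyridine rings share the same topology.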

## Searching by similarity

When exact substructure matches are too strict, it can be useful to search by similarity instead. In cheminformatics, similarity is often measured as the Tanimoto similarity between binary fingerprints. A fingerprint encodes a molecule's structure as a fixed-length string of bits, but it is less descriptive than a SMILES string or InChIKey, so different molecules can end up with identical fingerprints (collisions).
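For two fingerprints A and B, treated as sets of "on" bits, the Tanimoto similarity is |A ∩ B| / |A ∪ B|, ranging from 0 (no shared bits) to 1 (identical bits). A minimal plain-Python sketch with made-up bit sets; real fingerprints, such as RDKit's pattern fingerprints, are much longer bit vectors:

```python
def tanimoto(a: set, b: set) -> float:
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Made-up fingerprints, given as the positions of their "on" bits
query_bits = {1, 4, 7, 9}
close_bits = {1, 4, 7, 9, 12}
distant_bits = {4, 15}

print(tanimoto(query_bits, close_bits))    # 0.8
print(tanimoto(query_bits, distant_bits))  # 0.2
```

Because a score of 1 only means the bits are identical, two different molecules that collide on the same fingerprint would also score 1.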

In HIPPO you can query by pattern fingerprint similarity using [query_similarity](https://hippo-docs.winokan.com/en/latest/db.html#hippo.db.Database.query_similarity):

In [None]:
similar, similarities = animal.db.query_similarity(
    MolToSmiles(scaffold.mol),
    threshold=0.93,
    return_similarity=True,
)
for compound, similarity in zip(similar, similarities):
    print(compound.name, f"{similarity:.2f}")
    compound.draw(scaffolds=False)
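Conceptually, a thresholded similarity search scores every compound against the query, keeps those at or above the threshold, and ranks them. The sketch below illustrates that idea with made-up fingerprints; `similarity_search` is a hypothetical helper, and this is an assumption about the general approach, not a description of how `query_similarity` is implemented.

```python
def tanimoto(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_search(query: set, fingerprints: dict, threshold: float) -> list:
    # Score every entry, keep those at or above the threshold, best first
    hits = []
    for name, fp in fingerprints.items():
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((name, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Made-up fingerprints (sets of "on" bit positions)
toy_fingerprints = {
    "A": set(range(1, 11)),
    "B": set(range(1, 12)),
    "C": set(range(1, 13)),
    "D": {1, 2, 20},
}
query_bits = set(range(1, 11))

hits = similarity_search(query_bits, toy_fingerprints, threshold=0.8)
for name, score in hits:
    print(name, f"{score:.2f}")
# A 1.00
# B 0.91
# C 0.83
```

With `threshold=0.93`, as used above, only `A` would pass; lowering the threshold returns more, less similar compounds.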