# FFF Workshop

## A5: Algorithmic fragment merging

### Outline

- Identifying merging opportunities
- Small scale Fragmenstein merge via API
- Small scale Knitwork
- Point to larger scale stuff in day 2
- Squonk jobs

## Identifying merging opportunities

For fragment progression, it is often desirable to restrict which observations are considered for merging. When working with Fragalysis data, the tags and Canonical Sites assigned by XCA can be useful starting points.

In [None]:
# set up the animal
import hippo
animal = hippo.HIPPO(
    "A71EV2A_demo",
    "../data/A71EV2A.sqlite",
)

For the example target A71EV2A it makes most sense to explore merges of fragments in the active site:

In [None]:
active_site_fragments = animal.poses(tag="[Other] Active site fragment")
active_site_fragments

You may also want to explore merges of other/larger ligands in the active site:

In [None]:
active_site_hits = animal.poses(tag="hits").get_by_subsite(id=1)
active_site_hits

One way to anticipate pairs of molecules that are suitable for merging is to look at their interactions. An ideal merge will combine molecules that share some but not all interactions with each other. Consider the following pair:

In [None]:
animal.poses["A0207a", "A0237a"].draw()

The overlapping ring might make it suitable for merging. Looking at the interactions:

In [None]:
animal.poses["A0207a"].interactions.summary()
animal.poses["A0237a"].interactions.summary()

There are 2 shared interactions, which could be exploited during merging.
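The "share some but not all" criterion can be illustrated with plain Python sets. This is only a sketch of the idea: the interaction records below are hypothetical placeholders, not values from the A71EV2A dataset, and `is_merge_candidate` is an illustrative helper rather than part of HIPPO.

```python
# Hypothetical interaction records as (interaction type, residue) tuples.
# The values are placeholders and are not taken from the A71EV2A data.
pose_a = {("hydrogen bond", "RES1"), ("pi-stacking", "RES2"), ("hydrophobic", "RES3")}
pose_b = {("hydrogen bond", "RES1"), ("pi-stacking", "RES2"), ("hydrogen bond", "RES4")}

shared = pose_a & pose_b   # interactions made by both poses
only_a = pose_a - pose_b   # interactions unique to the first pose
only_b = pose_b - pose_a   # interactions unique to the second pose


def is_merge_candidate(a, b):
    """True if the two sets share at least one interaction but are not identical."""
    return bool(a & b) and bool(a ^ b)


print(len(shared), "shared;", len(only_a), "unique to A;", len(only_b), "unique to B")
print("merge candidate:", is_merge_candidate(pose_a, pose_b))
```

A merge of such a pair could, in principle, keep the shared interactions while gaining the ones unique to each parent.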

In HIPPO a `PoseSet` method exists to count these opportunities using interactions in the dataset:

In [None]:
animal.poses["A0207a", "A0237a"].get_interaction_overlaps()

This tells us that `A0207a` and `A0237a` might be worth trying to merge.

For a larger set you can count all the interaction overlaps as well:

In [None]:
active_site_fragments.get_interaction_overlaps()

You can also get the pairs as a list of `PoseSet` objects:

In [None]:
pairs = active_site_fragments.get_interaction_overlaps(return_pairs=True)

for pair in pairs:
    print(pair.names)
    break
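To make the idea of enumerating and ranking pairs concrete, here is a minimal, library-free sketch. It is not how `get_interaction_overlaps` is implemented; the pose names, interaction labels, and the `rank_pairs` helper are all hypothetical.

```python
from itertools import combinations

# Hypothetical pose names mapped to hypothetical interaction labels
interactions = {
    "pose_1": {"hbond:RES1", "pi:RES2", "hydrophobic:RES3"},
    "pose_2": {"hbond:RES1", "pi:RES2", "hbond:RES4"},
    "pose_3": {"hbond:RES5"},
}


def rank_pairs(interactions):
    """Return (name_a, name_b, n_shared) for pairs that share some but not all interactions."""
    ranked = []
    for (name_a, set_a), (name_b, set_b) in combinations(sorted(interactions.items()), 2):
        shared = set_a & set_b
        # skip pairs with nothing in common, and pairs with identical interactions
        if shared and set_a != set_b:
            ranked.append((name_a, name_b, len(shared)))
    # most shared interactions first
    return sorted(ranked, key=lambda row: row[2], reverse=True)


ranked_pairs = rank_pairs(interactions)
for name_a, name_b, n_shared in ranked_pairs:
    print(name_a, name_b, n_shared)
```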

## Submitting Jobs to Squonk

Squonk is the job and data management service that Fragalysis uses to run jobs, including this notebook app.

Squonk provides a Data Manager (DM) user interface: https://data-manager-ui.xchem.diamond.ac.uk/data-manager-ui

The DM can be used to submit and monitor jobs, and to view their outputs via its interface.

In this notebook, though, we'll cover how to do the same programmatically. Jobs can be submitted using a Fragalysis token, but a Squonk token is needed to monitor them.

### Getting a Squonk access token

1. Go to the Squonk Data Manager API (Swagger) page: https://data-manager.xchem.diamond.ac.uk/data-manager-api/api/
2. Click `Authorize`
3. Enter `data-manager-api-production` as the `client-id`
4. Click Authorize
5. Log in via Diamond CAS
6. Scroll down to https://data-manager.xchem.diamond.ac.uk/data-manager-api/api/#/user/app.api_user.get_api_token
7. Click `Try it out` and then `Execute`
8. Copy the token value from the `Response body` and paste it in the cell below:

In [None]:
squonk_token = "eyJhbG ... 7-FbAQ"

Also do the same for your Fragalysis token:

In [None]:
token = "kjqi8v9c ... 6gud2x"
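If you would rather not paste tokens into a notebook that might be shared, one option is to read them from environment variables instead of using the two cells above. The sketch below assumes the variable names `SQUONK_TOKEN` and `FRAGALYSIS_TOKEN`, which are hypothetical and must be set before launching the notebook; `read_token` is an illustrative helper, not part of the `fragalysis` package.

```python
import os


def read_token(env_var, fallback=""):
    """Return the token stored in env_var, or fallback if it is unset or empty."""
    value = os.environ.get(env_var, "").strip()
    return value or fallback


# Hypothetical variable names; choose your own and set them in your shell.
squonk_token = read_token("SQUONK_TOKEN")
token = read_token("FRAGALYSIS_TOKEN")

for name, value in [("SQUONK_TOKEN", squonk_token), ("FRAGALYSIS_TOKEN", token)]:
    if not value:
        print(name, "is not set; paste the token into the notebook instead")
```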

## Fragmenstein Merging

Fragmenstein can be used to generate merges of multiple molecules, and minimise their conformations relative to the reference/inspiration molecules.

A Fragmenstein merging job is available in Fragalysis/Squonk and can be requested with the `fragalysis` Python API.

N.B. This does require all referenced observations and the protein conformation to already be present in Fragalysis.

See also: [Fragmenstein paper](https://doi.org/10.1186/s13321-025-00946-0)

In [None]:
from fragalysis.requests import fragmenstein_combine
from fragalysis.requests.squonk import monitor_jobs, get_file

In [None]:
fragmenstein_combine(
    observations=["A0207a", "A0237a"], 
    protein="A0207a",
    target_name="A71EV2A", 
    tas="lb32627-66", 
    stack="staging", 
    token=token,
)

The returned link can be used to monitor the job and retrieve the outputs via the Squonk Data Manager UI.

All jobs submitted to Squonk can also be viewed as a live table:

In [None]:
monitor_jobs(
    stack="staging",
    token=squonk_token,
)

To get the output file from the job use:

In [None]:
get_file(
    instance="instance-8ecb4d8f-71e1-43f6-b975-7c7f5dba0d7d",
    path="fragalysis-files/hnge/merged.sdf",
    destination="merged.sdf",
    token=squonk_token,
    stack="staging",
)

## Fragment Knitwork merging

*Fragment Knitwork*, an alternative fragment merging method that uses a graph database, can also be accessed via Squonk.

See also: [Fragment Knitwork paper](https://doi.org/10.1021/acs.jcim.3c00276)

In [None]:
from fragalysis.requests import knitwork
from fragalysis.requests.squonk import list_files, monitor_jobs

In [None]:
knitwork(
    observations=["A0237a", "A0207a"],
    target_name="A71EV2A",
    tas="lb32627-66",
    stack="staging",
    token=token,
)

Similarly, monitor the job and fetch its results:

In [None]:
monitor_jobs(
    stack="staging",
    token=squonk_token,
)

In [None]:
list_files(
    instance="instance-b4232c91-f5a5-418b-853a-943b7624ac9d",
    root="fragalysis-files/gtap",
    token=squonk_token,
)

## Loading results back into HIPPO

Once you have an SD file from a merging algorithm or other source, you can load it back into HIPPO using the `animal.load_sdf` method:

In [None]:
animal.load_sdf(
    target="A71EV2A",
    path="../data/openbind_a71ev2a_c1_scaffolds.sdf",
    compound_tags=["openbind_a71ev2a_c1_scaffolds"],
    pose_tags=["openbind_a71ev2a_c1_scaffolds", "fragmenstein_placed"],
)

N.B. The example SDF `openbind_a71ev2a_c1_scaffolds.sdf` was produced by HIPPO's `PoseSet.to_fragalysis`, so its format matches HIPPO's default parsing options. For other SDFs it may be necessary to pass extra parameters. See the `load_sdf` [docs](https://hippo-docs.winokan.com/en/latest/animal.html#hippo.animal.HIPPO.load_sdf).
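When an SDF comes from another source, it can help to check which data fields it contains before deciding on extra parameters. The following is a minimal, standard-library sketch that counts records and `> <field>` data headers; `sdf_field_counts` is an illustrative helper, and the example records and field names (`inspirations`, `score`) are hypothetical.

```python
import os
import tempfile
from collections import Counter


def sdf_field_counts(path):
    """Return (number of records, Counter of data-field names) for an SD file."""
    counts = Counter()
    n_records = 0
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("$$$$"):
                # "$$$$" terminates each record
                n_records += 1
                continue
            start = line.find("<")
            end = line.find(">", start + 1)
            # data headers look like "> <field_name>"
            if line.startswith(">") and start != -1 and end != -1:
                counts[line[start + 1 : end]] += 1
    return n_records, counts


# Two hypothetical, atom-free records written to a temporary file for the demo
record = "{name}\n  hypothetical\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
example = (
    record.format(name="mol_1")
    + "> <inspirations>\nfrag_1,frag_2\n\n> <score>\n0.5\n\n$$$$\n"
    + record.format(name="mol_2")
    + "> <inspirations>\nfrag_1\n\n$$$$\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.sdf")
    with open(path, "w") as handle:
        handle.write(example)
    n_records, counts = sdf_field_counts(path)

print(n_records, "records")
print(dict(counts))
```

Fields that appear in only some records are worth noting, since a parser that expects them everywhere may need to be told how to handle the gaps.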

## Exploring Inspiration/Derivative Relationships

These compounds and poses are now in the database and annotated accordingly.

In [None]:
scaffolds = animal.compounds(tag="openbind_a71ev2a_c1_scaffolds")
scaffold_poses = animal.poses(tag="openbind_a71ev2a_c1_scaffolds")
display(scaffolds)
display(scaffold_poses)

Poses that are designed with reference to other poses will have `inspiration` and `derivative` relationships:

In [None]:
print("scaffold:", scaffold_poses[0], "was inspired by:", scaffold_poses[0].inspirations.names)
scaffold_poses[0].draw()

It may be interesting to explore inspiration statistics across a set of poses. In this case the [PoseSet.split_by_inspirations](https://hippo-docs.winokan.com/en/latest/poses.html#hippo.pset.PoseSet.split_by_inspirations) method, which clusters poses based on their inspiration sets, may be useful:

In [None]:
inspiration_map = scaffold_poses.split_by_inspirations()

# viewed as a dataframe:
import pandas as pd
df = pd.DataFrame(dict(inspirations=inspiration_map.keys(), derivatives=inspiration_map.values()))
df
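Conceptually, clustering by inspiration set means that poses with exactly the same inspirations end up in the same group, regardless of the order in which the inspirations are listed. The sketch below illustrates that idea with plain dictionaries; it is not HIPPO's implementation, and the names and the `group_by_inspirations` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical derivative name -> names of the poses that inspired it
inspirations_of = {
    "merge_1": ["frag_1", "frag_2"],
    "merge_2": ["frag_2", "frag_1"],  # same set as merge_1, different order
    "merge_3": ["frag_1"],
}


def group_by_inspirations(inspirations_of):
    """Group derivative names by their (unordered) set of inspirations."""
    groups = defaultdict(list)
    for derivative, inspirations in inspirations_of.items():
        # frozenset is hashable and ignores order, so it can serve as the key
        groups[frozenset(inspirations)].append(derivative)
    return dict(groups)


groups = group_by_inspirations(inspirations_of)
for inspiration_set, derivatives in groups.items():
    print(sorted(inspiration_set), "->", derivatives)
```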

Alternatively, the inspiration statistics for each fragment can be calculated with `PoseSet` methods we have seen before:

In [None]:
# get all inspirations of the scaffolds:
inspirations = scaffold_poses.inspirations
print("inspirations:", inspirations)

# for progress bar
import mrich

# loop over inspiration poses:
data = []
for inspiration in mrich.track(inspirations):

    # get just the scaffold derivatives
    scaffold_derivatives = inspiration.derivatives & scaffold_poses
    
    # add to data for plotting later
    data.append(dict(
        inspiration_name = inspiration.name,
        inspiration = inspiration,
        derivatives = scaffold_derivatives,
        num_derivatives = len(scaffold_derivatives),
    ))

df = pd.DataFrame(data)

You can then plot a bar chart:

In [None]:
import molparse as mp
import plotly.express as px

# sort the dataframe
df = df.sort_values(by="num_derivatives", ascending=False)

# plot only the top 20
fig = px.bar(df.iloc[:20], x="inspiration_name", y="num_derivatives")

# save the figure
mp.write("top20_inspiration_hits.pdf", fig) # for documents/publications, etc
mp.write("top20_inspiration_hits.pickle", fig) # for opening later in python
mp.write("top20_inspiration_hits.html", fig) # for opening interactively in a browser

fig

Or a Tanimoto similarity scatter plot coloured by inspiration sets:

In [None]:
from hippo.plotting import plot_compound_tsnee
df = scaffold_poses.get_df(mol=True, inspiration_aliases=True)
df = df.reset_index()
fig = plot_compound_tsnee(
    title=scaffold_poses.name,
    animal=animal, 
    compounds=None, 
    df=df, 
    cluster_by="inspiration_aliases",
)
fig
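For reference, the Tanimoto similarity between two fingerprints is the number of features they share divided by the number of features present in either. The sketch below computes it on sets of "on" bit indices; the bit values are hypothetical, and how `hippo.plotting` generates fingerprints or embeds them in two dimensions is not covered here.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two collections of 'on' bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints expressed as indices of the bits that are set
fp_1 = {1, 2, 3}
fp_2 = {2, 3, 4}

similarity = tanimoto(fp_1, fp_2)
print(similarity)
```

Here two of the four distinct bits are shared, so the similarity is 0.5.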