# FFF Workshop

## B1: Working with Enamine quotes and Syndirella Routes

### Outline

- Quoting compounds from Enamine manually
- Retrosynthesis with syndirella
- Exploring reactions and routes
- Recipes
- Loading Quotes from a reference Database
- Generating a quoted Recipe

In [None]:
import hippo
animal = hippo.HIPPO(
    "A71EV2A_demo",
    "../data/A71EV2A.sqlite",
)

## Quoting compounds from Enamine manually

Quoting compounds from Enamine remains a manual process in the FFF pipeline, as we find that the availability and pricing information obtained this way is more accurate than any API call. However, later in this notebook we'll show a way to load Enamine's in-stock data.

The first step when quoting manually is exporting a CSV that can be sent to Enamine with specific plating/solvent instructions. E.g. for the `openbind_a71ev2a_c1_scaffolds` molecules we looked at before:

In [None]:
scaffolds = animal.compounds(tag="openbind_a71ev2a_c1_scaffolds")
scaffolds.write_smiles_csv("openbind_a71ev2a_c1_scaffolds_smiles.csv", tags=False)

From Enamine you might receive a file such as `../data/Q2097917_EUR_a71ev2a_c1.xlsx`. Take a look at its formatting:

In [None]:
import pandas as pd
pd.read_excel("../data/Q2097917_EUR_a71ev2a_c1.xlsx").head()

This data can be loaded into [Quote](https://hippo-docs.winokan.com/en/latest/quoting.html#hippo.quote.Quote) objects using the [add_enamine_quote](https://hippo-docs.winokan.com/en/latest/animal.html#hippo.animal.HIPPO.add_enamine_quote) method:

In [None]:
quoted = animal.add_enamine_quote(
    "../data/Q2097917_EUR_a71ev2a_c1.xlsx",
    orig_name_col="Customer ID",
)

Now, let's explore how these quotes appear in the database. [add_enamine_quote](https://hippo-docs.winokan.com/en/latest/animal.html#hippo.animal.HIPPO.add_enamine_quote) returns an [IngredientSet](https://hippo-docs.winokan.com/en/latest/compounds.html#hippo.cset.IngredientSet). An [Ingredient](https://hippo-docs.winokan.com/en/latest/compounds.html#hippo.compound.Ingredient) is a specific quantity of a [Compound](https://hippo-docs.winokan.com/en/latest/compounds.html#hippo.compound.Compound), and since our quotes above were for a specific quantity that's what is returned:

In [None]:
quoted

Under the hood, the IngredientSet has a DataFrame:

In [None]:
quoted.df

The associated Quote objects allow for [Price](https://hippo-docs.winokan.com/en/latest/quoting.html#hippo.price.Price) calculations of any IngredientSet:

In [None]:
quoted.price

You can also get the quotes from a compound:

In [None]:
animal.C1284.get_quotes()

or

In [None]:
animal.C1284.get_quotes(df=True)

Let's also tag the quoted scaffolds so we can easily retrieve them later:

In [None]:
quoted.compounds.add_tag("openbind_a71ev2a_c1_scaffolds_quoted")

## Retrosynthesis with syndirella

[Syndirella](https://github.com/kate-fie/syndirella) is XChem's tool of choice for retrosynthesis and elaboration. Syndirella provides a CLI to solve retrosynthesis routes using different engines, including AiZynthFinder, manually suggested routes, or PostEra's Manifold.

We won't cover running syndirella in this notebook, but HIPPO can provide syndirella inputs using the [PoseSet.to_syndirella](https://hippo-docs.winokan.com/en/latest/poses.html#hippo.pset.PoseSet.to_syndirella) method. For example, for a few of our scaffolds:

In [None]:
scaffolds[:10].best_placed_poses.to_syndirella("first_ten_scaffolds")

This has created the following outputs:

- A syndirella input CSV: `first_ten_scaffolds_syndirella_input.csv`
- Reference PDB structures in `templates` (not needed for retrosynthesis)
- An SDF of reference ligands (inspirations): `first_ten_scaffolds_syndirella_inspiration_hits.sdf`. (Not needed for retrosynthesis)

A syndirella retrosynthesis call might look like:

`run --input first_ten_scaffolds_syndirella_input.csv --output retrosynthesis --just_retro`

The result can be read into HIPPO with [add_syndirella_routes](https://hippo-docs.winokan.com/en/latest/animal.html#hippo.animal.HIPPO.add_syndirella_routes):

In [None]:
animal.add_syndirella_routes(
    "../data/justretroquery_openbind_a71ev2a_c1_scaffolds_syndirella_input.pkl.gz"
)

This has added [Reaction](https://hippo-docs.winokan.com/en/latest/reactions.html#hippo.reaction.Reaction) and [Route](https://hippo-docs.winokan.com/en/latest/recipes.html#hippo.recipe.Route) entries we can now explore:

## Exploring reactions and routes

[Reactions](https://hippo-docs.winokan.com/en/latest/reactions.html#hippo.reaction.Reaction) in HIPPO are conceptually simplified. They essentially just describe a transformation between one or more reactants into a product, via a named reaction *type*.

E.g.:

In [None]:
display(animal.R1)
animal.R1.draw()

In this case, **C893** is one of our scaffold products, so there is only one Reaction in its [Route](https://hippo-docs.winokan.com/en/latest/recipes.html#hippo.recipe.Route).

The Route in this case is a single reaction step, with no intermediates, and just two reactants. In theory, though, a Route could have many steps, including deprotections, and involve a whole network of reaction steps. Additionally, there may be multiple routes to a given product.
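To make the idea of a multi-step route concrete, here is a minimal pure-Python sketch. It does not use HIPPO: `ToyReaction`, `starting_materials`, the compound labels, and the reaction type names are all hypothetical and chosen for illustration only. It also assumes each compound is produced by at most one reaction, so it does not model multiple routes to the same product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyReaction:
    """Hypothetical stand-in: reactants are turned into one product via a named type."""
    type: str
    reactants: tuple
    product: str


def starting_materials(reactions, target):
    """Walk back from the target and collect compounds that no reaction produces."""
    by_product = {r.product: r for r in reactions}
    materials, seen, stack = set(), set(), [target]
    while stack:
        compound = stack.pop()
        if compound in seen:
            continue
        seen.add(compound)
        reaction = by_product.get(compound)
        if reaction is None:
            materials.add(compound)  # nothing makes it, so it must be bought
        else:
            stack.extend(reaction.reactants)
    return materials


# Made-up compound labels and reaction type names, for illustration only
two_step = [
    ToyReaction("Amidation", ("acid", "protected_amine"), "intermediate"),
    ToyReaction("Deprotection", ("intermediate",), "product"),
]

materials = starting_materials(two_step, "product")
intermediates = {r.product for r in two_step} - {"product"}
print(sorted(materials), sorted(intermediates))
```

In this toy two-step route the deprotection consumes an intermediate, so only the two compounds feeding the first step need to be purchased.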

Let's look at the Route object to **C893**:

In [None]:
route = animal.db.get_route(id=1)
route

There are also a few graphical representations available in HIPPO:

In [None]:
route.draw()

In [None]:
route.sankey()

## Recipes

The next important concept in HIPPO is that of a [Recipe](https://hippo-docs.winokan.com/en/latest/recipes.html#hippo.recipe.Recipe), which is a more generalised form of Routes, where all the reactants, intermediates, and reactions needed to synthesise a whole set of compounds are described.

We can try to generate a Recipe to our first five scaffolds, ignoring quotes for now:

In [None]:
recipe = hippo.Recipe.from_compounds(scaffolds[:5], quoted_only=False)
recipe

Because syndirella didn't find routes to all of our scaffolds, our Recipe only has two products:

In [None]:
recipe.summary()

In [None]:
recipe.sankey()

Now let's look at how we can estimate the starting material cost for our recipe. First we need some more quotes:

## Loading Quotes from a reference Database

For Enamine's in-stock catalogues, a reference HIPPO database has been created from large SDFs provided via an FTP server to XChem (see https://github.com/xchem/EnamineCatalogs). It's too large to keep in this GitHub repo, but it can be obtained from the Slack channel or, for Diamond users, from [here](https://dlsltd-my.sharepoint.com/:u:/g/personal/max_winokan_diamond_ac_uk/EWMS4aAG18NHhogH0mMUCh4BUgdp2Z989XcDkHtkdF-VRQ?e=Hxp5Jw). Upload it to the data directory and then unzip it:

In [None]:
!unzip ../data/enamine_bb_hippo_filtered.sqlite.zip -d ../data/

Then quotes from the reference database can be loaded into our A71EV2A project with [quote_compounds](https://hippo-docs.winokan.com/en/latest/animal.html#hippo.animal.HIPPO.quote_compounds):

In [None]:
ref_db = hippo.HIPPO("Enamine In-Stock", "../data/enamine_bb_filtered.sqlite", update_legacy=True)
animal.quote_reactants(ref_db)

Now these quotes are in the database, for example:

In [None]:
animal.C4060.get_quotes(df=True)

One extra thing to note for now is that, by default, `add_enamine_quote` will delete existing quotes for the same compounds, as the newly added quotes are assumed to be the most up to date.

## Generating a quoted Recipe

Now that we have reactant quotes we should regenerate the recipe and only allow reactions where all reactants are available:

In [None]:
recipe = hippo.Recipe.from_compounds(scaffolds[:5], quoted_only=True)
recipe

Sadly there's now only one product available, but let's see what the whole scaffold set looks like:

In [None]:
recipe = hippo.Recipe.from_compounds(scaffolds, quoted_only=True)
recipe

Again, there is very high attrition. It may be worth getting a manual quote for the full set of reactants, but for now let's work with this smaller recipe.
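The attrition seen with `quoted_only=True` can be pictured with a toy filter. This is only a sketch of the idea for single-step reactions, not HIPPO's implementation: the function name, compound labels, and data below are hypothetical.

```python
def products_with_quoted_reactants(reactions, quoted):
    """Keep only products whose reactants all appear in the set of quoted compounds."""
    quoted = set(quoted)
    return [
        product
        for product, reactants in reactions.items()
        if set(reactants) <= quoted  # every reactant must have a quote
    ]


# Hypothetical single-step reactions: product -> list of reactants
toy_reactions = {
    "P1": ["A", "B"],
    "P2": ["A", "C"],
    "P3": ["D", "E"],
}
toy_quoted = {"A", "B", "D"}

available = products_with_quoted_reactants(toy_reactions, toy_quoted)
print(available)
```

A single missing reactant quote is enough to drop a product, which is why a manual quote covering the full reactant set can recover many more products.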

To get a CSV of reactants to review and send off for re-quoting:

In [None]:
recipe.write_reactant_csv("c1_scaffolds_in_stock_reactants.csv")

Also a CAR-compatible CSV can be created which encodes all the reaction chemistry:

In [None]:
recipe.write_CAR_csv("c1_scaffolds_in_stock_CAR.csv")

It's also good practice to save a JSON of the Recipe, as this can be used to recreate the HIPPO Recipe object if you want to save the recalculation time at a later stage:

In [None]:
recipe.write_json("c1_scaffolds_in_stock.json")

It can then be read back in as follows:

In [None]:
recipe = hippo.Recipe.from_json(animal.db, "c1_scaffolds_in_stock.json")

Even though this Recipe only has 31 products it would still cost almost €7k to order the building blocks:

In [None]:
recipe.price

If we look at the reactant IngredientSet:

In [None]:
recipe.reactants.df.head()

You can see that the required amount (in mg) is often in the single digits, while the quoted amount is 50 or 100 mg. This is often where the pricing discrepancy comes from, and getting accurate pricing for your synthesis scale is important.
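As a rough illustration of this effect, the sketch below picks the cheapest pack that covers a required amount and compares the price per required milligram with the price per quoted milligram. The function name, pack sizes, and prices are invented for illustration; this is not how HIPPO computes prices.

```python
def cheapest_covering_pack(required_mg, packs):
    """Return the cheapest (amount_mg, price) pack holding at least required_mg, or None."""
    candidates = [pack for pack in packs if pack[0] >= required_mg]
    if not candidates:
        return None
    return min(candidates, key=lambda pack: pack[1])


# Hypothetical quote: pack sizes in mg with made-up prices
toy_packs = [(50, 60.0), (100, 90.0)]
required_mg = 4

amount_mg, price = cheapest_covering_pack(required_mg, toy_packs)
price_per_required_mg = price / required_mg  # what the synthesis effectively pays
price_per_quoted_mg = price / amount_mg      # what the quote suggests at face value
print(amount_mg, price, price_per_required_mg, price_per_quoted_mg)
```

With these made-up numbers only 4 mg of a 50 mg pack is used, so the effective cost per milligram consumed is more than ten times the quoted per-milligram price.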

In fact, it may be cheaper to order the products directly from Enamine:

In [None]:
synthesisable = recipe.products.compounds
iset = synthesisable.as_ingredientset(amount=1, supplier="Enamine")
iset.price

While some aren't available, the cost per compound is much better.

But can we do any better? In the next notebook we'll look at generating and scoring some compound selections and recipes.