# FFF Workshop

## A3: Project setup and HIPPO introduction

### Outline

- Download a target from Fragalysis
- Create a HIPPO project and database
- Load the Fragalysis poses
- Load SoakDB compounds
- Explore the data

## Download a target from Fragalysis

Run the cell below to select a target for download with the Fragalysis IPython widgets. For private targets, paste in an authentication token obtained from the Fragalysis frontend.

In [None]:
from fragalysis.widgets import download
download(destination="../data")

## Create a HIPPO project and database

Creating a `HIPPO` *animal* object automatically creates an empty SQLite database at the given path if one doesn't already exist.

In [None]:
%load_ext autoreload
%autoreload 2
import hippo
animal = hippo.HIPPO(
    "A71EV2A_demo",
    "../data/A71EV2A.sqlite",
)
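
The "create if missing" behaviour is the same one the standard-library `sqlite3` module shows when it connects to a path. The sketch below illustrates that idea only; it is not HIPPO's implementation, and the `open_or_create` helper and the `compound` table are hypothetical names chosen for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_or_create(db_path):
    """Open an SQLite database and report whether the file already existed."""
    db_path = Path(db_path)
    existed = db_path.exists()
    # sqlite3.connect creates the database file if it is not there yet
    connection = sqlite3.connect(db_path)
    return connection, existed


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.sqlite"

    # First call: no file yet, so a new database is created
    conn, existed_first = open_or_create(path)
    # Hypothetical table, only here so the second call has something to find
    conn.execute("CREATE TABLE IF NOT EXISTS compound (id INTEGER PRIMARY KEY, smiles TEXT)")
    conn.commit()
    conn.close()

    # Second call: the existing file is reused and its contents are kept
    conn, existed_second = open_or_create(path)
    tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
    conn.close()

print(existed_first, existed_second, tables)
```

In practice this means re-running the cell above should reconnect to the existing project rather than start from scratch.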

## Load the Fragalysis poses

To load the downloaded Fragalysis poses into HIPPO, use the `HIPPO.add_hits()` method:

In [None]:
animal.add_hits(
    target_name="A71EV2A",
    metadata_csv="../data/A71EV2A/metadata.csv",
    aligned_directory="../data/A71EV2A/aligned_files",
    load_pose_mols=True, # this is optional but saves time later
)
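
If the load fails, a wrong path is a likely first suspect. The following is a minimal sketch of a pre-flight check using only `pathlib`; `missing_inputs` is a hypothetical helper, not part of HIPPO, and the directory layout in the demonstration simply mirrors the paths used above.

```python
import tempfile
from pathlib import Path


def missing_inputs(metadata_csv, aligned_directory):
    """Return a list of human-readable problems; empty if both inputs exist."""
    problems = []
    if not Path(metadata_csv).is_file():
        problems.append(f"metadata CSV not found: {metadata_csv}")
    if not Path(aligned_directory).is_dir():
        problems.append(f"aligned directory not found: {aligned_directory}")
    return problems


# Demonstrate on a throwaway directory that mimics the download layout
with tempfile.TemporaryDirectory() as tmp:
    target_dir = Path(tmp) / "A71EV2A"
    (target_dir / "aligned_files").mkdir(parents=True)

    # metadata.csv does not exist yet, so one problem is reported
    before = missing_inputs(target_dir / "metadata.csv", target_dir / "aligned_files")

    (target_dir / "metadata.csv").touch()
    # Both inputs now exist, so the list is empty
    after = missing_inputs(target_dir / "metadata.csv", target_dir / "aligned_files")

print(before)
print(after)
```

Against the real download you would call `missing_inputs("../data/A71EV2A/metadata.csv", "../data/A71EV2A/aligned_files")` and expect an empty list.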

## Load SoakDB compounds

Fragalysis downloads often include SoakDB files, which record the soaking experiments performed against the target. To list the available files, run the following cell:

In [None]:
from pathlib import Path
[str(s) for s in Path("../data/A71EV2A").glob("extra_files_?/soakdb_*-*.csv")]
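
The glob pattern in the cell above uses two wildcards: `?` matches exactly one character and `*` matches any run of characters within a single path component. The sketch below shows which names the pattern accepts, using a temporary directory; every file name other than `soakdb_lb32627-66.csv` is made up for illustration.

```python
import tempfile
from pathlib import Path

PATTERN = "extra_files_?/soakdb_*-*.csv"

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    candidates = [
        "extra_files_1/soakdb_lb32627-66.csv",   # matches
        "extra_files_2/soakdb_ab12345-1.csv",    # matches (hypothetical name)
        "extra_files_10/soakdb_lb32627-66.csv",  # no match: "?" is a single character
        "extra_files_1/soakdb_nodash.csv",       # no match: the pattern requires a "-"
        "extra_files_1/notes.csv",               # no match: wrong prefix
    ]
    for relative in candidates:
        file_path = root / relative
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()

    # Collect matches as POSIX-style paths relative to the temporary root
    matched = sorted(p.relative_to(root).as_posix() for p in root.glob(PATTERN))

print(matched)
```

Note that a download with ten or more `extra_files_` directories would need a wider pattern such as `extra_files_*`.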

To load this data into HIPPO, use the `HIPPO.add_soakdb_compounds()` method:

In [None]:
animal.add_soakdb_compounds("../data/A71EV2A/extra_files_1/soakdb_lb32627-66.csv")

## Explore the data

The HIPPO database now contains a few thousand compounds and hundreds of poses, annotated with tags.

In [None]:
animal.summary()

Tags are the most convenient way to group compounds and poses in HIPPO. 

In the list above, the `[Other] ...` and `[Series] ...` tags are taken directly from the Fragalysis metadata CSV. The `hits` and `soaks` tags were added by `add_hits` and `add_soakdb_compounds`, respectively.

Tags can be used to make selections:

In [None]:
soaks = animal.compounds(tag="soaks")
soaks
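
Conceptually, selecting by tag is a membership filter over labelled records. The plain-Python sketch below illustrates that idea only; it says nothing about how HIPPO stores or queries tags, and the records, the `[Series] example` tag, and the `select_by_tag` helper are all hypothetical.

```python
# Hypothetical records: each has an ID and a set of tags
records = [
    {"id": 1, "tags": {"hits", "[Series] example"}},
    {"id": 2, "tags": {"soaks"}},
    {"id": 3, "tags": {"soaks", "hits"}},
    {"id": 4, "tags": set()},
]


def select_by_tag(records, tag):
    """Return the IDs of all records carrying the given tag."""
    return [record["id"] for record in records if tag in record["tags"]]


soak_ids = select_by_tag(records, "soaks")
hit_ids = select_by_tag(records, "hits")
print(soak_ids, hit_ids)
```

A record can carry several tags at once, so the same ID may appear in more than one selection.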

This is a `CompoundSet` object, which is a subset of the full `CompoundTable` (more on this in the next notebook/session). You can look at its contents in a few ways:

### Interactive widget

In [None]:
soaks.interactive()

### Draw a compound grid

(this draws the first 6 compounds)

In [None]:
soaks[:6].draw()

Compounds have a shorthand name with the prefix `C`, e.g. `C3` is the compound with ID 3 in the database.

`C3` can be accessed directly from the animal object in a few ways:

In [None]:
# get using shorthand
c3 = animal.C3

# get using ID
c3 = animal.compounds[3]

# get using alias (assigned from SoakDB)
c3 = animal.compounds["ASAP-0032121-001"]

display(c3) # gives you a formatted string representation
c3.draw() # gives you a 2D RDKit drawing
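
The shorthand is a one-letter prefix followed by the integer database ID. The sketch below splits such a name into its parts with a regular expression; `parse_shorthand` is a hypothetical helper written for this example, not a HIPPO function, and it also covers the `P` prefix used for poses later in this notebook.

```python
import re

# One prefix letter (C for compound, P for pose) followed by the integer ID
SHORTHAND = re.compile(r"([CP])(\d+)")
KINDS = {"C": "compound", "P": "pose"}


def parse_shorthand(name):
    """Split a shorthand such as 'C3' into ('compound', 3)."""
    match = SHORTHAND.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid shorthand name: {name!r}")
    prefix, number = match.groups()
    return KINDS[prefix], int(number)


print(parse_shorthand("C3"))
print(parse_shorthand("P12"))
```

A parser like this can be handy when shorthand names appear in notes or spreadsheets and need to be turned back into IDs.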

The `Compound` class gives quick access to properties and other representations:

In [None]:
# Alias (optional)
print(c3.alias)

# SMILES string
print(c3.smiles)

# InChIKey
print(c3.inchikey)

# molecular weight
print(c3.molecular_weight)

# RDKit molecule
c3.mol

`Compound` objects also have a metadata dictionary stored in the database. You can use it to store any JSON-serialisable data:

In [None]:
c3.metadata

In [None]:
c3.metadata["test_string"] = "Max thinks this compound is cool"
c3.metadata
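
"JSON-serialisable" means the value survives a round trip through `json.dumps` and `json.loads`: strings, numbers, booleans, `None`, and lists or dictionaries of those. The sketch below is one way to check a value before storing it; `is_json_serialisable` and the extra metadata keys are hypothetical, and how HIPPO itself validates metadata is not shown here.

```python
import json


def is_json_serialisable(value):
    """Return True if the standard json module can encode the value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# Hypothetical metadata: strings, numbers, and lists are all fine
candidate_metadata = {
    "test_string": "Max thinks this compound is cool",
    "priority": 2,
    "flags": ["follow-up", "soluble"],
}

# A set is not a JSON type, so this value would be rejected
bad_metadata = {"flags": {"follow-up", "soluble"}}

round_tripped = json.loads(json.dumps(candidate_metadata))
print(is_json_serialisable(candidate_metadata), is_json_serialisable(bad_metadata))
```

Converting sets to lists (and other custom objects to plain dictionaries) before assignment avoids this kind of failure.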

The full `Compound` API reference can be found [here](https://hippo-docs.winokan.com/en/latest/compounds.html#hippo.compound.Compound).

### Poses

`Pose` objects are 3D protein-ligand conformers associated with a `Compound` object. During `add_hits` all the Fragalysis observations were loaded as separate HIPPO poses:

In [None]:
observations = animal.poses(tag="hits")
observations

This is a `PoseSet` object, which is a subset of the full `PoseTable` (again, more on this in the next notebook/session). You can look at its contents in similar ways to `CompoundSet` objects:

In [None]:
# interactive widget
observations.interactive()

In [None]:
# render conformations of the first 3 poses
observations[:3].draw()

Poses have a shorthand name with the prefix `P`, e.g. `P3` is the pose with ID 3 in the database.

`P3` can be accessed directly from the animal object in a few ways:

In [None]:
# get using shorthand
p3 = animal.P3

# get using ID
p3 = animal.poses[3]

# get using alias (assigned from Fragalysis observation short code)
p3 = animal.poses["A4343a"]

display(p3) # gives you a formatted string representation
p3.draw() # gives you a py3Dmol render of the ligand
p3.render() # gives you a py3Dmol render of the protein and ligand

The `Pose` class gives quick access to properties and other representations:

In [None]:
# Alias (optional)
print(p3.alias)

# SMILES string
print(p3.smiles)

# InChIKey
print(p3.inchikey)

# metadata
print(p3.metadata)

# RDKit molecule
p3.mol

The full `Pose` API reference can be found [here](https://hippo-docs.winokan.com/en/latest/poses.html#hippo.pose.Pose).