# The `opencadd.structure.subpockets` module

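This notebook introduces the `opencadd.structure.subpockets` module, which defines pockets, subpockets, regions, and anchor residues on top of structural data stored in a `DataFrame`.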

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd

from opencadd.structure.subpockets.core import Pocket, Subpocket, Region, AnchorResidue

## Example structural data as `DataFrame`

In [3]:
dataframe = pd.DataFrame(
    {"residue.pdb_id": ["1", "2", "3", "7", "8", "9", "11"],
     "atom.name": ["CA", "CA", "CA", "CA", "CA", "CA", "CA"],
     "atom.x": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
     "atom.y": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
     "atom.z": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0]
    }
)
dataframe

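| | residue.pdb_id | atom.name | atom.x | atom.y | atom.z |
|---|---|---|---|---|---|
| 0 | 1 | CA | 1.0 | 1.0 | 1.0 |
| 1 | 2 | CA | 2.0 | 2.0 | 2.0 |
| 2 | 3 | CA | 3.0 | 3.0 | 3.0 |
| 3 | 7 | CA | 4.0 | 4.0 | 4.0 |
| 4 | 8 | CA | 20.0 | 20.0 | 20.0 |
| 5 | 9 | CA | 30.0 | 30.0 | 30.0 |
| 6 | 11 | CA | 40.0 | 40.0 | 40.0 |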


## Pocket

The `Pocket` class currently holds the following attributes/properties:

- `data`: a `DataFrame` containing the structural data of the full protein or the pocket
- `name`: the protein/pocket name
- `subpockets`: subpockets defined based on a set of anchor residues each
- `regions`: user-defined regions that are of importance for the protein/pocket

### Initialize pocket

We initialize the pocket with its name and the pocket/protein structural data.

In [4]:
pocket = Pocket(dataframe, "example kinase")

We have not yet set any subpockets or regions, so both properties return `None`:

In [5]:
pocket.subpockets

In [6]:
pocket.regions

### Add subpockets

Next, we can add subpockets one by one to the pocket. For each subpocket, we define the following:
- a subpocket __name__,
- a subpocket __color__,
- the __residue PDB IDs__ of all __anchor residues__, i.e. the residues that determine the subpocket center (the centroid of all anchor residues' CA atoms), and
- optionally, __residue labels__ for all __anchor residues__, e.g. if we want to assign an alignment ID to each residue.

The `Pocket` method `add_subpocket` uses the `Subpocket` class.

In [7]:
pocket.add_subpocket("AP", "magenta", [1, 2, 3], ["a", "b", "c"])
pocket.add_subpocket("GA", "orange", [7, 8, 9], ["x", "y", "z"])
pocket.add_subpocket("na", "black", [100, 200])

The `Pocket` property `subpockets` gives an overview of all specified subpockets.

In [8]:
pocket.subpockets

| | subpocket.name | subpocket.color | subpocket.center |
|---|---|---|---|
| 0 | AP | magenta | [2.0, 2.0, 2.0] |
| 1 | GA | orange | [18.0, 18.0, 18.0] |
| 2 | na | black | |
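
The subpocket centers shown above are consistent with a plain centroid over the anchor residues' CA atoms. The following is a minimal sketch of that calculation using only pandas and NumPy; `centroid_of_ca_atoms` is a hypothetical helper written for illustration, not part of `opencadd`, and it ignores the fallback logic for missing anchor residues that the library applies.

```python
import pandas as pd

# Same toy structure as the example DataFrame in this notebook
structure = pd.DataFrame(
    {
        "residue.pdb_id": ["1", "2", "3", "7", "8", "9", "11"],
        "atom.name": ["CA"] * 7,
        "atom.x": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
        "atom.y": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
        "atom.z": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
    }
)


def centroid_of_ca_atoms(structure, residue_pdb_ids):
    """Hypothetical helper: mean CA position of the given residues, or None."""
    # Residue IDs are stored as strings in the DataFrame, so cast the input
    ids = [str(residue_pdb_id) for residue_pdb_id in residue_pdb_ids]
    ca_atoms = structure[
        (structure["atom.name"] == "CA") & structure["residue.pdb_id"].isin(ids)
    ]
    if ca_atoms.empty:
        return None
    return ca_atoms[["atom.x", "atom.y", "atom.z"]].mean().to_numpy()


ap_center = centroid_of_ca_atoms(structure, [1, 2, 3])
ga_center = centroid_of_ca_atoms(structure, [7, 8, 9])
na_center = centroid_of_ca_atoms(structure, [100, 200])
print(ap_center, ga_center, na_center)
```

For the toy data, this yields `[2. 2. 2.]` for `AP`, `[18. 18. 18.]` for `GA`, and `None` for `na`, whose anchor residues are not part of the structure.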


The `Pocket` property `anchor_residues` gives an overview of all subpockets' anchor residues.

In [9]:
pocket.anchor_residues

| | subpocket.name | subpocket.color | anchor_residue.pdb_id | anchor_residue.pdb_id_alternative | anchor_residue.label | anchor_residue.center |
|---|---|---|---|---|---|---|
| 0 | AP | magenta | 1 | | a | [1.0, 1.0, 1.0] |
| 1 | AP | magenta | 2 | | b | [2.0, 2.0, 2.0] |
| 2 | AP | magenta | 3 | | c | [3.0, 3.0, 3.0] |
| 0 | GA | orange | 7 | | x | [4.0, 4.0, 4.0] |
| 1 | GA | orange | 8 | | y | [20.0, 20.0, 20.0] |
| 2 | GA | orange | 9 | | z | [30.0, 30.0, 30.0] |
| 0 | na | black | 100 | | | |
| 1 | na | black | 200 | | | |


### Add regions

The `Pocket` class also allows us to specify pocket regions, which are typically used to store key regions such as the hinge region or the catalytic loop in kinases. This information can be used for pocket visualization.

The `Pocket` method `add_region` uses the `Region` class.

In [10]:
pocket.add_region("hinge region", "magenta", [1, 2], [81, 82])
pocket.add_region("catalytic loop", "yellow", [7, 8], [1, 2])

In [11]:
pocket.regions

| | region.name | region.color | residue.pdb_ids | residue.label |
|---|---|---|---|---|
| 0 | hinge region | magenta | [1, 2] | [81, 82] |
| 1 | catalytic loop | yellow | [7, 8] | [1, 2] |


## Subpocket

It is also possible to define a single subpocket, using the `Subpocket` class directly.

In [12]:
subpocket = Subpocket()
subpocket.from_dataframe(dataframe, "AP", "magenta", [1, 2, 3], ["a", "b", "c"])

Get the subpocket name and color.

In [13]:
print(subpocket.name)
print(subpocket.color)

AP
magenta


Get the subpocket center.

In [14]:
subpocket.center

array([2., 2., 2.])

Get details on all anchor residues used to calculate the subpocket center.

In [15]:
subpocket.anchor_residues

| | subpocket.name | subpocket.color | anchor_residue.pdb_id | anchor_residue.pdb_id_alternative | anchor_residue.label | anchor_residue.center |
|---|---|---|---|---|---|---|
| 0 | AP | magenta | 1 | | a | [1.0, 1.0, 1.0] |
| 1 | AP | magenta | 2 | | b | [2.0, 2.0, 2.0] |
| 2 | AP | magenta | 3 | | c | [3.0, 3.0, 3.0] |


## Region

It is also possible to define a single region, using the `Region` class directly.

In [16]:
region = Region()
region.from_dataframe(dataframe, "hinge region", "magenta", [7, 8, 9, 10], [1, 2, 3, 4])

Get the region's name and color.

In [17]:
print(region.name)
print(region.color)

hinge region
magenta


Get the region's residue PDB IDs and, if provided, the residue labels. Input residue PDB IDs that are not part of the protein/pocket are dropped together with their labels (here, residue 10 with label 4).

In [18]:
print(region.residue_pdb_ids)
print(region.residue_labels)

['7', '8', '9']
['1', '2', '3']
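
The filtering seen in this output can be sketched as follows. This is only an illustration of the observed behaviour under the assumption that IDs and labels are cast to strings and dropped pairwise; `keep_known_residues` is a hypothetical helper, not a function of `opencadd`.

```python
import pandas as pd

# Residue PDB IDs present in the example structure (stored as strings)
structure_ids = pd.Series(["1", "2", "3", "7", "8", "9", "11"], name="residue.pdb_id")


def keep_known_residues(structure_ids, residue_pdb_ids, residue_labels):
    """Hypothetical helper: keep only ID/label pairs whose ID is in the structure."""
    known = set(structure_ids)
    pairs = [
        (str(residue_pdb_id), str(label))
        for residue_pdb_id, label in zip(residue_pdb_ids, residue_labels)
        if str(residue_pdb_id) in known
    ]
    kept_ids = [residue_pdb_id for residue_pdb_id, _ in pairs]
    kept_labels = [label for _, label in pairs]
    return kept_ids, kept_labels


kept_ids, kept_labels = keep_known_residues(structure_ids, [7, 8, 9, 10], [1, 2, 3, 4])
print(kept_ids)
print(kept_labels)
```

Filtering IDs and labels as pairs keeps the two lists aligned when a residue is dropped.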


## Anchor residue

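The `AnchorResidue` class holds the input residue PDB ID (`pdb_id`), the alternative residue PDB ID(s) used when the input residue is not available (`pdb_id_alternative`), and the residue center (`center`).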

### Test behaviour for missing anchor residue

In [19]:
def test_anchor_residue_behaviour(dataframe, residue_pdb_id):
    """Print the input residue, its alternative residue(s), and the resulting center."""
    residue = AnchorResidue()
    residue.from_dataframe(dataframe, residue_pdb_id)
    print("Input residue:       ", residue.pdb_id)
    print("Alternative residue: ", residue.pdb_id_alternative)
    print("Residue center:      ", residue.center)

The determination of an anchor residue's center depends on the CA atom availability of the user-defined anchor residue as well as of the residues directly before and after it.

#### Case 1: Anchor residue available

In [20]:
test_anchor_residue_behaviour(dataframe, "1")

Input residue:        1
Alternative residue:  None
Residue center:       [1. 1. 1.]


#### Case 2: Anchor residue not available, but the residues before and after are

In [21]:
test_anchor_residue_behaviour(dataframe, "10")

Input residue:        10
Alternative residue:  ['9', '11']
Residue center:       [35. 35. 35.]


#### Case 3: Anchor residue not available, but the residue before is (not the one after)

In [22]:
test_anchor_residue_behaviour(dataframe, "4")

Input residue:        4
Alternative residue:  ['3']
Residue center:       [3. 3. 3.]


#### Case 4: Anchor residue not available, but the residue after is (not the one before)

In [23]:
test_anchor_residue_behaviour(dataframe, "6")

Input residue:        6
Alternative residue:  ['7']
Residue center:       [4. 4. 4.]


#### Case 5: Anchor residue and residues before and after not available

In [24]:
test_anchor_residue_behaviour(dataframe, "5")

Input residue:        5
Alternative residue:  None
Residue center:       None
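
The five cases above can be summarized in a short sketch. It is a re-implementation for illustration that reproduces the outputs shown in this notebook, not the library's actual code; `ca_position` and `anchor_center` are hypothetical helpers, and the sketch assumes purely numeric residue PDB IDs (no insertion codes).

```python
import numpy as np
import pandas as pd

# Same toy structure as the example DataFrame in this notebook
structure = pd.DataFrame(
    {
        "residue.pdb_id": ["1", "2", "3", "7", "8", "9", "11"],
        "atom.name": ["CA"] * 7,
        "atom.x": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
        "atom.y": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
        "atom.z": [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0],
    }
)


def ca_position(structure, residue_pdb_id):
    """Hypothetical helper: CA coordinates of one residue, or None if missing."""
    rows = structure[
        (structure["residue.pdb_id"] == str(residue_pdb_id))
        & (structure["atom.name"] == "CA")
    ]
    if rows.empty:
        return None
    return rows[["atom.x", "atom.y", "atom.z"]].iloc[0].to_numpy(dtype=float)


def anchor_center(structure, residue_pdb_id):
    """Hypothetical helper: return (alternative residue IDs, center)."""
    # Case 1: the anchor residue itself has a CA atom
    center = ca_position(structure, residue_pdb_id)
    if center is not None:
        return None, center
    # Cases 2-4: fall back to whichever direct neighbors are available
    neighbors = [str(int(residue_pdb_id) - 1), str(int(residue_pdb_id) + 1)]
    found = {}
    for neighbor in neighbors:
        position = ca_position(structure, neighbor)
        if position is not None:
            found[neighbor] = position
    # Case 5: neither the residue nor its neighbors are available
    if not found:
        return None, None
    return list(found), np.mean(list(found.values()), axis=0)


for residue_pdb_id in ["1", "10", "4", "6", "5"]:
    print(residue_pdb_id, anchor_center(structure, residue_pdb_id))
```

When both neighbors are available, the sketch averages their CA positions, which matches the center `[35. 35. 35.]` reported for residue 10 in Case 2.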
