# Getting Started: Program Config

In this tutorial, we will demonstrate how to run the subpocket-based docking pipeline in a small example. Specifically, we will:
1. configure and discuss the program parameters,
2. execute the subpocket-based docking pipeline, and
3. inspect the resulting output files.

**Note:** In the notebook [getting_started.ipynb](getting_started.ipynb), we demonstrated how to prepare the protein structure and, specifically, how to generate the FlexX and Hydescorer configuration files. These files are assumed to be available in this tutorial.

In [91]:
# imports
import json
import pprint
import os

from pathlib import Path

In [92]:
# paths
HERE = Path(_dh[-1])
ROOT = HERE / ".."
PATH_TEMPLATE_SETTINGS = ROOT / "config" / "templates" / "settings.json"

## Program Configuration
All program-specific parameters can be adjusted in a JSON file that is passed to the program later. To set up such a file, we start with the template file [config/templates/settings.json](../config/templates/settings.json):

In [93]:
# Read template config file
with open(PATH_TEMPLATE_SETTINGS, 'r') as file:
    config = json.load(file)

# For readability reasons, we use pprint instead of the native printing here
pprint.pp(config)

{'Name': 'TODO',
 'CoreSubpocket': 'TODO',
 'Subpockets': ['TODO', '...'],
 'KinFragLib': 'TODO',
 'Config': 'TODO',
 'FlexX': 'TODO',
 'Hyde': 'TODO',
 'NumberFragmentsPerIterations': 100,
 'NumberPosesPerFragment': 5,
 'Filters': {'pains': {},
             'brenk': {'path_data': 'KinFragLib/data/filters/Brenk'},
             'ro3': {},
             'qed': {'cutoff_val': 0.492},
             'syba': {'cutoff_val': 0}},
 'UseClusterBasedPoseFiltering': True,
 'DistanceThresholdClustering': 1.5,
 'NumberThreads': 1,
 'Seed': 42,
 'UseClusterBasedFragmentFiltering': True,
 'PSoftMin': 1,
 'UseHyde': True,
 'HydeDisplacementCutoff': 2.5}


Here, we see all program parameters that can or need to be adjusted. Parameters set to `TODO` are required and must be adjusted. The remaining options are optional; here, they are set to their default values. In the following, we first set the required parameters and then discuss the optional ones.
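To keep track of which required entries still carry the placeholder, a small check can help. The following is a minimal sketch; `find_todo_keys` is a hypothetical helper written for this tutorial and not part of the pipeline, and the dictionary is a reduced copy of the template.

```python
def find_todo_keys(settings: dict) -> list[str]:
    """Return the top-level keys whose value still contains the 'TODO' placeholder."""
    todo = []
    for key, value in settings.items():
        if value == "TODO":
            todo.append(key)
        elif isinstance(value, list) and "TODO" in value:
            todo.append(key)
    return todo


# Reduced copy of the template (illustrative subset of keys only)
example = {
    "Name": "TODO",
    "CoreSubpocket": "TODO",
    "Subpockets": ["TODO", "..."],
    "NumberPosesPerFragment": 5,
}

missing = find_todo_keys(example)
print(missing)
```

Running the same helper on the full `config` dictionary after the required parameters are set should return an empty list.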

### Required Program Parameters
First, we set the project name (`Name`). The program uses the project name to infer the `.flexx` and `.hydescorer` configuration files, as well as to name the output folder.


In [94]:
PROJECT_NAME = '5n1f_tut'
config['Name'] = PROJECT_NAME

Next, we define the core subpocket and the subpocket path. As we want to grow the ligands from the AP subpocket into the FP subpocket, we set the core subpocket (`CoreSubpocket`) and the subpocket path (`Subpockets`) accordingly.

**Note:** Here, we only grow into one subpocket (the FP subpocket); however, one can also grow into further subpockets by adding them to the subpocket list.

In [95]:
config['CoreSubpocket'] = 'AP'
config['Subpockets'] = ['FP']

Lastly, we need to define the paths to:
* the `Config` folder, where the configuration files are located,
* the `FlexX` and `Hyde` executables (can be downloaded from TODO),
* the fragment library `KinFragLib` (TODO link), and
* the Brenk collection, if the custom KinFragLib `brenk` filter is applied.

**Note:** These paths, especially `FlexX` and `Hyde`, may need to be adjusted depending on where the files have been placed and on the machine that is used. The path to the `brenk` collection might need to be adjusted as well. However, we won't apply this filter in this tutorial, so adapting it is not needed here.

**Note:** In this example, we use a highly reduced fragment library (TODO link) that was designed only for this tutorial. It comprises only xx fragments, which are assigned only to the subpockets AP and FP.

In [96]:
# define paths
config['Config'] = '../config'
config['FlexX'] = '../flexx-6.3.1-Linux-x64/flexx'
config['Hyde'] = '../hydescorer-2.3.1-Linux-x64/hydescorer'
config['KinFragLib'] = '../KinFragLib/data/fragment_library_tiny'

# the brenk collection could be linked here if needed:
# config['Filters']['brenk']['path_data'] = '../KinFragLib/data/filters/Brenk'

pprint.pp(config)

{'Name': '5n1f_tut',
 'CoreSubpocket': 'AP',
 'Subpockets': ['FP'],
 'KinFragLib': '../KinFragLib/data/fragment_library_tiny',
 'Config': '../config',
 'FlexX': '../flexx-6.3.1-Linux-x64/flexx',
 'Hyde': '../hydescorer-2.3.1-Linux-x64/hydescorer',
 'NumberFragmentsPerIterations': 100,
 'NumberPosesPerFragment': 5,
 'Filters': {'pains': {},
             'brenk': {'path_data': 'KinFragLib/data/filters/Brenk'},
             'ro3': {},
             'qed': {'cutoff_val': 0.492},
             'syba': {'cutoff_val': 0}},
 'UseClusterBasedPoseFiltering': True,
 'DistanceThresholdClustering': 1.5,
 'NumberThreads': 1,
 'Seed': 42,
 'UseClusterBasedFragmentFiltering': True,
 'PSoftMin': 1,
 'UseHyde': True,
 'HydeDisplacementCutoff': 2.5}
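Because the paths depend on the machine, it can be worth checking them before starting a run. The sketch below assumes that the four path entries should point to existing files or folders; `missing_paths` is a hypothetical helper, and the temporary directory only stands in for a real installation. Relative paths are checked against the current working directory.

```python
import tempfile
from pathlib import Path


def missing_paths(settings, keys=("Config", "FlexX", "Hyde", "KinFragLib")):
    """Return the keys whose configured path does not exist on this machine."""
    return [key for key in keys if not Path(settings[key]).exists()]


# Stand-in settings: the folder exists, the two executables do not
with tempfile.TemporaryDirectory() as tmp:
    example = {
        "Config": tmp,
        "FlexX": str(Path(tmp) / "flexx"),
        "Hyde": str(Path(tmp) / "hydescorer"),
        "KinFragLib": tmp,
    }
    not_found = missing_paths(example)

print(not_found)
```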


Now, all required arguments are set. Let's briefly discuss the other parameters that can be adjusted.

## Optional Program Parameters
### Fragment Library Reduction
- `Filters` - defines the custom KinFragLib filters that are applied to the given fragment library. Available filters are: `pains`, `brenk`, `ro3`, `qed`, `bb`, `syba` (for more information on the filters, refer to TODO).

In this tutorial, we will only apply the `ro3` filter with its default parameters:

In [97]:
config['Filters'] = {'ro3' : {}}

### HYDE scoring and optimization
* `UseHyde` - if `True`, HYDE (cite) is performed after FlexX docking. This is the default.
* `HydeDisplacementCutoff` - sometimes, HYDE slightly displaces the docking poses, which can move fragments outside their respective subpockets and might indicate an unfavourable docking pose. Poses that deviate by more than this cutoff are discarded.

### Candidate Filtering (per Subpocket Iteration)
- `NumberFragmentsPerIterations` - specifies the number of candidates (i.e., fragments or fragment combinations) that are selected in each subpocket iteration.
- `UseClusterBasedFragmentFiltering` - if `True`, a cluster-based strategy is used to select a more diverse (regarding the Tanimoto similarity of molecular fingerprints) set of promising candidates. Otherwise, the `NumberFragmentsPerIterations` best-scoring candidates are chosen.
- `PSoftMin` - variable used within the cluster-based selection strategy (`UseClusterBasedFragmentFiltering`). Informally, this number lets us adjust whether the candidate selection focuses more on diversity or on score. A higher value leads to a more randomised selection of clusters and thus to a higher diversity. Consequently, the higher the value, the more likely it is that compounds with an unfavourable score are selected.
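To build some intuition for the score/diversity trade-off, the sketch below computes softmin weights over the best score of each cluster. This is an illustration only and not the pipeline's implementation: it assumes that lower scores are better and that the parameter acts like a softmin temperature. `softmin_weights` and `temperature` are hypothetical names.

```python
import math


def softmin_weights(scores, temperature):
    """Selection weights that favour low scores; a higher temperature flattens them."""
    best = min(scores)
    # Shift by the best score for numerical stability (does not change the result)
    raw = [math.exp(-(score - best) / temperature) for score in scores]
    total = sum(raw)
    return [value / total for value in raw]


# Best score of three hypothetical clusters (assumption: lower is better)
cluster_scores = [-30.0, -25.0, -20.0]

sharp = softmin_weights(cluster_scores, temperature=1.0)
flat = softmin_weights(cluster_scores, temperature=10.0)

print([round(w, 3) for w in sharp])
print([round(w, 3) for w in flat])
```

With the low temperature, almost all weight sits on the best-scoring cluster; with the high temperature, the weights move towards a uniform distribution, so clusters with less favourable scores are picked more often.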

As a toy example, we will select only 10 candidates per subpocket iteration and employ the cluster-based selection strategy with default parameters:

In [98]:
config['NumberFragmentsPerIterations'] = 10

### Pose Selection (per Subpocket Iteration)
- `NumberPosesPerFragment` - defines the number of docking poses that are selected **per** candidate and then used as template conformations for the growing process.
- `UseClusterBasedPoseFiltering` - if `True`, a cluster-based strategy is used to select a more diverse (regarding the RMSD of the atom positions) set of high-scoring docking poses. Otherwise, the `NumberPosesPerFragment` best-scoring poses are chosen.
- `DistanceThresholdClustering` - defines the maximum distance that poses can have within one cluster. Since only one pose is selected per cluster, a high value means that more dissimilar poses are treated as similar. If `DistanceThresholdClustering` is chosen very close to 0, the selection is similar to simply picking the `NumberPosesPerFragment` best-scoring poses.
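The effect of the distance threshold can be illustrated with a small greedy selection. Note that this is a simplified stand-in for the clustering used by the pipeline, not its actual algorithm: it assumes that lower scores are better and that poses share the same atom ordering. `rmsd` and `select_diverse_poses` are hypothetical helpers.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    squared = [
        sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


def select_diverse_poses(poses, n_poses, threshold):
    """Walk through poses from best to worst score and keep a pose only if it is
    further than `threshold` away from every pose kept so far."""
    selected = []
    for score, coords in sorted(poses, key=lambda pose: pose[0]):
        if all(rmsd(coords, kept) > threshold for _, kept in selected):
            selected.append((score, coords))
        if len(selected) == n_poses:
            break
    return selected


# Three single-atom toy poses as (score, coordinates)
poses = [
    (-30.0, [(0.0, 0.0, 0.0)]),
    (-29.0, [(0.5, 0.0, 0.0)]),  # close to the best pose
    (-28.0, [(3.0, 0.0, 0.0)]),  # far from the best pose
]

diverse = select_diverse_poses(poses, n_poses=2, threshold=1.5)
top_n = select_diverse_poses(poses, n_poses=2, threshold=0.0)

print([score for score, _ in diverse])
print([score for score, _ in top_n])
```

With a threshold of 1.5, the second-best pose is skipped because it is too close to the best one; with a threshold of 0, the selection reduces to the two best-scoring poses.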

Here, we will select only 2 poses per candidate and employ the cluster-based selection strategy with default parameters:

In [99]:
config['NumberPosesPerFragment'] = 2

### Threads and seed
- `Seed` - random seed to use
- `NumberThreads` - number of threads

In [100]:
config['NumberThreads'] = 16

## Saving the Settings File
As a last step before running the subpocket-based docking pipeline, we save the config as a JSON file so that we can pass it to the pipeline next:

In [101]:
# paths
CONFIG_FOLDER = ROOT / "config" / PROJECT_NAME
PATH_TUTORIAL_SETTINGS = CONFIG_FOLDER / "settings.json"

In [102]:
with open(PATH_TUTORIAL_SETTINGS, 'w') as file:
    json.dump(config, file)
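Note that `open(..., 'w')` does not create missing parent folders, so the project's config folder has to exist beforehand. To confirm that a settings file was written as intended, it can be read back and compared. The following is a minimal sketch using a temporary folder and a reduced settings dictionary as stand-ins; `indent=4` is optional and only makes the file easier to read.

```python
import json
import tempfile
from pathlib import Path

# Reduced stand-in for the tutorial's config dictionary
settings = {"Name": "5n1f_tut", "Filters": {"ro3": {}}, "UseHyde": True}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"

    # Write the settings in a human-readable layout
    with open(path, "w") as file:
        json.dump(settings, file, indent=4)

    # Read the file back to verify the round trip
    with open(path) as file:
        reloaded = json.load(file)

print(reloaded == settings)
```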

## Run the Pipeline
With the `-h` option, we can display the help page, which lists all available command line arguments:

In [103]:
pipeline_prefix = 'python ' + str(ROOT / 'src/fragment_docking.py')

In [104]:
_r = os.system(pipeline_prefix + ' -h')

usage: /home/katharina/KinFragLib_PocketEnum/notebooks/../src/fragment_docking.py
       [-h] [-s SETTINGS] [-r RESULTS] [-log LOGLEVEL]

Generates compounds for a given kinase

optional arguments:
  -h, --help            show this help message and exit
  -s SETTINGS, --settings SETTINGS
                        JSON file with program configuration
  -r RESULTS, --results RESULTS
                        Folder, where results are placed
  -log LOGLEVEL, --loglevel LOGLEVEL
                        Example --loglevel debug, default=info


We need to define a folder where all resulting files are placed. Thus, if not already present, we create such a `results` folder:

In [105]:
RESULTS_FOLDER = ROOT / "results"

In [106]:
if not os.path.exists(RESULTS_FOLDER):
    os.makedirs(RESULTS_FOLDER)

Let's finally run the subpocket-based docking pipeline on the prepared PKA target:

In [107]:
# full command
cmd = pipeline_prefix + ' -s ' + str(PATH_TUTORIAL_SETTINGS) + ' -r ' + str(RESULTS_FOLDER)
cmd

'python /home/katharina/KinFragLib_PocketEnum/notebooks/../src/fragment_docking.py -s /home/katharina/KinFragLib_PocketEnum/notebooks/../config/5n1f_tut/settings.json -r /home/katharina/KinFragLib_PocketEnum/notebooks/../results'

In [108]:
# this might take a while (~5 minutes)
_r = os.system(cmd)

wandb: Currently logged in as: kabu00002 (kinase_pocket_enum) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in /home/katharina/KinFragLib_PocketEnum/notebooks/wandb/run-20250618_152020-fevkqax1
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run 5n1f_tut
wandb: ⭐️ View project at https://wandb.ai/kinase_pocket_enum/subpocket_based_docking_kinases
wandb: 🚀 View run at https://wandb.ai/kinase_pocket_enum/subpocket_based_docking_kinases/runs/fevkqax1
2025-06-18 15:20:21 - INFO - Preprocessing started
2025-06-18 15:20:21 - INFO - Preprocessing finished
2025-06-18 15:20:21 - INFO - Size of fragment library{'AP': 6, 'FP': 5, 'GA': 8}
2025-06-18 15:20:21 - INFO - Core docking of 6 AP-Fragments
                                2.553852345966775
                                2.6123588775281243
                                2.6056338280413334
                          

wandb: 
wandb: 🚀 View run 5n1f_tut at: https://wandb.ai/kinase_pocket_enum/subpocket_based_docking_kinases/runs/fevkqax1
wandb: Find logs at: wandb/run-20250618_152020-fevkqax1/logs


**Note:** Since we use a **highly** reduced fragment library, this takes *only* about 5 minutes. On a larger library, however, a run can take hours to a few days, depending on the number of fragments chosen per iteration. Thus, running the pipeline on a cluster might be a good idea.
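For longer or unattended runs, it can be useful to check the exit code and capture the output of the call. The sketch below shows one way to do this with `subprocess.run` instead of `os.system`. The command is only a stand-in that prints a message; in this tutorial it would be replaced by the pipeline call with its `-s` and `-r` arguments passed as separate list items.

```python
import subprocess
import sys

# Stand-in command (assumption): replace with the pipeline call in practice
cmd = [sys.executable, "-c", "print('pipeline finished')"]

# capture_output collects stdout/stderr; text=True decodes them to str
result = subprocess.run(cmd, capture_output=True, text=True)

print(result.returncode)
print(result.stdout.strip())
```

A non-zero `returncode` indicates that the command failed; passing `check=True` would raise an exception in that case instead.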