# API - Quick start Python interface

Let's take a look at the `kissim` quick start API to encode a set of structures (from the [KLIFS](https://klifs.net/) database) and perform an all-against-all comparison.
  
![kissim API](../_static/kissim_docs_api.png)

In [1]:
from kissim.api import encode, compare



In [2]:
# Load path to test data
from kissim.dataset.test import PATH as PATH_TEST_DATA

## Encode structures into fingerprints

The `encode` function is a quick start API to generate fingerprints in bulk based on structure KLIFS IDs. 

Input parameters are:

- `structure_klifs_ids`: Structure KLIFS IDs.
- `fingerprints_filepath`: (Optional) Path to the output JSON file that will contain the fingerprints.
- `local_klifs_download_path`: (Optional) Path to a local KLIFS download; if not set, data is fetched from the remote KLIFS database.
- `n_cores`: (Optional) Number of cores used to generate fingerprints.

The return object is of type `FingerprintGenerator`.

In [3]:
# flake8-noqa-cell
encode?

Signature:
encode(
    structure_klifs_ids,
    fingerprints_filepath=None,
    local_klifs_download_path=None,
    n_cores=1,
)
Docstring:
Encode structures.

Parameters
----------
structure_klifs_ids : list of int
    Structure KLIFS IDs.
fingerprints_filepath : str or pathlib.Path
    Path to output json file. Default None.
local_klifs_download_path : str or None
    If path to local KLIFS download is given, set up local KLIFS session.
    If None is given, set up remote KLIFS session.
n_cores : int
    Number of cores used to generate fingerprints.

Returns
-------
kissim.encoding.FingerprintGenerator
    Fingerprints.
File:      ~/Documents/GitHu

### Run `encode` function

In [4]:
structure_klifs_ids = [109, 118, 12347, 1641, 3833, 9122]
fingerprint_generator = encode(
    structure_klifs_ids=structure_klifs_ids,
    fingerprints_filepath=None,
    n_cores=2,
    local_klifs_download_path=PATH_TEST_DATA / "KLIFS_download",
)

### Inspect output: `FingerprintGenerator`

In [5]:
print(f"Number of structures (input): {len(structure_klifs_ids)}")
print(f"Number of fingerprints (output): {len(fingerprint_generator.data.keys())}")
fingerprint_generator

Number of structures (input): 6
Number of fingerprints (output): 6


<kissim.encoding.fingerprint_generator.FingerprintGenerator at 0x7f04c1976910>

In [6]:
fingerprint_generator.data

{109: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d0970>,
 118: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18dd1c0>,
 12347: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d0b80>,
 1641: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d03a0>,
 3833: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18d0430>,
 9122: <kissim.encoding.fingerprint.Fingerprint at 0x7f04c18dd130>}

Find more information about the `FingerprintGenerator` object [here](https://kissim.readthedocs.io/en/latest/tutorials/encoding.html).

## Compare fingerprints

The `compare` function is a quick start API to perform a pairwise all-against-all (bulk) comparison for a set of fingerprints. 

Input parameters are:

- `fingerprint_generator`: Fingerprints (the `FingerprintGenerator` object returned by `encode`).
- `output_path`: (Optional) Path to the output folder for the distances JSON files.
- `feature_weights`: (Optional) Feature weights used to calculate the final fingerprint distance.
- `n_cores`: (Optional) Number of cores used to generate distances.

The return objects are of type `FeatureDistancesGenerator` and `FingerprintDistanceGenerator`.
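
To make the role of `feature_weights` more concrete, here is a minimal sketch that combines 15 per-feature distances into a single value. It assumes that the final fingerprint distance is a weighted sum of the feature distances; the helper `weighted_distance` and all distance values are hypothetical and not part of `kissim`.

```python
# Sketch only: assumes the fingerprint distance is a weighted sum of the
# 15 per-feature distances. `weighted_distance` is a hypothetical helper.
N_FEATURES = 15


def weighted_distance(feature_distances, feature_weights=None):
    """Combine per-feature distances into one value using feature weights."""
    if feature_weights is None:
        # Default described for `compare`: every feature gets weight 1/15
        feature_weights = [1.0 / N_FEATURES] * N_FEATURES
    if len(feature_distances) != N_FEATURES or len(feature_weights) != N_FEATURES:
        raise ValueError(f"Expected {N_FEATURES} distances and {N_FEATURES} weights.")
    return sum(d * w for d, w in zip(feature_distances, feature_weights))


# Made-up feature distances for a single structure pair
example_distances = [0.4, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.5, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4]

# Equal weights (feature_weights=None)
equal = weighted_distance(example_distances)
# Hypothetical weighting that only considers the first feature
first_only = weighted_distance(example_distances, [1.0] + [0.0] * 14)
```

With equal weights the result is simply the mean of the 15 feature distances; shifting weight towards a feature makes the final distance follow that feature more closely.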

In [7]:
# flake8-noqa-cell
compare?

Signature:
compare(
    fingerprint_generator,
    output_path=None,
    feature_weights=None,
    n_cores=1,
)
Docstring:
Compare fingerprints (pairwise).

Parameters
----------
fingerprint_generator : kissim.encoding.FingerprintGenerator
    Fingerprints for KLIFS dataset.
output_path : str
    Path to output folder.
feature_weights : None or list of float
    Feature weights of the following form:
    (i) None
        Default feature weights: All features equally distributed to 1/15
        (15 features in total).
    (ii) By feature (list of 15 floats):
        Features to be set in the following order: size, hbd, hba, charge, aromatic,
        aliphatic, sc

### Run `compare` function

In [8]:
feature_distances_generator, fingerprint_distance_generator = compare(
    fingerprint_generator=fingerprint_generator,
    output_path=None,
    feature_weights=None,
    n_cores=2,
)

Calculate pairwise fingerprint distance:   0%|          | 0/15 [00:00<?, ?it/s]

Calculate pairwise fingerprint coverage:   0%|          | 0/15 [00:00<?, ?it/s]

For final fingerprint distances, please refer to the `FingerprintDistanceGenerator` object.

### Inspect output: `FingerprintDistanceGenerator`

In [9]:
print(f"Number of fingerprints (input): {len(fingerprint_generator.data)}")
print(f"Number of pairwise comparisons (output): {len(fingerprint_distance_generator.data)}")
fingerprint_distance_generator

Number of fingerprints (input): 6
Number of pairwise comparisons (output): 15


<kissim.comparison.fingerprint_distance_generator.FingerprintDistanceGenerator at 0x7f04c1880310>
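
The 15 comparisons reported above are the number of unordered pairs that can be formed from 6 fingerprints, n * (n - 1) / 2. The following standard-library sketch reproduces that count for the structure KLIFS IDs used in this notebook; it does not call `kissim` and only illustrates the arithmetic.

```python
from itertools import combinations
from math import comb

structure_klifs_ids = [109, 118, 12347, 1641, 3833, 9122]

# All unordered pairs, without self-comparisons
pairs = list(combinations(structure_klifs_ids, 2))

n = len(structure_klifs_ids)
n_pairs_formula = n * (n - 1) // 2  # same value as comb(n, 2)

print(len(pairs))  # 15
```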

In [10]:
fingerprint_distance_generator.data

|   | structure.1 | structure.2 | kinase.1 | kinase.2 | distance | bit_coverage |
|---|---|---|---|---|---|---|
| 0 | 109 | 118 | ABL2 | ABL2 | 0.074214 | 0.992 |
| 1 | 109 | 12347 | ABL2 | BRAF | 0.259053 | 0.919333 |
| 2 | 109 | 1641 | ABL2 | CHK1 | 0.253045 | 0.990667 |
| 3 | 109 | 3833 | ABL2 | AAK1 | 0.277368 | 0.990667 |
| 4 | 109 | 9122 | ABL2 | ADCK3 | 0.358882 | 0.990667 |
| 5 | 118 | 12347 | ABL2 | BRAF | 0.273133 | 0.918 |
| 6 | 118 | 1641 | ABL2 | CHK1 | 0.246844 | 0.989333 |
| 7 | 118 | 3833 | ABL2 | AAK1 | 0.282949 | 0.99 |
| 8 | 118 | 9122 | ABL2 | ADCK3 | 0.360833 | 0.989333 |
| 9 | 12347 | 1641 | BRAF | CHK1 | 0.30333 | 0.918 |


Get the structure distance matrix.

In [11]:
fingerprint_distance_generator.structure_distance_matrix()

| structure.1 / structure.2 | 109 | 118 | 1641 | 3833 | 9122 | 12347 |
|---|---|---|---|---|---|---|
| 109 | 0.0 | 0.074214 | 0.253045 | 0.277368 | 0.358882 | 0.259053 |
| 118 | 0.074214 | 0.0 | 0.246844 | 0.282949 | 0.360833 | 0.273133 |
| 1641 | 0.253045 | 0.246844 | 0.0 | 0.22959 | 0.347142 | 0.30333 |
| 3833 | 0.277368 | 0.282949 | 0.22959 | 0.0 | 0.303542 | 0.307277 |
| 9122 | 0.358882 | 0.360833 | 0.347142 | 0.303542 | 0.0 | 0.376875 |
| 12347 | 0.259053 | 0.273133 | 0.30333 | 0.307277 | 0.376875 | 0.0 |
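
The matrix is symmetric with zeros on the diagonal, so it can be derived from the list of pairwise distances alone. The sketch below shows the idea for three of the structures, using distances from this notebook; `to_symmetric_matrix` is a hypothetical helper in plain Python and not how `kissim` implements `structure_distance_matrix`.

```python
# Pairwise distances for three structures, taken from the output above
pair_distances = {
    (109, 118): 0.074214,
    (109, 1641): 0.253045,
    (118, 1641): 0.246844,
}


def to_symmetric_matrix(pair_distances):
    """Expand unordered pair distances into a nested-dict distance matrix."""
    ids = sorted({structure_id for pair in pair_distances for structure_id in pair})
    # Diagonal is 0.0; pairs without a known distance stay None
    matrix = {i: {j: (0.0 if i == j else None) for j in ids} for i in ids}
    for (i, j), distance in pair_distances.items():
        # Each distance is written to both triangles of the matrix
        matrix[i][j] = distance
        matrix[j][i] = distance
    return matrix


matrix = to_symmetric_matrix(pair_distances)
```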


Map structure pairs to kinase pairs. In this example, the structure pair with the minimum distance is used as the representative for each kinase pair.

In [12]:
fingerprint_distance_generator.kinase_distance_matrix(by="minimum")

| kinase.1 / kinase.2 | AAK1 | ABL2 | ADCK3 | BRAF | CHK1 |
|---|---|---|---|---|---|
| AAK1 | 0.0 | 0.277368 | 0.303542 | 0.307277 | 0.22959 |
| ABL2 | 0.277368 | 0.0 | 0.358882 | 0.259053 | 0.246844 |
| ADCK3 | 0.303542 | 0.358882 | 0.0 | 0.376875 | 0.347142 |
| BRAF | 0.307277 | 0.259053 | 0.376875 | 0.0 | 0.30333 |
| CHK1 | 0.22959 | 0.246844 | 0.347142 | 0.30333 | 0.0 |
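
The effect of `by="minimum"` can be illustrated without `kissim`: group the structure pairs by their kinase pair and keep the smallest distance per group. The sketch below uses a subset of the structure pairs listed earlier. `kinase_pair_minimum` is a hypothetical helper, and skipping pairs from the same kinase is an assumption based on the zero diagonal in the matrix above.

```python
# (structure.1, structure.2, kinase.1, kinase.2, distance), from the output above
structure_pairs = [
    (109, 118, "ABL2", "ABL2", 0.074214),
    (109, 12347, "ABL2", "BRAF", 0.259053),
    (109, 1641, "ABL2", "CHK1", 0.253045),
    (118, 12347, "ABL2", "BRAF", 0.273133),
    (118, 1641, "ABL2", "CHK1", 0.246844),
    (12347, 1641, "BRAF", "CHK1", 0.30333),
]


def kinase_pair_minimum(structure_pairs):
    """Keep the minimum structure pair distance for each kinase pair."""
    minima = {}
    for _, _, kinase1, kinase2, distance in structure_pairs:
        if kinase1 == kinase2:
            # Assumption: same-kinase pairs do not enter the kinase matrix
            continue
        # Sort the names so that (A, B) and (B, A) share one key
        key = tuple(sorted((kinase1, kinase2)))
        minima[key] = min(distance, minima.get(key, float("inf")))
    return minima


kinase_minima = kinase_pair_minimum(structure_pairs)
```

For ABL2 and CHK1, for example, the two structure pairs have distances 0.253045 and 0.246844, and the smaller value is the one that appears in the kinase distance matrix.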


### Inspect output: `FeatureDistancesGenerator`

In [13]:
print(f"Number of fingerprints (input): {len(fingerprint_generator.data.keys())}")
print(f"Number of pairwise comparisons (output): {len(feature_distances_generator.data)}")
feature_distances_generator

Number of fingerprints (input): 6
Number of pairwise comparisons (output): 15


<kissim.comparison.feature_distances_generator.FeatureDistancesGenerator at 0x7f0524591ee0>

In [14]:
feature_distances_generator.data

|   | structure.1 | structure.2 | kinase.1 | kinase.2 | distance.1 | distance.2 | distance.3 | distance.4 | distance.5 | distance.6 | ... | bit_coverage.6 | bit_coverage.7 | bit_coverage.8 | bit_coverage.9 | bit_coverage.10 | bit_coverage.11 | bit_coverage.12 | bit_coverage.13 | bit_coverage.14 | bit_coverage.15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 109 | 118 | ABL2 | ABL2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.88 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | 109 | 12347 | ABL2 | BRAF | 0.410256 | 0.397436 | 0.333333 | 0.24359 | 0.141026 | 0.230769 | ... | 0.92 | 0.67 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
| 2 | 109 | 1641 | ABL2 | CHK1 | 0.388235 | 0.352941 | 0.364706 | 0.247059 | 0.141176 | 0.223529 | ... | 1.0 | 0.86 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 3 | 109 | 3833 | ABL2 | AAK1 | 0.505882 | 0.505882 | 0.411765 | 0.211765 | 0.082353 | 0.270588 | ... | 1.0 | 0.86 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 4 | 109 | 9122 | ABL2 | ADCK3 | 0.623529 | 0.470588 | 0.435294 | 0.258824 | 0.235294 | 0.305882 | ... | 1.0 | 0.88 | 0.98 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 5 | 118 | 12347 | ABL2 | BRAF | 0.410256 | 0.397436 | 0.333333 | 0.24359 | 0.141026 | 0.230769 | ... | 0.92 | 0.65 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
| 6 | 118 | 1641 | ABL2 | CHK1 | 0.388235 | 0.352941 | 0.364706 | 0.247059 | 0.141176 | 0.223529 | ... | 1.0 | 0.84 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 7 | 118 | 3833 | ABL2 | AAK1 | 0.505882 | 0.505882 | 0.411765 | 0.211765 | 0.082353 | 0.270588 | ... | 1.0 | 0.85 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 8 | 118 | 9122 | ABL2 | ADCK3 | 0.623529 | 0.470588 | 0.435294 | 0.258824 | 0.235294 | 0.305882 | ... | 1.0 | 0.86 | 0.98 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 9 | 12347 | 1641 | BRAF | CHK1 | 0.346154 | 0.423077 | 0.346154 | 0.24359 | 0.115385 | 0.217949 | ... | 0.92 | 0.65 | 0.92 | 0.92 | 0.92 | 0.92 | 0.92 | 1.0 | 1.0 | 1.0 |
