# Loading `kissim` results

This notebook shows how to load the following `kissim` output files as Python objects:

- `fingerprints.json`: Fingerprints for all successfully encoded structures
- `fingerprints_clean.json`: Fingerprints after removing outlier structures
- `feature_distances.csv.bz2`: Feature distances between all fingerprint pairs
- `fingerprint_distances.csv.bz2`: Fingerprint distances between all fingerprint pairs
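
Before loading, it can help to confirm that all four files are present in the results folder. The following is a minimal sketch using only the standard library; `EXPECTED_FILES` and `missing_files` are hypothetical names introduced here for illustration and are not part of `kissim`.

```python
from pathlib import Path

# File names as used in the loading cells of this notebook.
EXPECTED_FILES = [
    "fingerprints.json",
    "fingerprints_clean.json",
    "feature_distances.csv.bz2",
    "fingerprint_distances.csv.bz2",
]


def missing_files(results_dir, names=EXPECTED_FILES):
    """Return the names from `names` that are not regular files in `results_dir`."""
    results_dir = Path(results_dir)
    return [name for name in names if not (results_dir / name).is_file()]
```

Calling `missing_files(RESULTS)` once `RESULTS` is defined returns an empty list when every file is in place.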

In [1]:
from pathlib import Path

from kissim.encoding import FingerprintGenerator
from kissim.comparison import FeatureDistancesGenerator, FingerprintDistanceGenerator

from src.paths import PATH_RESULTS



In [2]:
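# Notebook directory; `_dh` is IPython's directory history
HERE = Path(_dh[-1])  # noqa: F821
# Folder containing the `kissim` output files
RESULTS = PATH_RESULTS / "all"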
RESULTS

PosixPath('/home/dominique/Documents/GitHub/kissim_app/src/../results/all')

## Load fingerprints

### Without outlier filtering

In [3]:
%%time
fingerprints = FingerprintGenerator.from_json(RESULTS / "fingerprints.json")
len(fingerprints.data)

CPU times: user 1.1 s, sys: 87.7 ms, total: 1.18 s
Wall time: 1.18 s


4681

### With outlier filtering

In [4]:
%%time
fingerprints = FingerprintGenerator.from_json(RESULTS / "fingerprints_clean.json")
len(fingerprints.data)

CPU times: user 1.36 s, sys: 91.7 ms, total: 1.45 s
Wall time: 1.45 s


4681

## Load feature distances

In [5]:
%%time
feature_distances = FeatureDistancesGenerator.from_csv(RESULTS / "feature_distances.csv.bz2")
len(feature_distances.data)

CPU times: user 2min 18s, sys: 1.13 s, total: 2min 19s
Wall time: 2min 19s


10953540
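
The number of rows can be checked against the number of fingerprints: 4681 fingerprints yield 4681 × 4680 / 2 = 10953540 unordered pairs, which matches the length reported above. The short check below reproduces this arithmetic; the variable names are illustrative, and the match suggests (but does not by itself prove) that each unordered pair appears exactly once.

```python
from math import comb

n_fingerprints = 4681        # len(fingerprints.data) reported above
n_pairs_reported = 10953540  # len(feature_distances.data) reported above

# Number of unordered pairs among n items: n * (n - 1) / 2
n_pairs_expected = comb(n_fingerprints, 2)
assert n_pairs_expected == n_pairs_reported
```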

## Load fingerprint distances

In [7]:
%%time
fingerprint_distances = FingerprintDistanceGenerator.from_csv(
    RESULTS / "fingerprint_distances.csv.bz2"
)
len(fingerprint_distances.data)

CPU times: user 20.9 s, sys: 11.9 ms, total: 20.9 s
Wall time: 20.9 s


10953540
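
Because loading the full distance files takes from about 20 seconds to over two minutes, it may be useful to look at the first few lines of a `.csv.bz2` file without parsing all of it. The sketch below uses only the standard library; `peek_lines` is a hypothetical helper, not a `kissim` function, and it makes no assumption about the column layout.

```python
import bz2
from itertools import islice


def peek_lines(path, n=5):
    """Return the first `n` lines of a bz2-compressed text file."""
    # Text mode decompresses lazily, so only the start of the file is read.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```

For example, `peek_lines(RESULTS / "fingerprint_distances.csv.bz2", n=3)` would return the header line and the first two data rows as strings.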