# `giotto-tda` persistent homology tutorial

## 1. Tutorial: Vietoris-Rips API

We generate a dataset of circles, spheres and tori, with 10 samples of each.

In [None]:
from data.generate_datasets import make_random_point_clouds
n_samples_per_class = 10
point_clouds, labels = make_random_point_clouds(n_samples_per_class, 200, 0)
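The helper `make_random_point_clouds` lives in a local `data` module whose implementation is not shown in this tutorial. As a rough idea of what such a generator might do, here is a minimal standard-library sketch. The function names, the torus radii and the `toy_` variables are all hypothetical, and the real helper may sample, order or perturb the points differently.

```python
import math
import random


def sample_circle(n_points, rng):
    """Sample points uniformly on the unit circle, embedded in the z = 0 plane."""
    points = []
    for _ in range(n_points):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append([math.cos(theta), math.sin(theta), 0.0])
    return points


def sample_sphere(n_points, rng):
    """Sample points uniformly on the unit sphere by normalising Gaussian vectors."""
    points = []
    while len(points) < n_points:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # skip the (practically impossible) zero vector
            points.append([c / norm for c in v])
    return points


def sample_torus(n_points, rng, major_radius=2.0, minor_radius=1.0):
    """Sample points on a torus from two independent uniform angles.

    Note: this is simple, but not uniform with respect to surface area.
    """
    points = []
    for _ in range(n_points):
        u = rng.uniform(0.0, 2.0 * math.pi)
        v = rng.uniform(0.0, 2.0 * math.pi)
        ring = major_radius + minor_radius * math.cos(v)
        points.append([ring * math.cos(u), ring * math.sin(u),
                       minor_radius * math.sin(v)])
    return points


# Hypothetical stand-in for the tutorial's dataset: 10 clouds of 200 points per class.
rng = random.Random(0)
samplers = [sample_circle, sample_sphere, sample_torus]
toy_point_clouds = [sampler(200, rng) for sampler in samplers for _ in range(10)]
toy_labels = [label for label in range(3) for _ in range(10)]
```

The nested list `toy_point_clouds` follows the same `(n_point_clouds, n_points, dimension)` layout as the array used in the rest of this tutorial, with all circles first, then spheres, then tori.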

In [None]:
from gtda.plotting import plot_point_cloud
plot_point_cloud(point_clouds[2*n_samples_per_class])

In [None]:
from gtda.homology import VietorisRipsPersistence

VR = VietorisRipsPersistence(homology_dimensions=[1, 2], n_jobs=8)

What is unique about the `giotto-tda` API is that persistence diagrams are computed for a whole list of point clouds at once.

In [None]:
print(f"(n_point_clouds, n_points, dimension) = {point_clouds.shape}")

In [None]:
diagrams = VR.fit_transform(point_clouds)
diagrams.shape

In [None]:
diagrams[0]

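Two observations:
1. Diagrams are padded with points on the diagonal, so that all diagrams in the collection have the same number of points and can be stored in a single array.
2. By default, `reduced_homology=True`, so that the essential point from the diagram in dimension 0 is removed.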
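To make the padding in the first observation concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the function `pad_diagrams` is hypothetical, and placing the padding points at `(0.0, 0.0)` is an assumption made for illustration (any point on the diagonal has zero persistence).

```python
def pad_diagrams(diagrams):
    """Pad lists of (birth, death, dimension) triples so that every diagram
    has the same number of points in each homology dimension.

    Assumption: padding points are placed at (0.0, 0.0) on the diagonal.
    """
    dims = sorted({q for diagram in diagrams for _, _, q in diagram})
    # Largest number of points observed in each homology dimension.
    max_counts = {
        q: max(sum(1 for p in diagram if p[2] == q) for diagram in diagrams)
        for q in dims
    }
    padded = []
    for diagram in diagrams:
        rows = []
        for q in dims:
            points = [list(p) for p in diagram if p[2] == q]
            n_missing = max_counts[q] - len(points)
            rows.extend(points)
            rows.extend([0.0, 0.0, q] for _ in range(n_missing))
        padded.append(rows)
    return padded


# Two toy diagrams with different numbers of points per dimension.
toy_diagrams = [
    [(0.1, 0.9, 1), (0.2, 0.3, 1), (0.5, 0.6, 2)],
    [(0.1, 0.8, 1)],
]
padded = pad_diagrams(toy_diagrams)
```

After padding, both toy diagrams contain three triples, so the collection can be stacked into one rectangular array of shape `(n_diagrams, n_points, 3)`.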

In [None]:
names = ["circle", "sphere", "torus"]
for ind, name in enumerate(names):
    fig = VR.plot(diagrams, sample=n_samples_per_class*ind,
                  plotly_params={"layout": {"title": f"Persistence diagrams of a sample from a {name}"}})
    fig.show()

The space of multisets in $\mathbb{R}^2$ lacks the mathematical properties often required for statistical inference (it is not a vector space, for example). Vectorising the diagrams, or extracting features from them such as "persistence entropy", places us in a more favorable setting.

To calculate the persistence entropy, we normalise the persistence values (death minus birth) of a diagram so that they form a probability distribution, and compute the Shannon entropy of that distribution.
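The computation can be sketched in a few lines of plain Python. This is an illustration rather than the `giotto-tda` code: the function name is hypothetical, the logarithm is assumed to be base 2, and returning `0.0` for a diagram with no off-diagonal points is a convention chosen for this sketch only.

```python
import math


def persistence_entropy(diagram):
    """Base-2 Shannon entropy of the normalised lifetimes of (birth, death) pairs."""
    # Points on the diagonal (e.g. padding) have zero lifetime and are ignored.
    lifetimes = [death - birth for birth, death in diagram if death > birth]
    total = sum(lifetimes)
    if total == 0:
        return 0.0  # convention of this sketch for empty or fully diagonal diagrams
    probabilities = [lifetime / total for lifetime in lifetimes]
    return -sum(p * math.log2(p) for p in probabilities)


# Four equally persistent features: the entropy is maximal for four points.
uniform_entropy = persistence_entropy([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)])

# One dominant feature plus two short-lived ones: the entropy is much lower.
dominant_entropy = persistence_entropy([(0.0, 1.0), (0.0, 0.01), (0.0, 0.01)])
```

Intuitively, a diagram with one long-lived feature and a little noise has low entropy, while a diagram whose features all have similar lifetimes has high entropy.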

In [None]:
from gtda.diagrams import PersistenceEntropy
PE = PersistenceEntropy()

In [None]:
PE.fit_transform(diagrams[::n_samples_per_class])

Let's train a `RandomForestClassifier` on those features. As in `scikit-learn`, we can compose transformers and a final estimator using pipelines. In `giotto-tda`, there is a dedicated function for this, `make_pipeline`.

In [None]:
from gtda.pipeline import make_pipeline

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

steps = [VietorisRipsPersistence(homology_dimensions=[1, 2]),
         PersistenceEntropy(),
         RandomForestClassifier()]

pipeline = make_pipeline(*steps)
pipeline

In [None]:
pcs_train, pcs_valid, labels_train, labels_valid = train_test_split(point_clouds, labels,
                                                                    random_state=0, shuffle=True)

In [None]:
pipeline.fit(pcs_train, labels_train)

pipeline.score(pcs_valid, labels_valid)

## More complexity?

In [None]:
from gtda.diagrams import PersistenceImage

PI = PersistenceImage()

_ = PI.fit(diagrams)

In [None]:
persistence_images = PI.transform(diagrams[::n_samples_per_class])
for pi in persistence_images:
    fig = PI.plot([pi])
    fig.show()