# Graph classification

**Authors:** Olympio Hacquard and Vadim Lebovici

In this notebook, we show how to compute Euler characteristic descriptors on graph datasets.

-----

**Preliminary**: adding the right folder to path.

In [1]:
import sys
sys.path.append('../')

# Computing multi-parameter filtrations

To build sublevel-set filtrations of graphs, we consider the heat kernel signature, the Ollivier-Ricci and Forman-Ricci curvatures, centrality, and edge betweenness on connected graphs. In addition, some datasets (`PROTEINS`, `COX2`, `DHFR`) come with functions defined on the graph nodes. These functions can be extracted in the same way as in the method `_extract_intrinsic_funcs_DHFR()` from `eulearning.datasets`.
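As an illustration of one of these node functions, here is a minimal NumPy sketch of the heat kernel signature, $\mathrm{hks}_t(v) = \sum_k e^{-t\lambda_k}\,\varphi_k(v)^2$, where $(\lambda_k, \varphi_k)$ are the eigenpairs of a graph Laplacian. The helper name `heat_kernel_signature` is hypothetical, and the choice of the symmetric normalized Laplacian is an assumption: the implementation in `eulearning` may use a different convention.

```python
import numpy as np

def heat_kernel_signature(adjacency, time):
    """Heat kernel signature of every node (hypothetical helper, not part of eulearning).

    Assumes an undirected graph without isolated nodes, given as a dense adjacency matrix.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1)
    inv_sqrt_deg = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2} (an assumed convention)
    laplacian = np.eye(len(adjacency)) - inv_sqrt_deg[:, None] * adjacency * inv_sqrt_deg[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # hks_t(v) = sum_k exp(-t * lambda_k) * phi_k(v)^2
    return (eigvecs ** 2) @ np.exp(-time * eigvals)

# Path graph on three nodes: 0 - 1 - 2
path_adjacency = np.array([[0, 1, 0],
                           [1, 0, 1],
                           [0, 1, 0]])
hks_short = heat_kernel_signature(path_adjacency, time=1.0)
hks_long = heat_kernel_signature(path_adjacency, time=50.0)
```

With this convention, small values of `time` emphasize local structure around each node, while for large values the signature of a connected graph approaches the normalized degree of the node.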

Each available filtration is selected by a keyword: `hks_time` for the heat kernel signature, `ricci_alpha_iterations` for the Ollivier-Ricci curvature, `forman` for the Forman-Ricci curvature, `centrality` for the centrality function, `betweenness` for the edge betweenness, and `func_ind` for the ind-th function pre-defined on the graphs of this specific dataset. For instance, one can choose `hks_1.0`, `hks_10.0`, `ricci_0.5_0`, `centrality`, and `func_0`, as done below:

In [2]:
chosen_filtrations = ['hks_1.0', 'hks_10.0', 'ricci_0.5_0', 'centrality', 'func_0']
n_params = len(chosen_filtrations)
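The keyword convention can be made explicit with a small parser. This is only a sketch of how the naming scheme reads: `parse_filtration_keyword` is a hypothetical helper, and the way `eulearning` actually interprets these strings may differ.

```python
def parse_filtration_keyword(keyword):
    """Split a filtration keyword into its name and parameters (hypothetical helper)."""
    name, *args = keyword.split('_')
    if name == 'hks' and len(args) == 1:
        # hks_time: diffusion time of the heat kernel signature
        return name, {'time': float(args[0])}
    if name == 'ricci' and len(args) == 2:
        # ricci_alpha_iterations: parameters of the Ollivier-Ricci curvature
        return name, {'alpha': float(args[0]), 'iterations': int(args[1])}
    if name == 'func' and len(args) == 1:
        # func_ind: index of a function pre-defined on the nodes of the dataset
        return name, {'ind': int(args[0])}
    if name in ('forman', 'centrality', 'betweenness') and not args:
        # Parameter-free filtrations
        return name, {}
    raise ValueError('Unrecognized filtration keyword: ' + keyword)

example_keywords = ['hks_1.0', 'hks_10.0', 'ricci_0.5_0', 'centrality', 'func_0']
parsed_keywords = [parse_filtration_keyword(k) for k in example_keywords]
```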

We load the dataset `DHFR` and compute its vectorized simplex trees associated with the chosen filtrations. The other datasets `MUTAG`, `COX2`, `PROTEINS`, `NCI1`, `IMDB-BINARY` and `IMDB-MULTI` are available on the [Perslay repository](https://github.com/MathieuCarriere/perslay).

In [3]:
from eulearning.datasets import load_graph_dataset

dataset = 'DHFR' 
path_to_dataset = '../data/' + dataset + '/'

vec_sts, y = load_graph_dataset(dataset, path_to_dataset, chosen_filtrations)

# Computing Euler characteristic descriptors
We compute the Euler characteristic profiles of the above multi-filtrations, as well as their Radon transforms and hybrid transforms.

### Euler characteristic profiles

In [4]:
from eulearning.descriptors import EulerCharacteristicProfile

# ECPs are flattened by default to fit sklearn classifiers. Set flatten=False to keep their grid shape.
euler_profile = EulerCharacteristicProfile(resolution=tuple(5 for _ in range(n_params)), quantiles=[(0, 1) for _ in range(n_params)])
ecps = euler_profile.fit_transform(vec_sts)
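To make concrete what an Euler characteristic profile records, here is a minimal NumPy sketch for a graph filtered by several node functions. It assumes that an edge enters the filtration once both of its endpoints have entered, so that the profile at a threshold $(t_1, \dots, t_m)$ is the number of vertices minus the number of edges present at that threshold. The function name and arguments are hypothetical, and the sketch ignores the quantile-based grids and normalization that `EulerCharacteristicProfile` may apply.

```python
import numpy as np

def euler_characteristic_profile(vertex_vals, edges, grids):
    """Euler characteristic of sublevel graphs on a grid of thresholds (hypothetical helper).

    vertex_vals: array of shape (n_vertices, n_params), one filtration value per node and parameter.
    edges: list of (u, v) node index pairs.
    grids: list of n_params one-dimensional arrays of thresholds.
    """
    vertex_vals = np.asarray(vertex_vals, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    # Assumption: an edge appears when both endpoints have appeared (coordinate-wise maximum)
    edge_vals = np.maximum(vertex_vals[edges[:, 0]], vertex_vals[edges[:, 1]])
    # Grid of thresholds, shape (*resolution, n_params)
    mesh = np.stack(np.meshgrid(*grids, indexing='ij'), axis=-1)

    def count_present(vals):
        # Broadcast (k, 1, ..., 1, n_params) against (*resolution, n_params)
        expanded = vals[(slice(None),) + (None,) * len(grids)]
        present = np.all(expanded <= mesh, axis=-1)
        return present.sum(axis=0)

    # Euler characteristic of a graph: number of vertices minus number of edges
    return count_present(vertex_vals) - count_present(edge_vals)

# Path graph 0 - 1 - 2 with two filtration functions on its nodes
toy_vertex_vals = [[0, 0], [1, 0], [0, 1]]
toy_edges = [(0, 1), (1, 2)]
toy_profile = euler_characteristic_profile(toy_vertex_vals, toy_edges, [np.array([0, 1]), np.array([0, 1])])
```

Here `toy_profile[i, j]` is the Euler characteristic of the subgraph whose nodes have first value at most `grids[0][i]` and second value at most `grids[1][j]`.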

### Radon transforms

In [5]:
from eulearning.descriptors import RadonTransform

radon_transform = RadonTransform(tuple(5 for _ in range(n_params)), quantiles=[0]*n_params)
rdns = radon_transform.fit_transform(vec_sts)

### Hybrid transforms

In [6]:
from eulearning.descriptors import HybridTransform

hybrid_transform = HybridTransform(tuple(5 for _ in range(n_params)), quantiles=[0]*n_params, kernel_name='exp_4')
hts = hybrid_transform.fit_transform(vec_sts)

# Classifying dataset

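These three descriptors can then be used as feature vectors to perform a supervised classification task. For each descriptor, we train a random forest on 90% of the dataset and use the remaining 10% for validation. Each descriptor gets its own random split and no random seed is fixed, so the scores vary from run to run.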

In [7]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(max_depth=10)

# Classifying using Euler characteristic profiles
ecps_train, ecps_test, y_train, y_test = train_test_split(ecps, y, test_size=0.1)
clf.fit(ecps_train, y_train)
ecps_score = clf.score(ecps_test, y_test)
print('ECPs score:', np.round(ecps_score*100, decimals=2), '%')

# Classifying using Radon transforms
rdns_train, rdns_test, y_train, y_test = train_test_split(rdns, y, test_size=0.1)
clf.fit(rdns_train, y_train)
rdns_score = clf.score(rdns_test, y_test)
print(' RTs score:', np.round(rdns_score*100, decimals=2), '%')

# Classifying using hybrid transforms
hts_train, hts_test, y_train, y_test = train_test_split(hts, y, test_size=0.1)
clf.fit(hts_train, y_train)
hts_score = clf.score(hts_test, y_test)
print(' HTs score:', np.round(hts_score*100, decimals=2), '%')

ECPs score: 72.37 %
 RTs score: 65.79 %
 HTs score: 76.32 %
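Since a single random 90/10 split gives a noisy accuracy estimate, one option is to average over several folds. The sketch below uses scikit-learn's `cross_val_score` with stratified 10-fold cross-validation. To stay self-contained it runs on synthetic features from `make_classification`, which are only a stand-in for `ecps`, `rdns` or `hts`; the fold count, seeds and `n_estimators` value are illustrative assumptions, not settings from this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a descriptor matrix and its labels (assumption for illustration)
features, labels = make_classification(n_samples=200, n_features=25, random_state=0)

clf_cv = RandomForestClassifier(max_depth=10, n_estimators=50, random_state=0)
# Stratified folds keep the class proportions similar in every fold
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf_cv, features, labels, cv=folds)

print('Mean accuracy:', np.round(cv_scores.mean() * 100, decimals=2), '%')
print('Standard deviation:', np.round(cv_scores.std() * 100, decimals=2), '%')
```

Reusing the same `folds` object for the three descriptors would evaluate them on identical splits, which makes their scores directly comparable.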
