# TigerGraph Data Science Library 101 - Classification Algorithms
This notebook shows examples of using the most common classification algorithms in the TigerGraph Graph Data Science Library. More detailed explanations of these algorithms can be found in the official documentation (https://docs.tigergraph.com/graph-ml/current/classification-algorithms/).


## Step 1: Setting things up
- Connect and load data
- Visualize the graph schema 
- Get basic stats, e.g., counts of nodes & edges

### Create connection

In [1]:
import json
import pandas as pd
from pyTigerGraph import TigerGraphConnection

# Read in DB configs
with open('../config.json', "r") as config_file:
    config = json.load(config_file)

conn = TigerGraphConnection(
    host=config["host"],
    username=config["username"],
    password=config["password"],
)
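
Before connecting, it can help to confirm that the config file contains every key this notebook reads. The sketch below is an illustration only: the key names (`host`, `username`, `password`, `getToken`) are the ones used in this notebook, while the helper `missing_config_keys` and all example values are hypothetical placeholders.

```python
# Keys this notebook reads from config.json.
REQUIRED_KEYS = ("host", "username", "password", "getToken")


def missing_config_keys(config):
    """Return the required keys that are absent from the config dict."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Hypothetical example values; replace them with your own settings.
example_config = {
    "host": "http://localhost",
    "username": "tigergraph",
    "password": "<your-password>",
    "getToken": False,
}

print(missing_config_keys(example_config))        # []
print(missing_config_keys({"host": "http://x"}))  # ['username', 'password', 'getToken']
```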

### Download social dataset

In [2]:
from pyTigerGraph.datasets import Datasets

dataset_social = Datasets("social")

### Ingest data

In [3]:
conn.ingestDataset(dataset_social, getToken=config["getToken"])

---- Checking database ----
---- Creating graph ----
The graph social is created.
---- Creating schema ----
Using graph 'social'
Successfully created schema change jobs: [social_schema].
Kick off schema change job social_schema
Doing schema change on graph 'social' (current version: 0)
Trying to add local vertex 'Person' to the graph 'social'.
Trying to add local edge 'Friend' and its reverse edge 'reverse_Friend' to the graph 'social'.
Trying to add local edge 'Coworker' to the graph 'social'.

Graph social updated to new version 1
The job social_schema completes in 1.400 seconds!
---- Creating loading job ----
Using graph 'social'
Successfully created loading jobs: [load_social].
---- Ingesting data ----
Ingested 17 objects into VERTEX Person
Ingested 15 objects into VERTEX Person
Ingested 14 objects into EDGE Friend
Ingested 13 objects into EDGE Coworker
---- Cleaning ----
---- Finished ingestion ----


### Visualize schema

In [4]:
from pyTigerGraph.visualization import drawSchema

drawSchema(conn.getSchema(force=True))

### Print graph stats

In [5]:
vertices = conn.getVertexTypes()
total_count = 0
for vertex in vertices:
    vertex_cnt = conn.getVertexCount(vertex)
    total_count += vertex_cnt
    print("Node count: ({} : {}) ".format(vertex, vertex_cnt))
print("Total node count: ", total_count)

Node count: (Person : 12) 
Total node count:  12


In [6]:
import pprint
edge_count = conn.getEdgeCount()
print("Edges count: total ", sum(edge_count.values()))
pprint.pprint(edge_count) 

Edges count: total  39
{'Coworker': 11, 'Friend': 14, 'reverse_Friend': 14}
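
Note that the total includes `reverse_Friend`, which the schema creates as the reverse of every `Friend` edge, so each friendship is counted twice. A small sketch of the arithmetic, using the counts printed above and assuming reverse edge types can be recognized by the `reverse_` prefix (true for this dataset, not necessarily for others):

```python
# Edge counts copied from the output above.
edge_count = {"Coworker": 11, "Friend": 14, "reverse_Friend": 14}

# Total across all edge types, reverse edges included.
total = sum(edge_count.values())

# Count each relationship once by skipping reverse edge types.
# Assumption: reverse edge types start with "reverse_".
forward_only = sum(
    count for name, count in edge_count.items() if not name.startswith("reverse_")
)

print(total, forward_only)  # 39 25
```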


## Step 2: Leveraging pyTigerGraph’s featurizer to run Classification algorithms

pyTigerGraph provides a full suite of data science capabilities. In this tutorial, we use the featurizer to list all available Classification algorithms in the GDS library and to run a few popular ones as examples.

In [7]:
feat = conn.gds.featurizer()

In [8]:
feat.listAlgorithms("Classification")

Available algorithms for Classification:
  greedy_graph_coloring:
    01. name: tg_greedy_graph_coloring
  maximal_independent_set:
    deterministic:
      02. name: tg_maximal_indep_set
    random:
      03. name: tg_maximal_indep_set_random
Call runAlgorithm() with the algorithm name to execute it


## tg_maximal_indep_set

An independent set of vertices does not contain any pair of vertices that are neighbors, i.e., vertices that have an edge between them. A maximal independent set (MIS) is an independent set to which no further vertex can be added without breaking independence; you cannot improve upon it unless you start over with a different independent set. A maximal set is not necessarily the largest possible (maximum) independent set: finding the maximum independent set is an NP-hard problem, and there is no known algorithm that can find that answer in polynomial time. So we settle for a maximal independent set.

This algorithm is useful in applications that need to find the most efficient configuration which "covers" all the necessary cases. For example, it has been used to optimize delivery or transit routes, where each vertex is one transit segment and each edge connects two segments that cannot be covered by the same vehicle.

Since there could be multiple maximal independent sets, there are two versions of the Maximal Independent Set algorithm: deterministic (`tg_maximal_indep_set`) and random (`tg_maximal_indep_set_random`).

This notebook runs the deterministic version, which makes sure that you get the same results every time. (https://docs.tigergraph.com/graph-ml/current/classification-algorithms/maximal-independent-set)
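
To make the definition concrete, here is a minimal pure-Python sketch of one simple way to build a maximal independent set: visit the vertices in a fixed (sorted) order and keep a vertex whenever none of its neighbors has been kept already. This is an illustration under stated assumptions, not the implementation used by `tg_maximal_indep_set`; the function name and the toy graph are hypothetical.

```python
def greedy_maximal_independent_set(adjacency):
    """Build a maximal independent set by scanning vertices in sorted order.

    adjacency maps each vertex to the set of its neighbors (undirected graph).
    The fixed scan order makes the result deterministic.
    """
    selected = set()
    for vertex in sorted(adjacency):
        # Keep the vertex only if none of its neighbors is already selected.
        if not (adjacency[vertex] & selected):
            selected.add(vertex)
    return selected


# Hypothetical toy graph: a path a - b - c - d.
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

print(sorted(greedy_maximal_independent_set(path)))  # ['a', 'c']
```

A different scan order can yield a different maximal set (for example `{"b", "d"}` on the same path), which is why a deterministic variant and a random variant can both be useful.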

## Input Parameters

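* STRING v_type: Name of vertex type to use
* STRING e_type: Name of edge type to use
* INT max_iter: Maximum number of iterations for the search (the name used in the call below; it may appear as maximum_iteration in other versions of the library)
* BOOL print_accum: If True, output JSON to standard output (the name used in the call below; it may appear as print_results in other versions of the library)
* STRING file_path: If not empty, write output to this file.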

In [9]:
params = {
    "v_type": "Person",
    "e_type": "Coworker",
    "max_iter": 100,
    "print_accum": True,
    "file_path": ""
}

results = feat.runAlgorithm("tg_maximal_indep_set", params=params)

Installing and optimizing the queries, it might take a minute...
Queries installed successfully


## Results

A set of vertices that form a maximal independent set.

In [10]:
df_maximal_indep_set = pd.json_normalize(results, record_path =['Start'])
display(df_maximal_indep_set)

| | v_id | v_type | attributes.name | attributes.score | attributes.tag | attributes.flag | attributes.@and_active | attributes.@or_selected | attributes.@min_vid |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Eddie | Person | Eddie | 0 | | False | False | True | 347078656 |
| 1 | Ivy | Person | Ivy | 0 | | False | False | True | 369098752 |
| 2 | Justin | Person | Justin | 0 | | False | False | True | 373293056 |
| 3 | dirTarget | Person | dirTarget | 0 | | False | False | True | 9223372036854775807 |
| 4 | source | Person | source | 0 | | False | False | True | 9223372036854775807 |
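
A returned set can be sanity-checked against the two defining properties: no two selected vertices share an edge, and every unselected vertex has at least one selected neighbor. The sketch below uses a hypothetical three-vertex graph rather than the actual Coworker edges, and the helper names are illustrative; in practice the vertex and edge lists would come from the graph itself.

```python
def is_independent(selected, edges):
    """True if no edge connects two selected vertices."""
    selected = set(selected)
    return not any(u in selected and v in selected for u, v in edges)


def is_maximal(selected, vertices, edges):
    """True if selected is independent and no further vertex can be added."""
    selected = set(selected)
    if not is_independent(selected, edges):
        return False
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Every unselected vertex must be blocked by a selected neighbor.
    return all(neighbors[v] & selected for v in vertices if v not in selected)


# Hypothetical path graph: a - b - c.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]

print(is_maximal({"b"}, vertices, edges))       # True: maximal, but not the largest
print(is_maximal({"a", "c"}, vertices, edges))  # True: maximal and also the largest
print(is_maximal({"a"}, vertices, edges))       # False: "c" could still be added
```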


## tg_greedy_graph_coloring
This algorithm assigns an integer value, known as a color, to each vertex of a graph such that no two neighboring vertices share the same color. It is called a color because the task is equivalent to assigning a color to each nation on a map so that no neighboring nations share the same color. (https://docs.tigergraph.com/graph-ml/current/classification-algorithms/greedy-graph-coloring)
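
The greedy idea can be sketched in a few lines of plain Python: visit the vertices one at a time and give each the smallest positive integer not already used by a colored neighbor. This is a sequential illustration only, not the implementation behind `tg_greedy_graph_coloring`; the function name and the toy graph are hypothetical.

```python
def greedy_coloring(adjacency):
    """Assign each vertex the smallest color (1, 2, ...) unused by its neighbors.

    adjacency maps each vertex to the set of its neighbors (undirected graph).
    Vertices are visited in sorted order so the result is reproducible.
    """
    colors = {}
    for vertex in sorted(adjacency):
        # Colors already taken by neighbors that have been colored so far.
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 1
        while color in used:
            color += 1
        colors[vertex] = color
    return colors


# Hypothetical toy graph: triangle a-b-c plus a pendant vertex d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

coloring = greedy_coloring(graph)
print(coloring)                # {'a': 1, 'b': 2, 'c': 3, 'd': 1}
print(max(coloring.values()))  # 3
```

A greedy pass always produces a valid coloring, but the number of colors it uses depends on the visiting order and is not guaranteed to be the minimum possible.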

## Input Parameters

* SET<STRING> v_type: A set of all vertex types to color (the name used in the call below; it may appear as v_type_set in other versions of the library).
* SET<STRING> e_type: A set of all edge types to traverse (the name used in the call below; it may appear as e_type_set in other versions of the library).
* UINT max_colors: The maximum number of colors that can be used. Use a large number like 999999 unless there is a strict limit.
* BOOL print_color_count: If set to true, the total number of colors used will be displayed.
* BOOL display: If set to true, the output will display all vertices and their associated color (the name used in the call below; it may appear as print_stats in other versions of the library).
* STRING file_path: If a file path is provided, the output will be saved to the file indicated by the file path in CSV format.

In [11]:
params = {
    "v_type": ["Person"],
    "e_type": ["Friend", "Coworker"],
    "max_colors": 999999,
    "print_color_count": True,
    "display": True,
    "file_path": ""
}

results = feat.runAlgorithm("tg_greedy_graph_coloring", params=params)

Installing and optimizing the queries, it might take a minute...
Queries installed successfully


## Results

On the social graph, we want to color the Person vertices so that any two vertices connected by either a Friend edge or a Coworker edge do not have the same color. Running the tg_greedy_graph_coloring algorithm gives the following result:

In [12]:
r = json.dumps(results, indent=1)
print(r)

[
 {
  "color_count": 4
 },
 {
  "start": [
   {
    "v_id": "Eddie",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 3
    }
   },
   {
    "v_id": "Ivy",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 4
    }
   },
   {
    "v_id": "Justin",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 4
    }
   },
   {
    "v_id": "Damon",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 2
    }
   },
   {
    "v_id": "Fiona",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 3
    }
   },
   {
    "v_id": "Bob",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 2
    }
   },
   {
    "v_id": "Chase",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 1
    }
   },
   {
    "v_id": "Alex",
    "v_type": "Person",
    "attributes": {
     "start.@sum_color_vertex": 1
    }
   },
   {
    "v_id": "George",
    "