# Node classification in networks using NetworkX

In this section we illustrate how to use NetworkX, a Python library for creating and analyzing complex networks, to perform node classification: predicting the unknown labels of some nodes from the known labels of other nodes and the structure of the network.

To demonstrate, we use the Airports dataset [1], which contains three networks, `{'usa', 'brazil', 'europe'}`. In each network, nodes represent airports and edges indicate that there are flights between two airports. Each node is labeled with the airport's activity level, one of `{0, 1, 2, 3}`. Our task is to predict these activity levels.

[1] "struc2vec: Learning Node Representations from Structural Identity", Ribeiro et al., https://arxiv.org/abs/1704.03165


In [1]:
import os
import networkx as nx
import pandas as pd

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from networkx.algorithms import node_classification

## Load dataset

The Airports dataset is available online [2]. If you have run the preparation script `prep.sh`, the dataset should be in `data/airports/`; otherwise, you can download it manually. We use the `usa` network throughout this section.

[2] https://github.com/leoribeiro/struc2vec/tree/master/graph

In [2]:
# params
data_dir = 'data/airports/'
network = 'usa'

### Load edge info

In [3]:
edgelist = pd.read_csv(os.path.join(data_dir, f"{network}-airports.edgelist"), sep=' ', header=None, names=["source", "target"])
edgelist

| | source | target |
|---|---|---|
| 0 | 12343 | 12129 |
| 1 | 13277 | 11996 |
| 2 | 13796 | 13476 |
| 3 | 15061 | 14559 |
| 4 | 14314 | 12889 |
| ... | ... | ... |
| 13594 | 13303 | 10747 |
| 13595 | 13029 | 12892 |
| 13596 | 13930 | 11618 |
| 13597 | 12278 | 11423 |


In [4]:
# We have to use an undirected graph here because the node classification algorithms in nx do not support directed graphs
G = nx.from_pandas_edgelist(edgelist)
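
To see this restriction in practice, one can call a classifier on a tiny directed graph. In the NetworkX versions we are familiar with, `harmonic_function` is marked as not implemented for directed graphs and raises `nx.NetworkXNotImplemented`. The toy graph below is hypothetical:

```python
import networkx as nx
from networkx.algorithms import node_classification

# Hypothetical toy directed graph with one labeled node
D = nx.DiGraph([(1, 2), (2, 3)])
D.nodes[1]["label"] = "0"

rejected = False
try:
    node_classification.harmonic_function(D)
except nx.NetworkXNotImplemented as err:
    rejected = True  # directed input is refused before any propagation happens
    print("Rejected:", err)

# An undirected copy of the same graph is an acceptable input type
U = D.to_undirected()
print(rejected, U.is_directed(), U.number_of_edges())  # True False 2
```

`nx.from_pandas_edgelist` already returns an undirected `nx.Graph` by default, which is why no conversion is needed in the cell above.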

### Load node labels

In [5]:
df_nodes = pd.read_csv(os.path.join(data_dir, f"labels-{network}-airports.txt"), sep=' ')
df_nodes

| | node | label |
|---|---|---|
| 0 | 10241 | 1 |
| 1 | 10243 | 2 |
| 2 | 10245 | 0 |
| 3 | 16390 | 1 |
| 4 | 10247 | 1 |
| ... | ... | ... |
| 1185 | 12278 | 0 |
| 1186 | 12280 | 0 |
| 1187 | 14332 | 3 |
| 1188 | 10237 | 2 |
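
The graph `G` is built only from the edge list, so a labeled airport without any flights would be missing from `G`, and the later lookup `dict_result_hf.get(...)` would silently return `None` for it. A quick check, written here as a hypothetical helper `check_label_coverage` and run on toy data rather than the real files:

```python
import networkx as nx
import pandas as pd


def check_label_coverage(G, df_labels, node_col="node"):
    """Return (labeled nodes missing from G, nodes in G without a label)."""
    labeled = set(df_labels[node_col].tolist())  # tolist() gives plain Python values
    in_graph = set(G.nodes)
    return labeled - in_graph, in_graph - labeled


# Hypothetical toy data: node 4 has a label but no edges, node 3 has no label
G_toy = nx.Graph([(1, 2), (2, 3)])
df_toy = pd.DataFrame({"node": [1, 2, 4], "label": [0, 1, 1]})
missing_from_graph, unlabeled = check_label_coverage(G_toy, df_toy)
print(missing_from_graph, unlabeled)  # {4} {3}
```

On the real data, an empty first set from `check_label_coverage(G, df_nodes)` would confirm that every labeled airport appears in the graph.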


## Split training and test data

We split the labeled nodes into a training set and a test set using the `train_test_split()` function from scikit-learn (`sklearn`).

The labels of the training nodes are visible to the classifiers during training. The labels of the test nodes are hidden during training and are used only to evaluate the classifiers.

Note that in this setting the whole network structure, including the test nodes and their edges, is visible to the model during training; only the test labels are hidden. This is known as a transductive setting.

In [6]:
# param: define the size of the training dataset
train_size = 0.8
df_train, df_test = train_test_split(df_nodes, train_size=train_size)
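
The split above sets no `random_state` and is not stratified, so the test set (and therefore the per-class supports and metrics) changes from run to run. If reproducibility or balanced class proportions matter, `train_test_split` accepts `random_state` and `stratify`. A sketch on a hypothetical toy label table:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy label table: 40 nodes, four classes with 10 nodes each
df_toy = pd.DataFrame({"node": range(40), "label": [i % 4 for i in range(40)]})

# stratify keeps the class proportions the same in both splits;
# random_state fixes the shuffle so the split is reproducible
df_tr, df_te = train_test_split(
    df_toy, train_size=0.8, stratify=df_toy["label"], random_state=0
)
print(len(df_tr), len(df_te))  # 32 8
print(df_te["label"].value_counts().sort_index().tolist())  # [2, 2, 2, 2]
```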

In [7]:
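# node IDs (first column) and activity-level labels (last column) of each split;
# labels are cast to str so that predictions and ground truth share one type
node_ids_train, node_ids_test = df_train.iloc[:, 0].values, df_test.iloc[:, 0].values
labels_train, labels_test = df_train.iloc[:, -1].values.astype(str), df_test.iloc[:, -1].values.astype(str)
# map each training node ID to its label, to be attached to the graph as a node attribute
data_train = dict(zip(node_ids_train, labels_train))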

In [8]:
nx.set_node_attributes(G, data_train, name="label")
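
To make the transductive setting concrete: after `set_node_attributes`, every node, including each test node, stays in `G` with all its edges, and only the training nodes carry a `label` attribute. A toy sketch with a hypothetical graph and labels:

```python
import networkx as nx

# Hypothetical toy graph: nodes 1 and 4 play the role of training nodes,
# nodes 2 and 3 the role of test nodes
G_toy = nx.path_graph([1, 2, 3, 4])
nx.set_node_attributes(G_toy, {1: "0", 4: "3"}, name="label")

# nodes(data="label") yields None for nodes without the attribute
labeled = [n for n, lab in G_toy.nodes(data="label") if lab is not None]
print(G_toy.number_of_nodes(), G_toy.number_of_edges(), labeled)  # 4 3 [1, 4]
```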

## Perform node classification

Now we perform node classification using the built-in Harmonic Function method in NetworkX.

* Zhu, X., Ghahramani, Z., & Lafferty, J. (2003, August). Semi-supervised learning using gaussian fields and harmonic functions. In ICML (Vol. 3, pp. 912-919).
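
In the harmonic function method, the class scores of each unlabeled node equal the average of its neighbors' scores, while labeled nodes keep their known labels; repeating this averaging propagates labels through the graph. The following is a minimal dense NumPy sketch of that iteration on a toy graph. The function name is hypothetical, and `max_iter=30` mirrors what we understand to be the NetworkX default; it illustrates the idea and is not the library's code.

```python
import numpy as np


def harmonic_function_sketch(A, labeled, n_classes, max_iter=30):
    """Label propagation in which labeled nodes are clamped to their labels.

    A: dense symmetric adjacency matrix of shape (n, n).
    labeled: dict mapping node index -> class index for the training nodes.
    Returns the predicted class index of every node.
    """
    n = A.shape[0]
    deg = A.sum(axis=1).astype(float)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    P = A / deg[:, None]  # row-normalized transition matrix D^-1 A
    rows = list(labeled)
    cols = [labeled[r] for r in rows]
    P[rows] = 0.0  # labeled nodes receive nothing from their neighbors ...
    B = np.zeros((n, n_classes))
    B[rows, cols] = 1.0  # ... and are reset to their one-hot label every step
    F = np.zeros((n, n_classes))
    for _ in range(max_iter):
        F = P @ F + B  # unlabeled nodes take the average of their neighbors' scores
    return F.argmax(axis=1)


# Path graph 0-1-2-3: node 0 is labeled class 0, node 3 is labeled class 1
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(harmonic_function_sketch(A, {0: 0, 3: 1}, n_classes=2))  # [0 0 1 1]
```

Node 1 is closer to node 0 and node 2 is closer to node 3, so each unlabeled node inherits the label of its nearer labeled neighbor. A dense matrix is only practical for toy graphs; for larger networks a sparse matrix would be the natural choice.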

In [9]:
result_hf = node_classification.harmonic_function(G)

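The result is a list with one label per node, in the order of `G.nodes`: the training nodes keep their ground-truth labels, which this method holds fixed, and all other nodes get predicted labels. For example, the labels of the first five nodes: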

In [10]:
# check part of the result
result_hf[:5]

['0', '1', '0', '0', '0']

Similarly, we can perform node classification using the built-in Local and Global Consistency method.

* Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2004). Learning with local and global consistency. Advances in neural information processing systems, 16(16), 321-328.
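
Local and Global Consistency builds the symmetrically normalized matrix `S = D^(-1/2) A D^(-1/2)` from the adjacency matrix `A` and the degree matrix `D`, and repeats `F <- alpha * S @ F + (1 - alpha) * Y`, where `Y` holds the one-hot training labels and `alpha` in (0, 1) trades off neighbor information against the initial labels. A dense NumPy sketch follows; the function name is hypothetical, and `alpha=0.99` and `max_iter=30` are what we believe the NetworkX defaults to be.

```python
import numpy as np


def lgc_sketch(A, labeled, n_classes, alpha=0.99, max_iter=30):
    """Local and global consistency: F <- alpha * S @ F + (1 - alpha) * Y.

    S = D^-1/2 A D^-1/2 is the symmetrically normalized adjacency matrix.
    Unlike the harmonic function, labeled nodes are NOT clamped.
    """
    n = A.shape[0]
    deg = A.sum(axis=1).astype(float)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    Y = np.zeros((n, n_classes))
    rows = list(labeled)
    Y[rows, [labeled[r] for r in rows]] = 1.0
    F = np.zeros((n, n_classes))
    for _ in range(max_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1)


# Star graph: center 0 is labeled class 0, its four leaves are labeled class 1
A = np.zeros((5, 5))
A[0, 1:] = A[1:, 0] = 1
labels = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1}
print(lgc_sketch(A, labels, n_classes=2))  # [1 1 1 1 1]
```

In this toy star, the center's own label is outweighed by its four neighbors, so it is predicted as class 1: LGC does not keep training labels fixed, whereas the harmonic function would return class 0 for the center.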

In [11]:
result_lgc = node_classification.local_and_global_consistency(G)

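Again, the result is a list with one label per node, in the order of `G.nodes`. Unlike the harmonic function, this method does not hold the training labels fixed, so even a training node may receive a predicted label that differs from its ground truth. For example, the labels of the first five nodes: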

In [12]:
# check part of the result
result_lgc[:5]

['0', '0', '0', '0', '0']

## Evaluation

To evaluate the classifiers, we look up the predicted labels of the nodes in the test set.

We then compare the predicted labels with the ground truth and compute the following metrics with `sklearn`:

* Accuracy: the fraction of test nodes whose predicted label equals the true label.
* Precision (per class): of the nodes predicted to be in a class, the fraction that truly belong to it.
* Recall (per class): of the nodes that truly belong to a class, the fraction predicted to be in it.
* F1-score (per class): the harmonic mean of precision and recall.

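Since this is a multi-class problem, we report precision, recall, and F1-score for each class, as well as their macro averages, i.e. their unweighted means over the four classes (`average='macro'` in `sklearn`). These are the values printed as "Average Precision", "Average Recall", and "Average F1-score", not to be confused with `sklearn`'s `average_precision_score`. Because every class counts equally in a macro average, a classifier that does well on only some activity levels still gets a low macro score, and the per-class values show which levels it struggles with.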
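
To make these definitions concrete, here is a small pure-Python sketch that computes per-class precision, recall, and F1-score from counts and then macro-averages them. The function name and toy labels are hypothetical, and scoring a never-predicted class as 0 is an assumption chosen to match `sklearn`'s `zero_division=0` option.

```python
from collections import Counter


def per_class_and_macro(y_true, y_pred, classes):
    """Accuracy, per-class (precision, recall, F1), and their macro averages."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    n_pred = Counter(y_pred)  # how often each class was predicted
    n_true = Counter(y_true)  # how often each class really occurs
    rows = {}
    for c in classes:
        prec = tp[c] / n_pred[c] if n_pred[c] else 0.0
        rec = tp[c] / n_true[c] if n_true[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[c] = (prec, rec, f1)
    # macro average: unweighted mean over classes of each metric
    macro = tuple(sum(r[i] for r in rows.values()) / len(classes) for i in range(3))
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, rows, macro


# Toy example in which most nodes are predicted as the lowest class
y_true = ["0", "0", "1", "1", "2", "3"]
y_pred = ["0", "0", "0", "1", "0", "0"]
acc, rows, macro = per_class_and_macro(y_true, y_pred, ["0", "1", "2", "3"])
print(acc)  # 0.5
print(rows["0"])  # class 0: precision 0.4, recall 1.0, F1 about 0.571
print(macro)  # about (0.35, 0.375, 0.31)
```

Class 0 reaches perfect recall but low precision because it absorbs nodes of other classes, the same pattern that appears in the evaluations below.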

### Evaluation of the "Harmonic Function" classifier

In [13]:
# result_hf follows the order of G.nodes, so pair each node ID with its label
dict_result_hf = dict(zip(list(G), result_hf))
labels_pred_hf = [dict_result_hf.get(node_id) for node_id in node_ids_test]

In [14]:
print('Performance of the "Harmonic Function" classifier')

# calculate accuracy
accuracy = accuracy_score(labels_test, labels_pred_hf)

# calculate precision, recall, F1 score, and support
precision, recall, f1, support = precision_recall_fscore_support(labels_test, labels_pred_hf, average=None)
df_eval = pd.DataFrame({
    'Precision': precision,
    'Recall': recall,
    'F1 Score': f1,
    'Support': support
}, index=['Activity Level 0', 'Activity Level 1', 'Activity Level 2', 'Activity Level 3'])

# calculate average precision, average recall, and average F1 score
avg_precision = precision_score(labels_test, labels_pred_hf, average='macro')
avg_recall = recall_score(labels_test, labels_pred_hf, average='macro')
avg_f1 = f1_score(labels_test, labels_pred_hf, average='macro')

print(f'Accuracy: {accuracy:.4f}')

print(df_eval)

print(f'Average Precision: {avg_precision:.4f}')
print(f'Average Recall: {avg_recall:.4f}')
print(f'Average F1-score: {avg_f1:.4f}')

Performance of the "Harmonic Function" classifier
Accuracy: 0.3739
                  Precision    Recall  F1 Score  Support
Activity Level 0   0.299363  0.959184  0.456311       49
Activity Level 1   0.400000  0.431373  0.415094       51
Activity Level 2   0.714286  0.238095  0.357143       63
Activity Level 3   1.000000  0.066667  0.125000       75
Average Precision: 0.6034
Average Recall: 0.4238
Average F1-score: 0.3384
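
The evaluation cell above is repeated almost verbatim for the second classifier. One option is to wrap it in a helper; the sketch below uses a hypothetical `evaluate` function and passes `labels=` so that the row order is fixed and classes that never appear in the predictions are still listed. Averaging the per-class columns equals `average='macro'` as long as every class occurs in the true or predicted labels.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def evaluate(y_true, y_pred, classes=("0", "1", "2", "3")):
    """Return accuracy, a per-class table, and macro-averaged scores."""
    classes = list(classes)
    accuracy = accuracy_score(y_true, y_pred)
    # labels= fixes the row order; zero_division=0 scores never-predicted classes as 0
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, labels=classes, average=None, zero_division=0
    )
    table = pd.DataFrame(
        {"Precision": precision, "Recall": recall, "F1 Score": f1, "Support": support},
        index=[f"Activity Level {c}" for c in classes],
    )
    macro = table[["Precision", "Recall", "F1 Score"]].mean()  # unweighted class means
    return accuracy, table, macro


# Toy usage with hypothetical labels
acc, table, macro = evaluate(["0", "1", "2", "3"], ["0", "0", "2", "2"])
print(acc)  # 0.5
print(table)
print(macro)
```

In the notebook, `evaluate(labels_test, labels_pred_hf)` and `evaluate(labels_test, labels_pred_lgc)` could then replace the two duplicated cells.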


### Evaluation of the "Local and Global Consistency" classifier

In [15]:
# result_lgc follows the order of G.nodes, so pair each node ID with its label
dict_result_lgc = dict(zip(list(G), result_lgc))
labels_pred_lgc = [dict_result_lgc.get(node_id) for node_id in node_ids_test]

In [16]:
print('Performance of the "Local and Global Consistency" classifier')

# calculate accuracy
accuracy = accuracy_score(labels_test, labels_pred_lgc)

# calculate precision, recall, F1 score, and support
precision, recall, f1, support = precision_recall_fscore_support(labels_test, labels_pred_lgc, average=None)
df_eval = pd.DataFrame({
    'Precision': precision,
    'Recall': recall,
    'F1 Score': f1,
    'Support': support
}, index=['Activity Level 0', 'Activity Level 1', 'Activity Level 2', 'Activity Level 3'])

# calculate average precision, average recall, and average F1 score
avg_precision = precision_score(labels_test, labels_pred_lgc, average='macro')
avg_recall = recall_score(labels_test, labels_pred_lgc, average='macro')
avg_f1 = f1_score(labels_test, labels_pred_lgc, average='macro')

print(f'Accuracy: {accuracy:.4f}')

print(df_eval)

print(f'Average Precision: {avg_precision:.4f}')
print(f'Average Recall: {avg_recall:.4f}')
print(f'Average F1-score: {avg_f1:.4f}')

Performance of the "Local and Global Consistency" classifier
Accuracy: 0.3403
                  Precision    Recall  F1 Score  Support
Activity Level 0   0.311258  0.959184  0.470000       49
Activity Level 1   0.328947  0.490196  0.393701       51
Activity Level 2   0.800000  0.126984  0.219178       63
Activity Level 3   1.000000  0.013333  0.026316       75
Average Precision: 0.6101
Average Recall: 0.3974
Average F1-score: 0.2773


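To summarize, both classifiers perform poorly overall, with macro-averaged F1-scores of 0.3384 (Harmonic Function) and 0.2773 (Local and Global Consistency). Both show the same pattern: activity level 0 has a recall of about 0.96 but a precision of only about 0.3, whereas activity level 3 has a precision of 1.0 but a recall below 0.07. In other words, many airports are assigned a lower activity level than their true one, so the classifiers tend to underestimate the activity level.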
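
The underestimation claim can also be checked directly if the labels are treated as ordinal, i.e. a larger label means a more active airport (an assumption made here). A small sketch with a hypothetical helper that counts predictions below, equal to, and above the true level and reports the mean signed error:

```python
from collections import Counter


def direction_of_errors(y_true, y_pred):
    """Count predictions below/equal to/above the true level, plus the mean signed error.

    Assumes ordinal labels ("0" < "1" < "2" < "3"); a negative mean means underestimation.
    """
    diffs = [int(p) - int(t) for t, p in zip(y_true, y_pred)]
    counts = Counter("below" if d < 0 else "exact" if d == 0 else "above" for d in diffs)
    return counts, sum(diffs) / len(diffs)


# Toy usage with hypothetical labels
counts, mean_err = direction_of_errors(["0", "1", "2", "3"], ["0", "0", "1", "3"])
print(dict(counts), mean_err)  # {'exact': 2, 'below': 2} -0.5
```

Applied to `labels_test` and `labels_pred_hf` (or `labels_pred_lgc`), a negative mean signed error together with many more "below" than "above" counts would support the conclusion above.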