# Community Detection with NetworKit 

In this notebook we will cover some community detection algorithms implemented in the `community` module of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network. As a first step we import NetworKit:

In [18]:
import networkit as nk
nk.engineering.setNumberOfThreads(1)
nk.engineering.setSeed(0, False)

The `community` module provides a top-level function, [detectCommunities(G, algo=None, inspect=True)](https://networkit.github.io/dev-docs/python_api/community.html?highlight=detect#networkit.community.detectCommunities), which performs community detection on a given graph with a suitable algorithm and prints some statistics about the result. If no algorithm is specified via the `algo` parameter, community detection is performed using the [PLM](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plm#networkit.community.PLM) algorithm.

This function can be used as follows:

In [19]:
# Read graph
G = nk.readGraph("/data/yliumh/AutoAtClusterDatasets/networkit/uk-2007-05@100000-edgelist.txt", nk.Format.EdgeListTabZero)
# G = nk.readGraph("/data/yliumh/AutoAtClusterDatasets/snap/com-orkut.ungraph.txt", nk.Format.SNAP)
# import networkx as nx
# from sklearn.metrics import normalized_mutual_info_score as NMI, adjusted_rand_score as ARI
# import scipy.sparse as sp
# import numpy as np
# # G = nx.karate_club_graph()
# # labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# nc, r = 10, 1000 # ring of cliques: nc=#nodes in each clique; r=#cliques
# data = np.load(f"/home/yliumh/github/AutoAtCluster/baselines/Louvain/dataset/ring_of_cliques_{nc}_{r}.npz")
# adj_data, adj_row, adj_col = data["data"], data["row"], data["col"]
# adj = sp.coo_matrix((adj_data, (adj_row, adj_col)))
# G = nx.from_scipy_sparse_array(adj)
# G = nk.nxadapter.nx2nk(G)
# print("===")
# communities = nk.community.detectCommunities(G)
# preds = communities.getVector()
# print(NMI(labels, preds), ARI(labels, preds))

In [9]:
import numpy as np
import scipy.sparse as sp
import networkx as nx

def load_knn_graph(dataset, k, seed):
    """Load a precomputed k-NN adjacency matrix (stored as COO triplets) as a NetworkX graph."""
    knn_graph_path = f"/home/yliumh/github/AutoAtCluster/baselines/KNN/outputs/knn_adj_{dataset}_{seed}_{k}.npz"
    try:
        data = np.load(knn_graph_path)
        adj_data, adj_row, adj_col = data["data"], data["row"], data["col"]
        knn_adj = sp.coo_matrix((adj_data, (adj_row, adj_col)))
        return nx.from_scipy_sparse_array(knn_adj)
    except Exception as e:
        # Report the failure, then re-raise so the caller never receives None.
        print(e)
        raise

graph = load_knn_graph("amazon-photo", 100, 0)
print(graph)
# Convert to a NetworKit graph, carrying over the edge weights.
G = nk.nxadapter.nx2nk(graph, weightAttr="weight")

Graph with 7650 nodes and 8785241 edges


In [8]:
nk.community.Modularity().getQuality(communities, G)

0.841363198808599

In [30]:
nk.graphtools.randomNeighbor(G,0)

1520

The following sections cover two popular community detection algorithms, `PLM` and `PLP`, and will illustrate how to use them.

## PLM

NetworKit provides a parallel implementation of the well-known Louvain method in the [PLM](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plm#networkit.community.PLM) class. It yields high-quality solutions at reasonably fast running times. The constructor `PLM(G, refine=False, gamma=1.0, par='balanced', maxIter=32, turbo=True, recurse=True)` expects a [networkit.Graph](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=graph#networkit.Graph) as its only mandatory parameter. If `refine` is set to true, the algorithm performs a second move phase to refine the communities. The parameter `gamma` is the resolution parameter of multi-resolution modularity. The string `par` selects the OpenMP parallelization strategy. `maxIter` is the maximum number of iterations of the move phase. When `turbo` is set to true, the algorithm is faster but uses O(n) additional memory per thread. Set `recurse` to true to use recursive coarsening. Refer to [this reference](http://journals.aps.org/pre/abstract/10.1103/PhysRevE.89.049902) for more details on recursive coarsening.
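
To make the role of `gamma` concrete, the following pure-Python sketch evaluates the standard resolution-parameterized modularity, Q = Σ_c [ w_in(c)/m − γ·(vol(c)/(2m))² ], on a toy graph. It is an illustration only and not NetworKit's implementation; the function name `modularity` and the toy data are assumptions made for this example.

```python
from collections import defaultdict

def modularity(edges, membership, gamma=1.0):
    """Resolution-parameterized modularity of an undirected, weighted graph.

    edges: iterable of (u, v, weight) with each undirected edge listed once.
    membership: dict mapping node -> community id.
    """
    total_weight = 0.0                 # m: sum of all edge weights
    intra_weight = defaultdict(float)  # weight of edges inside each community
    volume = defaultdict(float)        # sum of weighted degrees per community

    for u, v, w in edges:
        total_weight += w
        volume[membership[u]] += w
        volume[membership[v]] += w
        if membership[u] == membership[v]:
            intra_weight[membership[u]] += w

    return sum(
        intra_weight[c] / total_weight
        - gamma * (volume[c] / (2.0 * total_weight)) ** 2
        for c in volume
    )

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_triangles = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_block = {node: 0 for node in range(6)}

print(modularity(edges, two_triangles))             # one community per triangle
print(modularity(edges, one_block))                 # everything in one community
print(modularity(edges, two_triangles, gamma=2.0))  # stronger size penalty
```

Larger values of `gamma` increase the penalty on community volume and therefore tend to favor smaller communities, while smaller values favor larger ones.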

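In the example below we run PLM with `refine` set to true; `turbo` and `maxIter` are passed explicitly at their default values. The `nm` keyword is not among the constructor parameters listed above and may require a customized NetworKit build.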

In [22]:
# Choose and initialize algorithm 
algo = nk.community.PLM(G, True, turbo=True, nm="ps", maxIter=32)
plmCommunities = nk.community.detectCommunities(G, algo=algo)
cnt = algo.getCount()
print(algo.getCount(), sum(cnt))
print(algo.getTiming())
print(algo.getIter())
preds = plmCommunities.getVector()
# print(NMI(labels, preds), ARI(labels, preds))

Communities detected in 0.40364 [s]
solution properties:
-------------------  --------------
# communities            61
min community size       21
max community size    18164
avg. community size    1639.34
imbalance                11.0756
edge cut             261893
edge cut (portion)        0.0936923
modularity                0.840457
-------------------  --------------
[1117340, 2330, 216, 61, 72, 932, 0] 1120951
{b'coarsen': [12, 0, 0], b'move': [380, 0, 0, 0], b'refine': [0, 0, 0]}
[33, 1]


The output of the `detectCommunities` function is a partition of the nodes of the graph. It is represented by the [Partition](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=partition#networkit.Partition) data structure, which provides several methods for inspecting and manipulating a partition of a set of elements.

In [4]:
print("{0} elements assigned to {1} subsets".format(plmCommunities.numberOfElements(),
                                                    plmCommunities.numberOfSubsets()))

34 elements assigned to 4 subsets


In [5]:
print("the biggest subset has size {0}".format(max(plmCommunities.subsetSizes())))

the biggest subset has size 12
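
The same kind of summary can be computed by hand from a membership vector, such as the list returned by `getVector()`, in which entry `i` holds the subset id of node `i`. The sketch below is plain Python and does not use the `Partition` class; the helper name `partition_summary` is hypothetical. The imbalance formula (largest subset divided by the rounded-up average subset size) is an assumption, chosen because it reproduces the value in the PLM table above: 18164 / 1640 ≈ 11.0756, where 1640 is 1639.34 rounded up.

```python
import math
from collections import Counter

def partition_summary(membership):
    """Summarize a membership vector where membership[i] is the subset id of node i."""
    sizes = Counter(membership)  # subset id -> number of elements
    n, k = len(membership), len(sizes)
    largest = max(sizes.values())
    return {
        "elements": n,
        "subsets": k,
        "min_size": min(sizes.values()),
        "max_size": largest,
        "avg_size": n / k,
        # Assumption: imbalance = largest subset / ceil(average subset size).
        "imbalance": largest / math.ceil(n / k),
    }

# Toy membership vector (hypothetical): 10 nodes in 3 subsets.
summary = partition_summary([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
for key, value in summary.items():
    print(f"{key:10s} {value}")
```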


The contents of a partition object can be written to file in a simple format, in which the `i`-th line contains an integer representing the subset id of node `i`.

In [None]:
nk.community.writeCommunities(plmCommunities, "output/communtiesPLM.partition")
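
As a minimal sketch of the file format described above (one integer per line, where line `i` holds the subset id of node `i`), the following plain-Python helpers write a membership vector and read it back. The names `write_partition` and `read_partition` are hypothetical and are not NetworKit functions; the example writes to a temporary directory so that it leaves no files behind.

```python
import os
import tempfile

def write_partition(membership, path):
    """Write one subset id per line; line i belongs to node i."""
    with open(path, "w") as f:
        for subset_id in membership:
            f.write(f"{subset_id}\n")

def read_partition(path):
    """Read the format back into a list of subset ids, skipping blank lines."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]

membership = [0, 0, 1, 1, 2]  # toy data: node i -> subset id
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.partition")
    write_partition(membership, path)
    restored = read_partition(path)

print(restored == membership)
```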

In [16]:
# For comparison, run the parallel Leiden algorithm on the same graph
algo = nk.community.ParallelLeiden(G, iterations=32)
leidenCommunities = nk.community.detectCommunities(G, algo=algo)

Communities detected in 0.57400 [s]
solution properties:
-------------------  -------------
# communities           391
min community size        2
max community size    11443
avg. community size     255.754
imbalance                44.6992
edge cut             405444
edge cut (portion)        0.145048
modularity                0.804907
-------------------  -------------


## PLP 

Label propagation finds communities by letting each node repeatedly adopt the label that is most common among its neighbors. NetworKit provides a parallel implementation, [PLP(G, updateThreshold=none, maxIterations=none)](https://networkit.github.io/dev-docs/python_api/community.html?highlight=plp#networkit.community.PLP). The constructor expects a [networkit.Graph](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=graph#networkit.Graph) as a mandatory parameter. The parameter `updateThreshold` sets how many nodes must change their label in an iteration for another iteration to start, and `maxIterations` is the maximum number of iterations. `none` is a NetworKit constant set to the maximum value of a 64-bit integer.
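
The idea behind label propagation, and the effect of the two stopping parameters, can be sketched in a few lines of sequential Python. This is an illustration under simplifying assumptions and not NetworKit's parallel implementation: nodes are visited in sorted order, ties are broken deterministically by taking the largest label (practical implementations usually break ties randomly or arbitrarily), and the names `label_propagation`, `update_threshold`, and `max_iterations` are chosen for this example.

```python
from collections import Counter

def label_propagation(adjacency, update_threshold=0, max_iterations=100):
    """Sequential label propagation on an undirected graph.

    adjacency: dict mapping node -> list of neighbors.
    Returns (labels, iterations), where labels maps node -> community label.
    """
    labels = {node: node for node in adjacency}  # every node starts alone
    iterations = 0
    updated = len(labels)
    # Start another iteration only while enough labels changed last time.
    while updated > update_threshold and iterations < max_iterations:
        iterations += 1
        updated = 0
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            counts = Counter(labels[nb] for nb in neighbors)
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # current label is already a most frequent one
            labels[node] = max(candidates)  # deterministic tie-break (assumption)
            updated += 1
    return labels, iterations

# Toy graph (hypothetical): two 4-cliques joined by the single edge (3, 4).
adjacency = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
labels, iterations = label_propagation(adjacency)
print(labels, iterations)
```

On this toy graph each clique ends up with its own label. Raising `update_threshold` or lowering `max_iterations` stops the loop earlier, trading solution quality for running time.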

In [None]:
# Read graph
G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)

In [None]:
# Choose and initialize algorithm 
plpCommunities = nk.community.detectCommunities(G, algo=nk.community.PLP(G))

In [None]:
print("{0} elements assigned to {1} subsets".format(plpCommunities.numberOfElements(),
                                                    plpCommunities.numberOfSubsets()))

In [None]:
print("the biggest subset has size {0}".format(max(plpCommunities.subsetSizes())))

In [None]:
nk.community.writeCommunities(plpCommunities, "output/communtiesPLP.partition")