## LazyFox Ground Truth Metrics
All datasets we worked with provide ground-truth community data, so we can compute scores such as F1 and NMI.

However, the provided ground truths cannot be used for comparison directly: a) they are not in a format that networkit can read, and b) neither is LazyFox's output.

We therefore need to rewrite both the ground truth and the LazyFox cluster output.
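As a rough illustration of what such a rewrite can involve, the sketch below remaps arbitrary node IDs to a contiguous range and emits one community per line with space-separated IDs. Both the target format and the helper names (`build_id_map`, `rewrite_communities`) are assumptions made for this example. They are not taken from the project's `Rewriter`, which may work differently.

```python
def build_id_map(edges):
    """Map each node ID to a contiguous index, in order of first appearance."""
    id_map = {}
    for u, v in edges:
        for node in (u, v):
            if node not in id_map:
                id_map[node] = len(id_map)
    return id_map


def rewrite_communities(communities, id_map):
    """Return one text line per community, using the remapped node IDs.

    Nodes that do not occur in the graph are dropped (an assumption of this
    sketch), and communities that end up empty are skipped.
    """
    lines = []
    for community in communities:
        mapped = sorted(id_map[n] for n in community if n in id_map)
        if mapped:
            lines.append(" ".join(str(n) for n in mapped))
    return lines


# Toy data: node 1234 appears in a community but not in the graph.
edges = [(10, 42), (42, 7), (7, 10), (7, 99)]
communities = [[10, 42, 7], [99, 7, 1234]]

id_map = build_id_map(edges)
lines = rewrite_communities(communities, id_map)
print("\n".join(lines))
```

Applying the same `id_map` to the graph, the ground truth, and the clustering output keeps all three consistent with each other.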

Note also that any comparison to ground truth in this field will yield very low scores, as there is no universally accepted definition of the term "community". See our project report for more details.

## Config

In [None]:
dataset_directory = "./datasets"
# Where the converted datasets and ground truths are saved
output_directory = dataset_directory

# What dataset is used
dataset = "eu"
# What LazyFox run to compare
run_directory = "./example_data/run_eu_with_1"

## Setup
This section downloads and rewrites the specified dataset, if not already present.

It also rewrites the final cluster result of the specified run directory.

In [None]:
from Datasets import download
from Rewriter import Rewriter
from BenchmarkRun import BenchmarkRun

download(dataset, dataset_directory)

rewriter = Rewriter(dataset_directory, output_directory)
rewriter.rewrite_dataset(dataset)

run = BenchmarkRun(dataset, run_directory)
rewriter.rewrite_lazyfox_result(run)

In [None]:
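from os.path import join

def get_ground_truth(dataset):
    # Path of the rewritten ground-truth cover for the given dataset
    return join(output_directory, f"rewritten_{dataset}_gt.txt")

def get_graph(dataset):
    # Path of the rewritten graph (edge list) for the given dataset
    return join(output_directory, f"rewritten_{dataset}_graph.txt")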

In [None]:
from networkit.community import CoverF1Similarity, OverlappingNMIDistance

def calc_f1(graph, ground_truth, lazy_fox_result):
    # Weighted average F1 similarity between the two covers
    f1 = CoverF1Similarity(graph, ground_truth, lazy_fox_result)
    f1.run()
    return f1.getWeightedAverage()

def calc_nmi(graph, ground_truth, lazy_fox_result):
    # Overlapping NMI expressed as a distance, so lower means more similar
    nmi = OverlappingNMIDistance()
    distance = nmi.getDissimilarity(graph, ground_truth, lazy_fox_result)
    return distance

## Ground Truth Analysis
Finally, we can compute the F1 score and the NMI distance of the specified run.

Note that this computation can take a very long time, depending on the size of the graph.
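To make the F1 score more concrete, here is a minimal pure-Python sketch of a size-weighted best-match F1 between two sets of communities. It only illustrates the general idea. It is not networkit's `CoverF1Similarity` implementation, which may match and weight communities differently. The function names are hypothetical.

```python
def f1_score(found, truth):
    """F1 score between two node sets (harmonic mean of precision and recall)."""
    found, truth = set(found), set(truth)
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(found)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


def weighted_best_match_f1(found_communities, truth_communities):
    """Average, weighted by community size, of each found community's best F1."""
    total_size = sum(len(c) for c in found_communities)
    if total_size == 0:
        return 0.0
    weighted_sum = 0.0
    for community in found_communities:
        best = max((f1_score(community, t) for t in truth_communities), default=0.0)
        weighted_sum += len(community) * best
    return weighted_sum / total_size


# Toy example with two found and two ground-truth communities.
found = [{0, 1, 2}, {3, 4}]
truth = [{0, 1, 2, 3}, {4, 5}]
score = weighted_best_match_f1(found, truth)
print(f"weighted best-match F1: {score:.3f}")
```

Even on this toy input a near-perfect first community does not lift the overall score to 1, because every found community contributes in proportion to its size.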

In [None]:
"""Runs the analysis for one specific run at one specific iteration (default: final iteration)"""

from networkit.graphio import CoverReader
import networkit as nk
from os.path import join
from pandas import DataFrame

print(dataset)
print("Loading Graph")
graph = nk.readGraph(get_graph(dataset), nk.Format.SNAP)
cover_reader = CoverReader()
print("Loading GT")
ground_truth = cover_reader.read(get_ground_truth(dataset), graph)

iteration = len(run.iterations) - 1

lazy_fox_result_path = join(run.run_directory, "iterations", f"rewritten_{iteration}clusters.txt")
lazy_fox_result = cover_reader.read(lazy_fox_result_path, graph)

print("Computing F1")
f1 = calc_f1(graph, ground_truth, lazy_fox_result)
print(f"weighted avg. F1: {f1}")
print("Computing NMI")
nmi_dis = calc_nmi(graph, ground_truth, lazy_fox_result)
print(f"NMI distance: {nmi_dis}")

df = DataFrame([[dataset, f1, nmi_dis]], columns=["Dataset", "F1", "NMI Distance"])
df