# LazyFox Workflow

This guide shows how to use the LazyFox project. It downloads the [Eu-Core dataset from the SNAP Group](https://snap.stanford.edu/data/email-Eu-core.html) and runs the LazyFox algorithm on it.

The resulting output can be analyzed further with the `Analysis.ipynb` notebook.

Note that this notebook is only an example and is not suited to larger datasets. Use the code here or refer to the `README.md` to run LazyFox from the command line.

## Setup

Fetch the latest release of LazyFox or compile it yourself, then specify the path to the binary below.

You can also change the directories used. They are created if they do not already exist.

In [1]:
lazy_fox_binary = "../LazyFox"

dataset_directory = "./datasets"
output_directory = "./output"

In [2]:
# Make the binary executable
!chmod +x {lazy_fox_binary}

In [3]:
import os
import os.path
import urllib
import gzip
import shutil
import uuid

from Datasets import download


# Setup directories
os.makedirs(dataset_directory, exist_ok=True)

# Download the Eu-Core dataset
download("eu", dataset_directory)
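
The `download` helper comes from the project's `Datasets` module, whose implementation is not shown in this notebook. As a rough sketch of what such a step could look like, the hypothetical function below fetches a gzip archive and decompresses it next to the download. The name `fetch_and_decompress` and its behavior are assumptions for illustration, not the project's actual code.

```python
import gzip
import os
import shutil
import urllib.request


def fetch_and_decompress(url, target_directory):
    """Hypothetical helper: download a .gz file and write the decompressed copy.

    Returns the path of the decompressed file. An existing decompressed
    file is reused so that repeated runs do not download again.
    """
    if not url.endswith(".gz"):
        raise ValueError("expected a URL ending in .gz")

    os.makedirs(target_directory, exist_ok=True)
    archive_path = os.path.join(target_directory, os.path.basename(url))
    text_path = archive_path[: -len(".gz")]

    if os.path.exists(text_path):
        return text_path

    # Stream the archive to disk instead of holding it in memory
    with urllib.request.urlopen(url) as response, open(archive_path, "wb") as archive:
        shutil.copyfileobj(response, archive)

    # Decompress the archive into a plain file
    with gzip.open(archive_path, "rb") as compressed, open(text_path, "wb") as plain:
        shutil.copyfileobj(compressed, plain)

    return text_path
```

Assuming the archive is published at `https://snap.stanford.edu/data/email-Eu-core.txt.gz`, calling `fetch_and_decompress` with that URL and `dataset_directory` would produce the `email-Eu-core.txt` file used in the next section.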

## Run
The following cells build the command to run LazyFox.

If the Linux `time` utility exists at `/usr/bin/time`, the command is also benchmarked.

In [4]:
# Dataset input
eu_txt_path = os.path.join(dataset_directory, "email-Eu-core.txt")

# Create a unique run directory
run_output_directory = os.path.join(output_directory, uuid.uuid1().hex)
os.makedirs(run_output_directory, exist_ok=True)

queue_size = 1
thread_count = queue_size # Parallelization is highest when thread_count equals queue_size
dumping = 1 # Whether computation results should be dumped to disk [0|1]

log_file = os.path.join(run_output_directory, "log")

command = f"{lazy_fox_binary} {eu_txt_path} {run_output_directory} {queue_size} {thread_count} {dumping}"
print("The raw command:")
print(command)

The raw command:
../LazyFox ./datasets/email-Eu-core.txt ./output/ec9ad1c814db11ec920c8900ab60aac0 1 1 1
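
The f-string above works as long as no path contains spaces or shell metacharacters. A slightly more defensive way to assemble the same positional arguments is sketched below. The function name `build_lazyfox_command` is hypothetical, and clamping the thread count to the queue size is an assumption based on the comment in the cell above, not documented LazyFox behavior.

```python
import shlex


def build_lazyfox_command(binary, input_path, output_dir,
                          queue_size, thread_count=None, dumping=True):
    """Hypothetical helper: build the shell command string for one run."""
    if queue_size < 1:
        raise ValueError("queue_size must be at least 1")

    # Default to the highest useful degree of parallelization
    if thread_count is None:
        thread_count = queue_size

    # Assumption: threads beyond the queue size bring no benefit
    thread_count = max(1, min(thread_count, queue_size))

    args = [
        binary,
        input_path,
        output_dir,
        str(queue_size),
        str(thread_count),
        str(int(dumping)),  # the binary expects 0 or 1
    ]

    # shlex.join quotes every argument that needs quoting
    return shlex.join(args)
```

With paths that contain no special characters, the result is identical to the raw command printed above.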


In [5]:
# Capture stdout and stderr in a log file.
# The file redirect must come before `2>&1`; otherwise stderr is not written to the file.
command = command + f" > {log_file} 2>&1"
print("The command with log file capture:")
print(command)

The command with log file capture:
../LazyFox ./datasets/email-Eu-core.txt ./output/ec9ad1c814db11ec920c8900ab60aac0 1 1 1 > ./output/ec9ad1c814db11ec920c8900ab60aac0/log 2>&1


In [6]:
# Add time benchmark to the command
if os.path.exists("/usr/bin/time"):
    benchmark_file = os.path.join(run_output_directory, "bench.mark")
    benchmark_prefix = f"/usr/bin/time -v -o {benchmark_file}"

    command = f"{benchmark_prefix} {command}"

    print("Command with time benchmark:")
    print(command)
else:
    print("'/usr/bin/time' not found, running without benchmark!")

Command with time benchmark:
/usr/bin/time -v -o ./output/ec9ad1c814db11ec920c8900ab60aac0/bench.mark ../LazyFox ./datasets/email-Eu-core.txt ./output/ec9ad1c814db11ec920c8900ab60aac0 1 1 1 > ./output/ec9ad1c814db11ec920c8900ab60aac0/log 2>&1
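
As an alternative to shell redirection, both output streams can be captured from Python with the standard `subprocess` module. The sketch below is not part of the project. `run_with_log` is a hypothetical name, and it expects the command as a list of arguments rather than a single string.

```python
import subprocess


def run_with_log(args, log_path):
    """Hypothetical helper: run a command and write stdout and stderr to one file.

    `args` is a list such as [binary, input_path, output_dir, "1", "1", "1"].
    Returns the exit code of the process.
    """
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT merges stderr into the same file as stdout
        result = subprocess.run(args, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

Because no shell is involved, paths with spaces need no quoting, and the order of redirections cannot be mixed up.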


In [7]:
# Run the benchmark command in bash shell
!{command}
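
When the benchmark prefix was used, `/usr/bin/time -v -o` writes a report of `key: value` lines to the benchmark file. A minimal sketch for reading that report back into Python follows. It assumes the verbose output format of GNU `time`, and the function names are hypothetical.

```python
def parse_time_report(text):
    """Hypothetical helper: turn a GNU `time -v` report into a dict of strings."""
    report = {}
    for line in text.splitlines():
        # Split on the last ": " because some keys contain colons themselves,
        # e.g. "Elapsed (wall clock) time (h:mm:ss or m:ss)"
        key, separator, value = line.strip().rpartition(": ")
        if separator:
            report[key] = value
    return report


def elapsed_to_seconds(value):
    """Convert "h:mm:ss" or "m:ss.ss" to seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds
```

After a benchmarked run, `parse_time_report(open(benchmark_file).read())` returns entries such as `"Maximum resident set size (kbytes)"`, which can then be compared across runs with different values of `queue_size` and `thread_count`.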

In [8]:
# Display the log
!cat {log_file}

Starting Fox
running fox with input ./datasets/email-Eu-core.txt and output ./output/ec9ad1c814db11ec920c8900ab60aac0processing a queue size of 1 and working with 1 threads
dumping is enabled
loading input file
finding max node id
building the adj list
building vectors from sets
graph loaded
running with 1 threads
beginning initialisation
counting neighbors
calculating CC Per node
1 % have a cc 	2 % have a cc 	3 % have a cc 	4 % have a cc 	5 % have a cc 	6 % have a cc 	7 % have a cc 	8 % have a cc 	9 % have a cc 	10 % have a cc 	11 % have a cc 	12 % have a cc 	13 % have a cc 	14 % have a cc 	15 % have a cc 	16 % have a cc 	17 % have a cc 	18 % have a cc 	19 % have a cc 	20 % have a cc 	21 % have a cc 	22 % have a cc 	23 % have a cc 	24 % have a cc 	25 % have a cc 	26 % have a cc 	27 % have a cc 	28 % have a cc 	29 % have a cc 	30 % have a cc 	31 % have a cc 	32 % have a cc 	33 % have a cc 	34 % have a cc 	35 % have a cc 	36 % have a cc 	3