# LazyFox Workflow

This is a guide on how to use our LazyFox project. It downloads the [Eu-Core dataset from the SNAP Group](https://snap.stanford.edu/data/email-Eu-core.html) and runs the LazyFox algorithm on it.

The resulting output can be analyzed further with the `Analysis.ipynb` notebook.

Note that this notebook is only exemplary and not fit to handle larger datasets. Use the code here or refer to the `README.md` to run LazyFox from the command line.

## Setup

Fetch the latest release of LazyFox or compile it yourself and specify the path to the binary below!

You can also change the directories used; they are created if not already present.

In [None]:
lazy_fox_binary = "../LazyFox"

dataset_directory = "./datasets"
output_directory = "./output"

In [None]:
# Make the binary executable
!chmod +x {lazy_fox_binary}

In [None]:
import os
import os.path
import urllib
import gzip
import shutil
import uuid

from Datasets import download


# Setup directories
os.makedirs(dataset_directory, exist_ok=True)

# Download the Eu-Core dataset
download("eu", dataset_directory)
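
The `download` helper comes from the project's `Datasets` module, whose internals are not shown here. For readers who want to fetch the file without it, the following is a minimal sketch, assuming the gzipped edge list is served at `https://snap.stanford.edu/data/email-Eu-core.txt.gz` (an assumption based on the SNAP page linked above, so verify it there). `fetch_eu_core` and `gunzip` are hypothetical names and not part of LazyFox.

```python
import gzip
import os
import shutil
import urllib.request

# Assumed location of the gzipped edge list; verify it on the SNAP page.
EU_CORE_URL = "https://snap.stanford.edu/data/email-Eu-core.txt.gz"


def gunzip(gz_path, target_path):
    """Decompress gz_path into target_path and return target_path."""
    with gzip.open(gz_path, "rb") as source, open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return target_path


def fetch_eu_core(directory, url=EU_CORE_URL):
    """Download and unpack the dataset into directory (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    gz_path = os.path.join(directory, os.path.basename(url))
    txt_path = gz_path[: -len(".gz")]
    if not os.path.exists(txt_path):  # skip the download on repeated runs
        urllib.request.urlretrieve(url, gz_path)
        gunzip(gz_path, txt_path)
    return txt_path
```

Calling `fetch_eu_core(dataset_directory)` would then return the path to `email-Eu-core.txt`, the same file name the run cells below expect.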

## Run
The following cells build the command to run LazyFox.

If the Linux `time` utility exists (`/usr/bin/time`), the command is also benchmarked!

In [None]:
# Dataset input
eu_txt_path = os.path.join(dataset_directory, "email-Eu-core.txt")

# Create a unique run directory
run_output_directory = os.path.join(output_directory, uuid.uuid1().hex)
os.makedirs(run_output_directory, exist_ok=True)

queue_size = 1
thread_count = queue_size  # Parallelism is highest when thread_count equals queue_size
dumping = True  # Whether computation results should be dumped to disk (True/False)

log_file = os.path.join(run_output_directory, "log")

command = f"{lazy_fox_binary} --input-graph {eu_txt_path} --output-dir {run_output_directory} " \
            f"--queue-size {queue_size} --thread-count {thread_count}"
if not dumping:
    command += " --disable-dumping "
print("The raw command:")
print(command)
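
The command above is assembled as a single shell string. When paths may contain spaces, building it as an argument list can be more robust. The following is a minimal sketch that uses only the flags shown above; `build_lazyfox_args` is a hypothetical helper and the paths in the usage line are placeholders.

```python
import shlex


def build_lazyfox_args(binary, input_graph, output_dir, queue_size=1,
                       thread_count=None, dumping=True):
    """Return the LazyFox invocation as a list of arguments (hypothetical helper)."""
    if thread_count is None:
        # Mirror the notebook's default: one thread per queue slot.
        thread_count = queue_size
    args = [
        binary,
        "--input-graph", input_graph,
        "--output-dir", output_dir,
        "--queue-size", str(queue_size),
        "--thread-count", str(thread_count),
    ]
    if not dumping:
        args.append("--disable-dumping")
    return args


# Placeholder paths for illustration only.
args = build_lazyfox_args("../LazyFox", "./datasets/email-Eu-core.txt", "./output/run")
print(shlex.join(args))  # shell-quoted form, suitable for logging
```

`shlex.join` (Python 3.8+) quotes each argument, so the printed string can be pasted into a shell even when a path contains whitespace.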

In [None]:
# Capture the stdout and stderr into a log file
command = command + f" > {log_file} 2>&1"
print("The command with log file capture:")
print(command)
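
Shell redirections are order-sensitive, so capturing both streams in one file is easy to get wrong. Outside a notebook, the same capture can be done without a shell. The following is a minimal sketch using `subprocess`; `run_with_log` is a hypothetical helper, and the demo call runs a tiny Python snippet in place of LazyFox.

```python
import os
import subprocess
import sys
import tempfile


def run_with_log(args, log_path):
    """Run args (a list of strings), writing stdout and stderr to log_path.

    Returns the exit code of the process.
    """
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT merges stderr into the same file as stdout.
        completed = subprocess.run(args, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


# Demo with a stand-in command that writes to both streams.
demo_log = os.path.join(tempfile.mkdtemp(), "log")
demo_args = [sys.executable, "-c",
             "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]
exit_code = run_with_log(demo_args, demo_log)
with open(demo_log) as log:
    print(exit_code, log.read())
```

Passing an argument list instead of a string also avoids shell quoting issues for paths with spaces.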

In [None]:
# Add time benchmark to the command
if os.path.exists("/usr/bin/time"):
    benchmark_file = os.path.join(run_output_directory, "bench.mark")
    benchmark_prefix = f"/usr/bin/time -v -o {benchmark_file}"

    command = f"{benchmark_prefix} {command}"
    
    print("Command with time benchmark:")
    print(command)
else:
    print("'/usr/bin/time' not found, running without benchmark!")
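
The check above can also be wrapped in a small function that returns the prefix as an argument list, or an empty list when the utility is missing. This is a minimal sketch, assuming GNU `time` (the `-v` and `-o` flags used above may be absent from other implementations); `time_prefix` is a hypothetical helper.

```python
import os


def time_prefix(benchmark_file, time_binary="/usr/bin/time"):
    """Return the benchmark prefix as a list, or [] if time_binary is unusable."""
    if os.path.isfile(time_binary) and os.access(time_binary, os.X_OK):
        # -v: verbose report, -o: write the report to benchmark_file
        return [time_binary, "-v", "-o", benchmark_file]
    return []


prefix = time_prefix("bench.mark")
print(prefix if prefix else "'/usr/bin/time' not found, running without benchmark!")
```

Because an empty list is returned when the binary is missing, the result can be prepended to an argument list unconditionally.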

In [None]:
# Run the command in a shell (with the benchmark prefix, if available)
!{command}

In [None]:
# Display the log
!cat {log_file}
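
If the benchmark was recorded, the `bench.mark` file holds the verbose report from `/usr/bin/time -v`, which consists of `key: value` lines. The following is a minimal sketch for turning such a report into a dictionary; `parse_time_report` is a hypothetical helper, and the sample text contains made-up values for illustration, not measured results.

```python
def parse_time_report(text):
    """Parse `key: value` lines from a GNU `time -v` report into a dict."""
    report = {}
    for line in text.splitlines():
        # Split on the first colon followed by a space; the clock format in
        # the "Elapsed" key contains colons but no colon-space pair.
        key, separator, value = line.strip().partition(": ")
        if separator:
            report[key] = value
    return report


# Illustrative sample only; the numbers are not real measurements.
sample = (
    "\tCommand being timed: \"./LazyFox\"\n"
    "\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:01.50\n"
    "\tMaximum resident set size (kbytes): 20480\n"
)
report = parse_time_report(sample)
print(report["Maximum resident set size (kbytes)"])
```

In the notebook, the text would come from reading `benchmark_file` instead of the sample string.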