The article is available at https://arxiv.org/abs/2307.04890. The following code has also been archived at https://hal.inria.fr/.
The repository provides different functions to compute the component matrix, the size of the out-components of the nodes and their hashed versions. The code provided can be used to compute:
- The component matrix
  - with a full NumPy array
  - with a sparse SciPy matrix
  - with HyperLogLog structures
- The out-component sizes
  - from the full NumPy array
  - from a sparse SciPy matrix
  - from the HyperLogLog structures
- The hashed component matrices
- The aggregation of the hashed matrices to approximate the true component matrix
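
To make the first two items concrete, here is a minimal pure-Python sketch. It is not the code from this repository. It assumes that the component matrix is a boolean matrix whose row i marks the nodes reachable from node i through time-respecting paths in a directed temporal network, that every node belongs to its own out-component, and that events are given as (source, target, time) triples. The names `component_matrix` and `out_component_sizes` are hypothetical and chosen for illustration only.

```python
def component_matrix(n_nodes, events):
    """Return a boolean matrix C (list of lists) where C[i][j] is True if
    node j is reachable from node i through a time-respecting path.

    Hypothetical helper for illustration; not the repository's API.
    Assumption: events are (source, target, time) triples.
    """
    # Assumption: every node belongs to its own out-component.
    C = [[i == j for j in range(n_nodes)] for i in range(n_nodes)]
    # Scan events from latest to earliest. When (u, v, t) is processed,
    # row v already holds everything v can reach through later events,
    # so u inherits that row. Events sharing a timestamp are not handled
    # specially in this sketch.
    for u, v, _t in sorted(events, key=lambda e: e[2], reverse=True):
        C[u] = [a or b for a, b in zip(C[u], C[v])]
    return C


def out_component_sizes(C):
    """Size of each node's out-component: the number of True entries per row."""
    return [sum(row) for row in C]


# Toy temporal network: 0 -> 1 at t=1, 1 -> 2 at t=2, 2 -> 3 at t=1.
events = [(0, 1, 1), (1, 2, 2), (2, 3, 1)]
C = component_matrix(4, events)
sizes = out_component_sizes(C)
print(sizes)
```

In this toy example node 1 reaches node 2 at t=2, but the event from 2 to 3 happened earlier at t=1, so node 3 is not in the out-component of node 1. A dense matrix like this needs memory quadratic in the number of nodes, which is presumably why sparse and HyperLogLog variants are also listed above.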
The code is written in Python 3 and uses the following libraries: NumPy, SciPy, NetworkX, Datasketch, and POT (Python Optimal Transport).
To install all the dependencies, run:
$ pip install -r requirements.txt
The file 'main.py' contains the code to compute all of the quantities listed above. You can run it with:
$ python3 main.py