Risk-Averse Matchings over Uncertain Graph Databases
Charalampos E. Tsourakakis, Shreyas Sekar, Johnson Lam, Liu Yang
Contains an installable module, risk-averse-matching, for finding a bounded-variance or bounded-standard deviation matching on a uncertain graph or hypergraph. Also contains module for generating synthetic graphs with the following models and attributes:
- Erdos-Renyi, Barabasi-Albert graph models
- Bernoulli (weight and probability parameters), Gaussian (mean and variance parameters) distributed edges
- Uniform, Gaussian distributions to sample respective parameters w.r.t. to the edge distribution
Experiments were done on the following datasets: DBLP citation hypergraph, PPI graph, and synthetically generated graphs. These scripts have also been provided.
Setup
In the risk-averse-matching/
directory, create and start a virtualenv for the project
>>> virtualenv venv
>>> source venv/bin/activate
>>> pip install -e .
To use a virtualenv within jupyter notebook. Run the following command and switch to the "venv" kernel in the notebook.
>>> python -m ipykernel install --user --name=venv
Example: Generating Synthetic Graph
To generate an Erdos-Renyi graph, Bernoulli distributed edges, Uniformly sampled weights, and Gaussian sampled probabilities
from risk_averse_matching import hypergraph_matchings as hm
from risk_averse_matching import graph_generator as gg
graph, edge, weight, prob = 'erdos', 'bernoulli', 'uniform', 'gaussian'
g_param = { 'vertices': 6000, 'p': 0.005 }
w_param = { 'min': 0, 'max': 1000, 'discrete': True }
p_param = { 'mu': 0.5, 'sigma': 0.5/3, 'discrete': False, 'min': 0 }
edge_list = gg.gen_graph_attrib(graph, g_param, edge, weight, w_param, prob, prob_param)
Example: Finding Bounded-Variance Matchings
Using the maximum matching's variance, find bounded-variance matchings for 20 evenly split intervals
intervals = 20
variance_beta = True
g = hm.Hypergraph(edge_list, variance_beta, weight='weight', probability='probability', edge='edge', edge_distribution='bernoulli')
beta_thresholds = g.gen_betas(intervals)
for idx, beta in enumerate(beta_thresholds):
bv_matching, bv_stat = g.bounded_matching(beta)
# save or use returned matching and matching's stats