**GraphQL**

This is a rule-based subgraph matching algorithm that focuses mainly on reducing the search space. Specifically, it proposes an effective way to shrink the **candidate set** of each query node.

For the initial candidate sets, GraphQL takes the simplest approach: the candidate set of each query node consists of all data-graph nodes with the **same label**.
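
As a small illustration of label-based candidate generation, here is a minimal sketch on toy data. The function name `initial_candidates` and the two label dictionaries are hypothetical and not part of the notebook's code.

```python
from collections import defaultdict

def initial_candidates(query_labels, data_labels):
    """Map each query node to the set of data nodes that share its label."""
    label_to_data_nodes = defaultdict(set)
    for node, label in data_labels.items():
        label_to_data_nodes[label].add(node)
    return {u: set(label_to_data_nodes.get(label, ()))
            for u, label in query_labels.items()}

# Hypothetical toy graphs, given as node id -> label.
query_labels = {0: "A", 1: "B"}
data_labels = {10: "A", 11: "B", 12: "A", 13: "C"}
candidates = initial_candidates(query_labels, data_labels)
print(candidates)
```

Data node 13 appears in no candidate set because no query node carries label "C".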

Next, the authors use the **neighborhood subgraph profiles** of the query nodes to shrink the search space. For a query node and one of its candidates, they check whether every neighbor of the query node can be paired with a neighbor of the candidate that lies in its candidate set; such a pairing is called a **semi-perfect matching**. If one exists, the pair remains a valid mapping. Otherwise the pair is dropped: although the two nodes share a label, their neighborhoods cannot be matched, so the data node is removed from the query node's candidate set. Removing a candidate can in turn **influence** whether neighboring pairs still admit a semi-perfect matching. Thus, the more **iterations** we run, the more refined the candidate sets become (until nothing changes any more). The number of iterations is called the **refinement level**.
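
The check described above can be sketched as a bipartite matching problem. The block below is a minimal sketch using the standard augmenting-path method; the function name `has_semi_perfect_matching` and the toy inputs are assumptions made for illustration. It requires distinct query neighbors to be paired with distinct data neighbors, which is stricter than only asking whether each query neighbor has some candidate among the data neighbors.

```python
def has_semi_perfect_matching(query_neighbors, data_neighbors, candidates):
    """Return True if every query neighbor can be paired with a distinct
    data neighbor that lies in its candidate set."""
    match_of_data = {}  # data neighbor -> query neighbor currently paired with it

    def try_assign(u, seen):
        # Try to give u a data neighbor, re-routing earlier pairings if needed.
        for v in data_neighbors:
            if v in candidates[u] and v not in seen:
                seen.add(v)
                if v not in match_of_data or try_assign(match_of_data[v], seen):
                    match_of_data[v] = u
                    return True
        return False

    return all(try_assign(u, set()) for u in query_neighbors)

# Hypothetical toy data: two query neighbors compete for a single data neighbor.
crowded = has_semi_perfect_matching([1, 2], [11], {1: {11}, 2: {11}})
# With a second data neighbor available, node 1 can be re-routed to 12.
roomy = has_semi_perfect_matching([1, 2], [11, 12], {1: {11, 12}, 2: {11}})
print(crowded, roomy)
```

In the first call a plain per-neighbor existence test would pass, since both query neighbors list 11 as a candidate, but no one-to-one pairing exists.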

To **speed up** the search for semi-perfect matchings, the authors use a small trick. If a pair of nodes passes the semi-perfect matching check, it is **not checked again in the next refinement level**. Otherwise, the candidate is first removed from the node's candidate set, and the **corresponding neighbor pairs are marked as unchecked**, because removing a candidate may change whether those neighbors still admit a semi-perfect matching.

In subgraph matching, the baseline way to generate a matching order is to iteratively add the node with the **smallest** candidate set until every query node has been ordered. (A relatively small candidate set usually goes with a **larger degree** and leaves **less uncertainty** about which candidate to choose.) In the implementation below, every node after the first is chosen among the neighbors of nodes already in the order.
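
The greedy ordering can be sketched on a toy query as follows. This is a minimal sketch, assuming the next node is always taken from the neighbors of the nodes ordered so far; the function name `matching_order`, the tie-breaking by node id, and the fallback for disconnected queries are assumptions added for illustration.

```python
def matching_order(candidate_sizes, adjacency):
    """Greedy order: start at the smallest candidate set, then repeatedly pick
    the unvisited neighbor of the order so far with the smallest candidate set."""
    by_size = lambda n: (candidate_sizes[n], n)  # ties broken by node id
    order = [min(candidate_sizes, key=by_size)]
    visited = set(order)
    while len(order) < len(candidate_sizes):
        frontier = {n for u in order for n in adjacency[u] if n not in visited}
        if not frontier:
            # Disconnected query (assumption): fall back to any unvisited node.
            frontier = set(candidate_sizes) - visited
        nxt = min(frontier, key=by_size)
        order.append(nxt)
        visited.add(nxt)
    return order

# Hypothetical toy query: the path 0 - 1 - 2 - 3 with candidate-set sizes.
sizes = {0: 5, 1: 2, 2: 4, 3: 1}
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
order = matching_order(sizes, adjacency)
print(order)
```

Node 1 has the second-smallest candidate set but is ordered after node 2, because only neighbors of already ordered nodes are eligible at each step.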

Finally, enumeration uses the baseline procedure: query nodes are matched one at a time following the matching order, and data nodes already matched to earlier query nodes are skipped when matching the current one.
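
The enumeration step can be sketched as plain backtracking. This is a minimal sketch on toy data, not the notebook's implementation; the function name `enumerate_embeddings` and all inputs are hypothetical.

```python
def enumerate_embeddings(order, candidates, q_adj, g_adj):
    """Backtracking over the matching order; returns all injective mappings
    that send every query edge to a data edge."""
    results = []
    mapping = {}   # query node -> data node
    used = set()   # data nodes already taken by earlier query nodes

    def extend(depth):
        if depth == len(order):
            results.append(dict(mapping))
            return
        u = order[depth]
        for v in candidates[u]:
            if v in used:
                continue
            # Every already mapped query neighbor of u must map to a neighbor of v.
            if all(mapping[w] in g_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                extend(depth + 1)
                del mapping[u]
                used.remove(v)

    extend(0)
    return results

# Hypothetical toy instance: query edge 0 - 1, data path 10 - 11 - 12.
q_adj = {0: {1}, 1: {0}}
g_adj = {10: {11}, 11: {10, 12}, 12: {11}}
candidates = {0: {10, 12}, 1: {11}}
embeddings = enumerate_embeddings([1, 0], candidates, q_adj, g_adj)
print(len(embeddings))
```

Query node 1 can only map to data node 11, and query node 0 can then map to either 10 or 12, so two embeddings are found.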

In [None]:
!git clone https://github.com/RapidsAtHKUST/SubgraphMatching.git

Cloning into 'SubgraphMatching'...
remote: Enumerating objects: 278, done.
remote: Counting objects: 100% (278/278), done.
remote: Compressing objects: 100% (270/270), done.
remote: Total 278 (delta 9), reused 260 (delta 3), pack-reused 0
Receiving objects: 100% (278/278), 2.51 MiB | 12.54 MiB/s, done.
Resolving deltas: 100% (9/9), done.


In [None]:
from collections import defaultdict

In [None]:
class graph():
  def __init__(self, graphid, node2label, node2degree, edges):
    self.graphid = graphid
    self.node2label = node2label
    self.node2degree = node2degree
    self.edges = edges
    self.candidateset = defaultdict(set)
    self.label2node = defaultdict(set)
    for node in self.node2label:
      self.label2node[self.node2label[node]].add(node)
    self.phi = []

  def reset(self):
    self.candidateset = defaultdict(set)
    self.phi = []

In [None]:
def get_graph(filepath, filename):
  """Parse a graph file: a header line, then one line per node, then one line per edge."""
  node2label = {}
  node2degree = {}
  edges = defaultdict(set)

  with open(filepath, "r", encoding="utf-8") as f:
    # Header line: <tag> <number of nodes> <number of edges>
    _, nodenum, edgenum = f.readline().strip().split()
    # Node lines: <tag> <node id> <label> <degree>
    for _ in range(int(nodenum)):
      _, nodeid, nodelabel, nodedegree = f.readline().strip().split()
      node2label[int(nodeid)] = int(nodelabel)
      node2degree[int(nodeid)] = int(nodedegree)
    # Edge lines: <tag> <node id> <node id>; stored in both directions (undirected)
    for _ in range(int(edgenum)):
      _, node1, node2 = f.readline().strip().split()
      edges[int(node1)].add(int(node2))
      edges[int(node2)].add(int(node1))

  return graph(filename, node2label, node2degree, edges)

In [None]:
qcount = 0
gcount = 0

import os
qs = []
qdir = "SubgraphMatching/test/query_graph"
for f in os.listdir(qdir):
  filepath = os.path.join(qdir, f)
  qs.append(get_graph(filepath, f))

gs = []
gdir = "SubgraphMatching/test/data_graph"
for f in os.listdir(gdir):
  filepath = os.path.join(gdir, f)
  gs.append(get_graph(filepath, f))

print(len(qs))
print(len(gs))

f = open("SubgraphMatching/test/expected_output.res", "r", encoding="utf-8")
lines = f.readlines()
f.close()

expects = {}
for line in lines:
  name, times = line.strip().split(":")
  expects[name + ".graph"] = int(times)
print(len(expects))

200
1
200


In [None]:
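def graphQL_CSG(q, g, level):
  # Initial candidates: all data nodes with the same label.
  # Every (query node, data node) pair starts out marked, i.e. still to be checked.
  marks = set()
  for qnode in q.node2label:
    label = q.node2label[qnode]
    q.candidateset[qnode] |= g.label2node[label]
    for gnode in q.candidateset[qnode]:
      marks.add((qnode, gnode))

  for i in range(level):
    for qnode in q.node2label:
      for gnode in q.candidateset[qnode].copy():
        mark = (qnode, gnode)
        if mark not in marks:
          continue  # verified earlier and not invalidated since

        # Simplified check: each query neighbor needs at least one candidate
        # among the data neighbors. Distinct query neighbors are not forced
        # onto distinct data neighbors, so this is weaker than a true matching.
        b = set()
        qneighbors = q.edges[qnode]
        gneighbors = g.edges[gnode]
        for qneighbor in qneighbors:
          for gneighbor in gneighbors:
            if gneighbor in q.candidateset[qneighbor]:
              b.add(qneighbor)
              break

        unmatched = qneighbors - b
        # The pair has been processed; drop its mark in both cases so that
        # the early exit below is not blocked by marks of removed candidates.
        marks.discard(mark)
        if unmatched:
          # Remove the candidate and re-mark the neighboring pairs it may affect.
          q.candidateset[qnode].remove(gnode)
          for qneighbor in qneighbors:
            for gneighbor in gneighbors:
              if gneighbor in q.candidateset[qneighbor]:
                marks.add((qneighbor, gneighbor))

    # Nothing left to re-check: the candidate sets are stable.
    if len(marks) == 0:
      break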

In [None]:
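def graphQL_MOG(q):
  # Candidate-set size of every query node.
  argminc = {}
  for node in q.candidateset:
    argminc[node] = len(q.candidateset[node])

  # Start from the query node with the fewest candidates.
  cur = min(argminc, key=argminc.get)
  visited = set()
  while True:
    q.phi.append(cur)
    visited.add(cur)
    if len(q.phi) == len(q.node2label):
      break
    # Next: the unvisited neighbor of the order so far with the fewest candidates.
    # This assumes a connected query graph; otherwise tmp is empty and min() raises.
    tmp = {}
    for node in q.phi:
      for neighbor in q.edges[node]:
        if neighbor not in visited:
          tmp[neighbor] = len(q.candidateset[neighbor])
    cur = min(tmp, key=tmp.get)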

In [None]:
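def graphQL_EP(q, g, m, i, totalresult): # not equal to the original code
  # m maps query nodes to data nodes; i is the 1-based depth in the matching order.
  if i == len(q.phi) + 1:
    totalresult.append(m.copy())
    return
  # u: the first query node in the matching order that is still unmatched.
  u = -1
  for node in q.phi:
    if node not in m:
      u = node
      break

  # lc: local candidates of u that are consistent with the partial mapping m.
  lc = set()

  if i == 1:
    lc = q.candidateset[u]
  else:
    for v in q.candidateset[u]:
      flag = True
      for node in q.phi:
        if node == u:
          break
        # Reject v if it is already used, or if the query edge (u, node)
        # has no corresponding data edge (v, m[node]).
        if v == m[node] or (m[node] not in g.edges[v] and (node in q.edges[u] or u in q.edges[node])):
          flag = False
          break
      if flag:
        lc.add(v)

  used = set(m.values())
  for node in lc:
    if node not in used:
      m[u] = node
      graphQL_EP(q, g, m, i + 1, totalresult)
      m.pop(u)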

In [None]:
queries = {}
for g in gs:
  for q in qs:
    q.reset()
    graphQL_CSG(q, g, 10)
    graphQL_MOG(q)
    totalresult = []
    graphQL_EP(q, g, {}, 1, totalresult)
    queries[q.graphid] = len(totalresult)

print(queries)

{'query_dense_16_170.graph': 18, 'query_dense_16_50.graph': 88, 'query_dense_16_72.graph': 24, 'query_dense_16_152.graph': 432, 'query_dense_16_86.graph': 1, 'query_dense_16_151.graph': 138, 'query_dense_16_95.graph': 354, 'query_dense_16_76.graph': 41, 'query_dense_16_168.graph': 75, 'query_dense_16_9.graph': 42, 'query_dense_16_181.graph': 8, 'query_dense_16_94.graph': 2, 'query_dense_16_88.graph': 12, 'query_dense_16_154.graph': 12, 'query_dense_16_37.graph': 1, 'query_dense_16_180.graph': 1, 'query_dense_16_164.graph': 480, 'query_dense_16_113.graph': 6, 'query_dense_16_188.graph': 8, 'query_dense_16_104.graph': 38, 'query_dense_16_42.graph': 32, 'query_dense_16_124.graph': 2, 'query_dense_16_174.graph': 6, 'query_dense_16_71.graph': 9, 'query_dense_16_87.graph': 12, 'query_dense_16_39.graph': 1, 'query_dense_16_65.graph': 2, 'query_dense_16_11.graph': 288, 'query_dense_16_14.graph': 2, 'query_dense_16_145.graph': 1, 'query_dense_16_30.graph': 2, 'query_dense_16_58.graph': 3, 'quer

In [None]:
flag = True
for name in expects:
  if expects[name] != queries[name]:
    print(name)
    flag = False
if flag:
  print("correct")

correct
