**CECI**

This algorithm is similar to CFL. It uses a structure called CECI to store all the tree edges and non-tree edges. (The pseudocode in the paper is not well written, at least for me; I had to read the source code to fully understand how the algorithm runs.)

This algorithm also follows the classic subgraph matching procedure. It focuses mainly on candidate set generation. For the last two parts (matching order and enumeration), it uses only the basic algorithms, applied to the CECI structure it generates.

Like CFL, it first builds a BFS tree to separate tree edges from non-tree edges. The difference between them is that **CFL doesn't use non-tree edges, while CECI stores non-tree edges** to prune the search space as well.
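To make the tree / non-tree split concrete, here is a minimal sketch that runs a BFS over a small query graph and classifies every edge. The function name `split_edges_by_bfs` and the toy graph are illustrative assumptions, not taken from the paper or its source code.

```python
from collections import deque

def split_edges_by_bfs(adj, root):
    """Return (order, parent, tree_edges, non_tree_edges) for a connected undirected graph."""
    order = [root]
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):        # sorted only to make the order deterministic
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    # A tree edge joins a node to its BFS parent; every other edge is a non-tree edge.
    tree_edges = {frozenset((c, p)) for c, p in parent.items() if p is not None}
    all_edges = {frozenset((u, w)) for u in adj for w in adj[u]}
    return order, parent, tree_edges, all_edges - tree_edges

# Hypothetical query graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
order, parent, tree_edges, non_tree_edges = split_edges_by_bfs(adj, 0)
print(order)            # [0, 1, 2, 3]
print(non_tree_edges)   # {frozenset({1, 2})}
```

Starting from node 0, the edge between 1 and 2 is the only one that does not connect a node to its BFS parent, so it is the only non-tree edge.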

They initialize the candidate set of each node in the query graph with the filters used by many other algorithms in this field, i.e., **NLF, LF and DF**.
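As a rough illustration of these filters, the sketch below reads LF as a label filter, DF as a degree filter and NLF as a neighbor label frequency filter (that reading, the function name `initial_candidates` and the toy graphs are assumptions made for this example).

```python
from collections import Counter

def initial_candidates(q_adj, q_label, g_adj, g_label):
    """Candidates of each query node after the label, degree and NLF checks."""
    # Count, for every data node, how many neighbors carry each label.
    g_nlf = {v: Counter(g_label[w] for w in g_adj[v]) for v in g_adj}
    candidates = {}
    for u in q_adj:
        u_nlf = Counter(q_label[w] for w in q_adj[u])
        candidates[u] = {
            v for v in g_adj
            if g_label[v] == q_label[u]                  # label filter
            and len(g_adj[v]) >= len(q_adj[u])           # degree filter
            and all(g_nlf[v][lab] >= cnt                 # neighbor label frequency filter
                    for lab, cnt in u_nlf.items())
        }
    return candidates

# Hypothetical query: a single edge A-B.
q_adj = {0: {1}, 1: {0}}
q_label = {0: "A", 1: "B"}
# Hypothetical data graph: path A-B-A plus an isolated B node.
g_adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
g_label = {0: "A", 1: "B", 2: "A", 3: "B"}

print(initial_candidates(q_adj, q_label, g_adj, g_label))  # {0: {0, 2}, 1: {1}}
```

The isolated node 3 has the right label for query node 1 but fails the degree check, so it never enters the candidate set.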

Based on this, they first generate the **tree edge candidate set** to prune the overall candidate set and reduce the search space.

For the current node u in the query graph, we look at its parent and check **whether the neighbors of the parent's candidates pass** all the filters. The neighbors that pass may be valid candidates for u. If a parent candidate has no such neighbor, we remove it and **delete its tree edges for all the children** of the parent.
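The step described above can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes the filters have already been applied and stored in `candidates`, and the names `build_tree_edge_candidates`, `te` and the toy data are hypothetical.

```python
def build_tree_edge_candidates(order, parent, candidates, g_adj):
    """te[u][vp] = candidates of u adjacent to vp, where vp is a candidate of u's parent."""
    te = {}
    for u in order[1:]:                      # the root has no parent
        up = parent[u]
        te[u] = {}
        for vp in list(candidates[up]):
            matched = g_adj[vp] & candidates[u]
            if matched:
                te[u][vp] = matched
            else:
                # vp cannot be extended to u: drop it and its tree edges
                # for every child of up processed so far.
                candidates[up].discard(vp)
                for child in te:
                    if parent[child] == up:
                        te[child].pop(vp, None)
        # Only nodes reachable from a surviving parent candidate remain candidates of u.
        candidates[u] = set().union(*te[u].values())
    return te

# Hypothetical query edge 0-1 in BFS order, with pre-filtered candidates.
order = [0, 1]
parent = {0: None, 1: 0}
candidates = {0: {10, 12, 14}, 1: {11, 13}}
g_adj = {10: {11}, 11: {10, 12}, 12: {11}, 13: set(), 14: set()}

te = build_tree_edge_candidates(order, parent, candidates, g_adj)
print(te)          # {1: {10: {11}, 12: {11}}}
print(candidates)  # {0: {10, 12}, 1: {11}}
```

Data node 14 has no neighbor among the candidates of query node 1, so it is removed from the parent's candidate set, and node 13 disappears because no surviving parent candidate reaches it.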

Then we generate the **non-tree edge candidate set**. The process is almost the same as for the tree edges.

Finally, for both tree edges and non-tree edges, we use **cardinalities**, a vector defined by the authors, to prune the search space. If a node has **multiple edges from nodes before it in the BFS tree order**, its candidates must appear in both the **tree edge and non-tree edge candidate sets** to be considered. The authors use this property to prune the candidate set.
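The "must appear in both sets" condition amounts to a set intersection over all earlier neighbors of the node. The sketch below shows only that intersection (not the cardinality bookkeeping); `local_candidates`, `backward_neighbors` and the toy values are assumptions made for illustration.

```python
def local_candidates(u, backward_neighbors, te, nte, mapping):
    """Intersect the edge candidates of u over all of its already-matched neighbors."""
    result = None
    for un in backward_neighbors[u]:
        image = mapping[un]                  # data node currently matched to un
        edge_cands = (te.get(u, {}).get(image, set())
                      | nte.get(u, {}).get(image, set()))
        result = set(edge_cands) if result is None else result & edge_cands
    if result is None:                       # u has no matched neighbor yet
        result = set()
    # A data node may be used at most once in an embedding.
    return result - set(mapping.values())

# Hypothetical triangle query 0-1-2 in BFS order: 0 is the parent of 2 (tree edge),
# and 1-2 is a non-tree edge.
backward_neighbors = {2: [0, 1]}
te = {2: {10: {20, 21}}}      # candidates of 2 reachable from parent image 10
nte = {2: {11: {21, 22}}}     # candidates of 2 reachable from non-tree neighbor image 11
mapping = {0: 10, 1: 11}

print(local_candidates(2, backward_neighbors, te, nte, mapping))  # {21}
```

Only data node 21 is adjacent to the images of both earlier neighbors, so it is the only candidate worth trying for query node 2.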

This paper is hard to reproduce without reading the source code.

In [None]:
!git clone https://github.com/RapidsAtHKUST/SubgraphMatching.git

Cloning into 'SubgraphMatching'...
remote: Enumerating objects: 278, done.
remote: Counting objects: 100% (278/278), done.
remote: Compressing objects: 100% (270/270), done.
remote: Total 278 (delta 9), reused 260 (delta 3), pack-reused 0
Receiving objects: 100% (278/278), 2.51 MiB | 12.30 MiB/s, done.
Resolving deltas: 100% (9/9), done.


In [None]:
from collections import defaultdict

In [None]:
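class graph:
  # Labeled undirected graph plus the per-query CECI state
  # (candidate sets, BFS order, tree / non-tree edge candidates).
  def __init__(self, graphid, node2label, node2degree, edges):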
    self.graphid = graphid
    self.node2label = node2label
    self.node2degree = node2degree
    self.edges = edges
    self.candidateset = defaultdict(set)
    self.label2node = defaultdict(set)
    for node in self.node2label:
      self.label2node[self.node2label[node]].add(node)
    self.phi = []
    self.phiparent = {}
    self.te = defaultdict(lambda: defaultdict(set))
    self.nte = defaultdict(lambda: defaultdict(set))

  def reset(self):
    self.candidateset = defaultdict(set)
    self.phi = []
    self.phiparent = {}
    self.te = defaultdict(lambda: defaultdict(set))
    self.nte = defaultdict(lambda: defaultdict(set))

In [None]:
def get_graph(filepath, filename):
  # File layout: a header line with the node and edge counts, then one line
  # per node (id, label, degree) and one line per edge (two endpoints).
  # The first token of every line is ignored.
  node2label = {}
  node2degree = {}
  edges = defaultdict(set)

  with open(filepath, "r", encoding="utf-8") as f:
    _, nodenum, edgenum = f.readline().strip().split()
    for i in range(int(nodenum)):
      _, nodeid, nodelabel, nodedegree = f.readline().strip().split()
      node2label[int(nodeid)] = int(nodelabel)
      node2degree[int(nodeid)] = int(nodedegree)
    for i in range(int(edgenum)):
      _, node1, node2 = f.readline().strip().split()
      # Store each undirected edge in both directions.
      edges[int(node1)].add(int(node2))
      edges[int(node2)].add(int(node1))

  return graph(filename, node2label, node2degree, edges)

In [None]:
qcount = 0
gcount = 0

import os
qs = []
qdir = "SubgraphMatching/test/query_graph"
for f in os.listdir(qdir):
  filepath = os.path.join(qdir, f)
  qs.append(get_graph(filepath, f))

gs = []
gdir = "SubgraphMatching/test/data_graph"
for f in os.listdir(gdir):
  filepath = os.path.join(gdir, f)
  gs.append(get_graph(filepath, f))

print(len(qs))
print(len(gs))

with open("SubgraphMatching/test/expected_output.res", "r", encoding="utf-8") as f:
  lines = f.readlines()

expects = {}
for line in lines:
  name, times = line.strip().split(":")
  expects[name + ".graph"] = int(times)
print(len(expects))

200
1
200


In [None]:
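def CECI_CSG(q, g):
  # Builds the CECI structure for query q on data graph g: candidate sets,
  # the BFS order q.phi, and the tree / non-tree edge candidates q.te / q.nte.
  # Step 1: label and degree filter.
  for qnode in q.node2label: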
    label = q.node2label[qnode]
    for gnode in g.label2node[label]:
      if q.node2degree[qnode] <= g.node2degree[gnode]:
        q.candidateset[qnode].add(gnode)

  # Step 2: neighbor label frequency filter. Count neighbor labels on both sides.
  qlabels = defaultdict(lambda: defaultdict(int))
  for qnode in q.node2label:
    qneighbors = q.edges[qnode]
    for qneighbor in qneighbors:
      qlabels[qnode][q.node2label[qneighbor]] += 1
  
  glabels = defaultdict(lambda: defaultdict(int))
  for gnode in g.node2label:
    gneighbors = g.edges[gnode]
    for gneighbor in gneighbors:
      glabels[gnode][g.node2label[gneighbor]] += 1
  
  for qnode in q.node2label:
    for gnode in q.candidateset[qnode].copy():
      for label in qlabels[qnode]:
        if qlabels[qnode][label] > glabels[gnode][label]:
          q.candidateset[qnode].remove(gnode)
          break

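  # Step 3: drop a candidate if, for some query neighbor, none of its data
  # neighbors is a candidate of that query neighbor.
  for qnode in q.node2label: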
    qneighbors = q.edges[qnode]
    for qneighbor in qneighbors:
      for nodecandidate in q.candidateset[qnode].copy():
        if len(g.edges[nodecandidate] & q.candidateset[qneighbor]) == 0:
          q.candidateset[qnode].remove(nodecandidate)
        

  # Root selection: the query node with the fewest candidates per unit of degree.
  scores = {}
  for qnode in q.node2label:
    scores[qnode] = len(q.candidateset[qnode]) / q.node2degree[qnode]
  
  r = min(scores, key=scores.get)
  

  # BFS from the root; q.phi is the visit order and q.phiparent the BFS tree.
  visited = set()
  queue = []
  visited.add(r)
  queue.append(r)
  q.phiparent[r] = -1
  q.phi.append(r)
  while queue:
    top = queue.pop(0)
    for neighbor in q.edges[top]:
      if neighbor not in visited:
        visited.add(neighbor)
        queue.append(neighbor)
        q.phiparent[neighbor] = top
        q.phi.append(neighbor)
  
  parent2children = defaultdict(set)
  for child in q.phiparent:
    parent2children[q.phiparent[child]].add(child)
  
  cardinalities = defaultdict(lambda: defaultdict(int))
  for qnode in q.node2label:
    for qcandidate in q.candidateset[qnode]:
      cardinalities[qnode][qcandidate] = 1


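  # Tree edge candidates: for each candidate vf of u's parent, keep the
  # neighbors of vf that pass the label, degree and NLF filters for u.
  visited = set()
  for i in range(len(q.phi)):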
    u = q.phi[i]
    visited.add(u)
    up = q.phiparent[u]
    for vf in q.candidateset[up].copy():
      vfneighbors = g.edges[vf].copy()
      vfneighbors &= g.label2node[q.node2label[u]]
      for vfneighbor in vfneighbors:
        if g.node2degree[vfneighbor] < q.node2degree[u]:
          continue
        
        nlcf = True
        qlabels = defaultdict(int)
        qneighbors = q.edges[u]
        for qneighbor in qneighbors:
          qlabels[q.node2label[qneighbor]] += 1
        
        glabels = defaultdict(int)
        gneighbors = g.edges[vfneighbor]
        for gneighbor in gneighbors:
          glabels[g.node2label[gneighbor]] += 1
        
        for label in qlabels:
          if qlabels[label] > glabels[label]:
            nlcf = False
            break
        
        if not nlcf:
          continue
        
        q.candidateset[u].add(vfneighbor)
        q.te[u][vf].add(vfneighbor)
      if len(q.te[u][vf]) == 0:
        q.candidateset[up].remove(vf)
        for child in parent2children[up]:
          if child not in visited:
            continue
          if vf in q.te[child]:
            q.te[child].pop(vf)
        cardinalities[up][vf] = 0

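  # Non-tree edge candidates: same filtering, but for the earlier neighbors
  # of u in the BFS order other than its parent.
  for i in range(len(q.phi)):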
    u = q.phi[i]
    for j in range(0, i):
      up = q.phi[j]
      if up not in q.edges[u]:
        continue
      if up == q.phiparent[u]:
        continue
      for vf in q.candidateset[up].copy():
        vfneighbors = g.edges[vf].copy()
        vfneighbors &= g.label2node[q.node2label[u]]
        for vfneighbor in vfneighbors:
          if g.node2degree[vfneighbor] < q.node2degree[u]:
            continue
          
          nlcf = True
          qlabels = defaultdict(int)
          qneighbors = q.edges[u]
          for qneighbor in qneighbors:
            qlabels[q.node2label[qneighbor]] += 1
          
          glabels = defaultdict(int)
          gneighbors = g.edges[vfneighbor]
          for gneighbor in gneighbors:
            glabels[g.node2label[gneighbor]] += 1
          
          for label in qlabels:
            if qlabels[label] > glabels[label]:
              nlcf = False
              break
          
          if not nlcf:
            continue
          
          q.candidateset[u].add(vfneighbor)
          q.nte[u][vf].add(vfneighbor)
        if len(q.nte[u][vf]) == 0:
          cardinalities[up][vf] = 0
          q.candidateset[up].remove(vf)


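  # Refine bottom-up (reverse BFS order): drop tree edge candidates whose
  # cardinality is 0 and fold each child's total into its parent's candidate.
  reverseorder = q.phi[::-1]

  for u in reverseorder: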
    up = q.phiparent[u]
    for vp in q.te[u]:
      score = 0
      for v in q.te[u][vp].copy():
        score += cardinalities[u][v]
        if cardinalities[u][v] == 0:
          if v in q.candidateset[u]:
            q.candidateset[u].remove(v)
          q.te[u][vp].remove(v)
      cardinalities[up][vp] *= score


  # Clean up: remove empty entries from q.nte and q.te.
  for u in q.nte:
    for vf in q.nte[u].copy():
      if len(q.nte[u][vf]) == 0:
        q.nte[u].pop(vf)
  
  for u in q.nte.copy():
    if len(q.nte[u]) == 0:
      q.nte.pop(u)

  for u in q.te:
    for vf in q.te[u].copy():
      if len(q.te[u][vf]) == 0:
        q.te[u].pop(vf)
  
  for u in q.te.copy():
    if len(q.te[u]) == 0:
      q.te.pop(u) 


In [None]:
def CECI_MOG(q):
  '''
  The matching order is the BFS order already stored in q.phi by CECI_CSG.
  '''
  return

In [None]:
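def CECI_EP(q, g, m, i, totalresult): # differs from the original source code
  # m maps matched query nodes to data nodes; i is the 1-based recursion depth.
  if i == len(q.phi) + 1:
    # Every query node is matched: record a copy of the embedding.
    totalresult.append(m.copy())
    return 
  # Pick the first query node in BFS order that is not matched yet.
  u = -1
  for node in q.phi:
    if node not in m:
      u = node
      break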

  lc = set()

  if i == 1:
    lc = q.candidateset[u]
  elif q.phi.index(u) == 1:
    lc = q.te[u][m[q.phiparent[u]]] | q.nte[u][m[q.phiparent[u]]]
  else:
    tmp = set()
    for node in q.phi:
      if node == u:
        break
      if u not in q.edges[node]:
        continue
      if tmp:
        tmp &= q.te[u][m[node]] | q.nte[u][m[node]]
      else:
        tmp = q.te[u][m[node]] | q.nte[u][m[node]]

    for v in tmp:
      if q.node2label[u] == g.node2label[v] and g.node2degree[v] >= q.node2degree[u]:
        flag = True
        for node in q.phi:
          if node == u:
            break
          if node == q.phiparent[u]:
            continue
          #if m[node] not in g.edges[v]:
          if v == m[node] or (m[node] not in g.edges[v] and (node in q.edges[u] or u in q.edges[node])):
            flag = False
            break
        if flag:
          lc.add(v)


  # m is restored after each recursive call, so the used set can be built once.
  used = set(m.values())
  for node in lc:
    if node not in used:
      m[u] = node
      CECI_EP(q, g, m, i + 1, totalresult)
      m.pop(u)


In [None]:
queries = {}

for g in gs:
  for q in qs:
    q.reset()
    
    CECI_CSG(q, g)
    CECI_MOG(q)

    totalresult = []
    
    CECI_EP(q, g, {}, 1, totalresult)

    queries[q.graphid] = len(totalresult)

print(queries)

{'query_dense_16_25.graph': 4, 'query_dense_16_18.graph': 2, 'query_dense_16_2.graph': 80, 'query_dense_16_114.graph': 4, 'query_dense_16_81.graph': 124, 'query_dense_16_52.graph': 6, 'query_dense_16_163.graph': 8, 'query_dense_16_105.graph': 2, 'query_dense_16_93.graph': 22, 'query_dense_16_122.graph': 16, 'query_dense_16_66.graph': 3, 'query_dense_16_178.graph': 1, 'query_dense_16_162.graph': 48, 'query_dense_16_20.graph': 2, 'query_dense_16_86.graph': 1, 'query_dense_16_72.graph': 24, 'query_dense_16_131.graph': 3, 'query_dense_16_197.graph': 8, 'query_dense_16_147.graph': 1526, 'query_dense_16_151.graph': 138, 'query_dense_16_99.graph': 260, 'query_dense_16_60.graph': 10, 'query_dense_16_112.graph': 2, 'query_dense_16_111.graph': 2, 'query_dense_16_169.graph': 44, 'query_dense_16_5.graph': 4, 'query_dense_16_15.graph': 60, 'query_dense_16_107.graph': 21, 'query_dense_16_157.graph': 2, 'query_dense_16_193.graph': 2, 'query_dense_16_31.graph': 8, 'query_dense_16_139.graph': 12, 'quer

In [None]:
flag = True
for name in expects:
  if expects[name] != queries[name]:
    print(name)
    print(expects[name])
    print(queries[name])
    flag = False
if flag:
  print("correct")

correct
