calculate graph similarity by using GraKel #59

tiedaoxiaotubie · 2021-02-23T14:36:56Z

Hi, I wonder if we can use GraKel to calculate graph similarity only? I don't need to classify the graphs, I only want to get the similarity among them, I have generated graphs with networkX, since multiple strategies of graph kernel have been implemented in GraKel, I think it should have the ability to export the graph similarity, but sadly I failed to find such an API.

giannisnik · 2021-02-23T15:25:46Z

Hi @tiedaoxiaotubie ,

Assuming that the nodes of your graphs are not annotated with attributes, you can transform them into grakel graphs as follows:

from grakel.utils import graph_from_networkx
G = graph_from_networkx(G_nx)

where G_nx is a list of networkx graphs.

Then, you can compute the n x n kernel matrix (where n is the number of graphs). You can think of the normalized kernel values as the similarities between graphs. Thus, the (i,j)-th element of the kernel matrix corresponds to the similarity between the i-th graph and the j-th graph. To produce the kernel matrix you can use one of the kernels that is implemented in grakel. For instance, you can compute the shortest path kernel as follows:

from grakel.kernels import ShortestPath
gk = ShortestPath(normalize=True)
K = gk.fit_transform(G)

Then, the (i,j)-th element of K contains the similarity between graphs i and j.

tiedaoxiaotubie · 2021-02-23T15:58:57Z

Thank you very much for your reply! the nodes of my graphs are neither annotated with attributes nor with labels, I tried with your code, but seems not work for me. I have to admit that my graph is pretty large, in which the edge number is more than 6,000, maybe the GraKel cannot handle such a big graph in one time?

I tried to use ShortestPath first, it raised an error immediately:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
  File "CFGenerator.py", line 68, in <module>
    main()
  File "CFGenerator.py", line 62, in main
    C._cfg_similarity()
  File "CFGenerator.py", line 44, in _cfg_similarity
    K = gk.fit_transform(G)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 197, in fit_transform
    km = self._calculate_kernel_matrix()
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 231, in _calculate_kernel_matrix
    K[i, i] = self.pairwise_operation(x, x)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/random_walk.py", line 206, in pairwise_operation
    XY = np.kron(X, Y)
  File "<__array_function__ internals>", line 6, in kron
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 1154, in kron
    result = outer(a, b).reshape(as_+bs)
  File "<__array_function__ internals>", line 6, in outer
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/core/numeric.py", line 942, in outer
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
numpy.core._exceptions.MemoryError: Unable to allocate 8.83 PiB for an array with shape (35259844, 35259844) and data type float64

I searched the previous issue that similar to my question, I noticed there were people tried to use RandomWalk to calculate the similarity:#8, so I also gave it a try, my code are as follows:

    def _cfg_similarity(self):
        # Transforms list of NetworkX graphs into a list of GraKeL graphs
        G = graph_from_networkx(self._cfg_all)

        # Uses the shortest path kernel to generate the kernel matrices
        gk = RandomWalk(method_type="baseline", lamda=0.01)
        K = gk.fit_transform(G)
        print(K)

at this time, my error message was:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
  File "CFGenerator.py", line 68, in <module>
    main()
  File "CFGenerator.py", line 62, in main
    C._cfg_similarity()
  File "CFGenerator.py", line 44, in _cfg_similarity
    K = gk.fit_transform(G)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 197, in fit_transform
    km = self._calculate_kernel_matrix()
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 231, in _calculate_kernel_matrix
    K[i, i] = self.pairwise_operation(x, x)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/random_walk.py", line 206, in pairwise_operation
    XY = np.kron(X, Y)
  File "<__array_function__ internals>", line 6, in kron
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 1154, in kron
    result = outer(a, b).reshape(as_+bs)
  File "<__array_function__ internals>", line 6, in outer
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/numpy/core/numeric.py", line 942, in outer
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)
numpy.core._exceptions.MemoryError: Unable to allocate 8.83 PiB for an array with shape (35259844, 35259844) and data type float64

giannisnik · 2021-02-23T16:17:53Z

It seems that not only the graphs are large, but also the dataset itself (35,259,844 graphs in total?). Comparing every pair of graphs is not feasible since it requires generating a 35,259,844 x 35,259,844 matrix which does not fit in memory. Do you need to compare each one of the 35,259,844 graphs against every other of the dataset?

tiedaoxiaotubie · 2021-02-23T16:53:54Z

OK, I changed the number of graphs to be calculated, only select 4 of the original graphs, but there are still more than 6,000 edges and 5,000 nodes for each of the graphs, at this time, the RandomWalk worked, while the ShortestPath still failed. The error message of ShortestPath are as follows:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
Traceback (most recent call last):
  File "CFGenerator.py", line 68, in <module>
    main()
  File "CFGenerator.py", line 62, in main
    C._cfg_similarity()
  File "CFGenerator.py", line 44, in _cfg_similarity
    K = gk.fit_transform(G)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/shortest_path.py", line 393, in fit_transform
    self.fit(X)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py", line 123, in fit
    self.X = self.parse_input(X)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/shortest_path.py", line 460, in parse_input
    labels=self._lt)
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/graph.py", line 688, in build_shortest_path_matrix
    return shortest_path_mat, self.get_labels()
  File "/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/graph.py", line 737, in get_labels
    raise ValueError('Graph does not have any labels for vertices.')
ValueError: Graph does not have any labels for vertices.

It seems trying to get labels from the graph, but the graphs of mine are neither annotated with attributes nor with labels, which makes me confused.

The output of RandomWalk are as follows:

(base) symexe@ubuntu1804:~/CFGFactory$ python3 CFGenerator.py
/home/symexe/miniconda3/lib/python3.7/site-packages/grakel/kernels/kernel.py:201: RuntimeWarning: invalid value encountered in sqrt
  return km / np.sqrt(np.outer(self._X_diag, self._X_diag))
[[-1.                 nan         nan         nan]
 [        nan  1.          1.07683489  0.89696687]
 [        nan  1.07683489  1.          1.09279374]
 [        nan  0.89696687  1.09279374  1.        ]]

For the output of RandomWalk, I have 2 questions:

why there are -1 and nan?
I noticed the GPU was not used when during the script's execution, is it normal? I thought it should use GPU at least :)

Hope I could get your answer, thank you very much :)

giannisnik · 2021-02-23T17:21:00Z

With regards to the two questions:

I think that this is related to the lamda parameter of the kernel. Probably, you need to set this to some smaller value than the default. To obtain values between 0 and 1, set also normalize=True.
The code does not use the GPU. The kernels run on the CPU.

With regards to the shortest path kernel, it expects node-labeled graphs. You can set the with_labels attribute to False as follows:
gk = ShortestPath(with_labels=False, normalize=True)

Perhaps the most efficient kernel is the WL subtree. But this kernel expects the graphs to contain node labels. To run the kernel, you can assign the same label to all nodes of all graphs (for instance, the label 'a'). To do this, have a look at this example: https://ysig.github.io/GraKeL/0.1a8/auto_examples/nx_to_grakel.html#sphx-glr-auto-examples-nx-to-grakel-py

tiedaoxiaotubie · 2021-02-23T17:48:05Z

Perfect answer, awesome! I got better result after using a smaller lamda in RandomWalk. the error of shortest path kernel also was handled after adding with_labels=False.

tiedaoxiaotubie · 2022-08-25T03:38:02Z

您的邮件我已经收到了，我会尽快处理然后给你回复。谢谢！

tiedaoxiaotubie closed this as completed Feb 23, 2021

YuWVandy mentioned this issue Nov 6, 2022

Overall_reweight and Batch_reweight are not running!!! YuWVandy/G2GNN#1

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

calculate graph similarity by using GraKel #59

calculate graph similarity by using GraKel #59

tiedaoxiaotubie commented Feb 23, 2021

giannisnik commented Feb 23, 2021

tiedaoxiaotubie commented Feb 23, 2021

giannisnik commented Feb 23, 2021 •

edited

Loading

tiedaoxiaotubie commented Feb 23, 2021

giannisnik commented Feb 23, 2021

tiedaoxiaotubie commented Feb 23, 2021

tiedaoxiaotubie commented Aug 25, 2022 via email

calculate graph similarity by using GraKel #59

calculate graph similarity by using GraKel #59

Comments

tiedaoxiaotubie commented Feb 23, 2021

giannisnik commented Feb 23, 2021

tiedaoxiaotubie commented Feb 23, 2021

giannisnik commented Feb 23, 2021 • edited Loading

tiedaoxiaotubie commented Feb 23, 2021

giannisnik commented Feb 23, 2021

tiedaoxiaotubie commented Feb 23, 2021

tiedaoxiaotubie commented Aug 25, 2022 via email

giannisnik commented Feb 23, 2021 •

edited

Loading