
Cannot finish running on data with one million nodes and one hundred million edges #162

Closed
NYcleaner opened this issue Dec 19, 2023 · 1 comment

Comments


NYcleaner commented Dec 19, 2023

First, thanks for providing such a good graph computation library.
Now I am facing a problem. When I use data with tens of thousands of nodes and millions of edges, the Leiden algorithm can complete community detection in a few minutes on a Spark cluster (--spark.session.driverMemory=10g --spark.session.driverCores=1 --spark.session.executorCores=8 --spark.session.executorMemory=8G).
But when the data scale reaches one million nodes and 100 million edges, the algorithm takes 2 hours or more.
So, for this amount of data, is there any optimization method you can suggest?
Looking forward to your reply, thank you.

vtraag (Owner) commented Dec 19, 2023

Thanks! Indeed, larger graphs will obviously take more time. The Leiden algorithm is one of the fastest algorithms available, so it won't be easy to find a faster alternative. However, you might be interested in using the implementation in igraph itself: https://python.igraph.org/en/stable/api/igraph.Graph.html#community_leiden. That has more limited capabilities, but should be faster.
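
For reference, a minimal sketch (not part of the original comment) of how igraph's built-in `community_leiden` can be called; the example graph and parameter values below are illustrative assumptions, not settings taken from this thread:

```python
import igraph as ig

# Load the graph; reading an undirected edge list from disk is one option:
# g = ig.Graph.Read_Edgelist("edges.txt", directed=False)
g = ig.Graph.Famous("Zachary")  # small built-in example graph

# community_leiden() is igraph's native Leiden implementation.
# It optimises CPM by default; "modularity" is also supported.
partition = g.community_leiden(objective_function="modularity", n_iterations=2)

print(partition.membership)  # community assignment per node
print(partition.modularity)  # quality score of the resulting partition
```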

vtraag closed this as completed Dec 19, 2023