
Cannot finish running on data with one million nodes and one hundred million edges #162

Closed
NYcleaner opened this issue Dec 19, 2023 · 1 comment

Comments


NYcleaner commented Dec 19, 2023

First, thanks for providing such a good graph computation library.
Now I am facing a problem. When I use data with tens of thousands of nodes and millions of edges, the Leiden algorithm can complete community detection in a few minutes on a Spark cluster (--spark.session.driverMemory=10g --spark.session.driverCores=1 --spark.session.executorCores=8 --spark.session.executorMemory=8G).
But when the data scale reaches one million nodes and 100 million edges, the algorithm takes 2 hours or more.
So, for this amount of data, is there any optimization method you can suggest?
Looking forward to your reply, thank you.

vtraag (Owner) commented Dec 19, 2023

Thanks! Indeed, larger graphs will obviously take more time. The Leiden algorithm is one of the fastest algorithms available, so it won't be easy to find a faster alternative. However, you might be interested in using the implementation in igraph itself: https://python.igraph.org/en/stable/api/igraph.Graph.html#community_leiden. That has more limited capabilities, but should be faster.
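
For reference, a minimal sketch (not part of the original comment) of how igraph's built-in `community_leiden` can be called; the example graph and parameter values below are illustrative assumptions, not settings taken from this thread:

```python
import igraph as ig

# Load the graph; reading an undirected edge list from disk is one option:
# g = ig.Graph.Read_Edgelist("edges.txt", directed=False)
g = ig.Graph.Famous("Zachary")  # small built-in example graph

# community_leiden() is igraph's native Leiden implementation.
# It optimises CPM by default; "modularity" is also supported.
partition = g.community_leiden(objective_function="modularity", n_iterations=2)

print(partition.membership)  # community assignment per node
print(partition.modularity)  # quality score of the resulting partition
```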

vtraag closed this as completed Dec 19, 2023