Skip to content

Reduce find_indirect_clusters() complexity from O(exp(n)) to O(n) #2071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mludvig
Copy link

@mludvig mludvig commented Jun 10, 2025

I was trying to generate a testset from KnowledgeGraph(nodes: 1861, relationships: 912103) and it would go on for two days in a row without any progress. So I did some profiling to see what's going on on a smaller KG and found that most of the time was being spent in find_indirect_clusters().

Here is an optimised implementation that seem to produce similar results but way faster. On my reduced size KG I'm down from 5 min to a fraction of a second and on the full KG I'm down from never finished to a few seconds.

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 10, 2025
@Lukevanl
Copy link

Ran into the exact same issue and indeed made a similar fix, would be nice to get it merged since this limits the current code only to very small graphs :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants