
0.8.0 much slower than 0.7.0? #35

Closed
ShobiStassen opened this issue Jun 2, 2020 · 31 comments · Fixed by #36


@ShobiStassen

hello,
I noticed that when I updated to the latest leidenalg (0.8.0), the runtime of calling find_partition() on a graph of around 1M nodes took over an hour, compared to the usual 5-10 minutes.

I uninstalled v0.8.0 and reinstalled 0.7.0 (`pip install leidenalg==0.7.0`) and found that the runtime when calling find_partition() went back to the fast 5-10 minutes. I'm using Python 3.7.
This is how I call the function:
partition = leidenalg.find_partition(G_sim, leidenalg.ModularityVertexPartition, n_iterations=n_iter_leiden, seed=partition_seed)

Is there some parameter I am overlooking that needs to be set when using find_partition() in 0.8.0 ? The function (called from 0.8.0) runs to completion on smaller graphs where the runtime difference is less noticeable.
Thank you

@vtraag
Owner

vtraag commented Jun 2, 2020

That is quite disconcerting! What OS are you on? Does the problem only occur on a particular network? I will run some further tests on my side.

@vtraag
Owner

vtraag commented Jun 2, 2020

@iosonofabio, could you perhaps also take a look? The part most likely affecting the runtime is the idea of keeping some nodes "fixed". Did you check the impact on the runtime? If the impact is too great, I might need to take it out again. Perhaps some part can still be improved; could you take a closer look?

@ShobiStassen
Author

Hi again,

I ran 0.8.0 on 3 different datasets (the 1.3 million mouse neurons from 10x Genomics, first 20 PCs; 1.1M lung cancer cells with 24 features; and some other large single-cell datasets).
The input to leidenalg is a KNN graph of the original data.
I'm running on Ubuntu 16.04, Intel® Xeon(R) W-2123 CPU @ 3.60GHz × 8, 126 GB RAM.

@ShobiStassen
Author

@vtraag I should also take this opportunity to thank you for a wonderful algorithm!

@iosonofabio
Contributor

I'll have a look. I need that feature, though, so if it goes out I'll fork the repo.

@vtraag
Owner

vtraag commented Jun 2, 2020

Thanks @ShobiStassen! @iosonofabio, I'd be happy to keep the feature in. But if it has such a large effect on the runtime, it will have too many downsides to keep it in its current form. We can talk about the details of changing the implementation after we have taken a closer look. Perhaps we'll conclude that it is better to have your own fork (a sort of northstar-leiden), but perhaps we'll find a way to keep it in without the large runtime effects.

@iosonofabio
Contributor

@vtraag my thoughts exactly

@iosonofabio
Contributor

iosonofabio commented Jun 2, 2020

Also, @ShobiStassen: is Leiden the bottleneck in runtime in your analysis? In my experience:

  1. Even though Leiden might be a little slower, it is usually far from the bottleneck.
  2. Most datasets are much smaller than 1.1M cells. In fact, collecting that amount of data costs around half a million dollars, so it's something only dedicated labs will do.

That said, of course we should try to be as efficient as possible.

@ShobiStassen
Author

@iosonofabio hi, building the KNN graph for these datasets takes around 5 minutes, so when using v0.8.0 of Leiden, the Leiden step is the bottleneck.

@ShobiStassen
Author

I've also run this on some older cytometry datasets of around 300,000 cells: v0.7.0 takes around two minutes, whereas with v0.8.0 I terminated the Leiden run after around 15 minutes.

@vtraag
Owner

vtraag commented Jun 2, 2020

Perhaps we should just use a common benchmark to check the speed of leidenalg (and only leidenalg). We could use the dataset linked by @ShobiStassen. Could you provide some code showing how to create the exact graph you are running leidenalg on, @ShobiStassen?

@iosonofabio
Contributor

@vtraag That's a great idea, it will help us debug. A smaller version that takes only 10 seconds would also help.

@ShobiStassen usually there is a lot of preprocessing for things like dimensionality reduction, feature selection, normalization, etc.; that's where the time usually goes. But that's out of the scope of this issue: we should just try and see what the problem is.

@ShobiStassen
Author

ShobiStassen commented Jun 2, 2020 via email

@iosonofabio
Contributor

iosonofabio commented Jun 2, 2020

Awesome, thanks! Btw, are you making the graph using this method?

https://ieeexplore.ieee.org/document/8594636

That might be more scalable than naive O(N^2) approaches, which would explain why Leiden becomes the bottleneck. Another way of approximating that graph that scales to 100 million cells (or high-dimensional vectors) is lshknn, which I wrote based on work by Paolo Carnevali at CZI (https://github.com/chanzuckerberg/ExpressionMatrix2).

@ShobiStassen
Author

ShobiStassen commented Jun 2, 2020 via email

@ShobiStassen
Author

@iosonofabio cool! I will check out the graph approximation methods. I use hnsw to make the KNN graph and then do some pruning before inputting it to leidenalg (see my GitHub repo for PARC, clustering on large-scale single-cell data).
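For context, the pipeline described here (nearest-neighbour search → edge list → leidenalg) can be sketched with a brute-force stand-in; this is an illustrative O(N^2) toy, not the hnsw-based code from PARC, and the function name is made up:

```python
import numpy as np

def knn_edge_list(X, k=3):
    # Brute-force k-nearest-neighbour edge list; a real pipeline would use an
    # approximate index (hnsw, lshknn) instead of the full O(N^2) distance matrix.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest points
    # Deduplicate into undirected edges (i < j), the usual format for
    # building an igraph to hand to leidenalg.
    return sorted({(min(i, j), max(i, j)) for i in range(len(X)) for j in nbrs[i]})

rng = np.random.default_rng(0)
edges = knn_edge_list(rng.normal(size=(50, 5)), k=3)
```

Such an edge list can be written out tab-separated and read back with `ig.Graph.Read_Ncol`, as is done later in this thread.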

@iosonofabio
Contributor

iosonofabio commented Jun 3, 2020

Alright, I took a knn graph from a typical single cell dataset (kidney droplet data from Tabula Muris Senis, https://www.biorxiv.org/content/10.1101/661728v1). The graph has 11,666 vertices and 93,030 edges (unweighted, undirected). As a fun side note, I was one of the people preparing the experimental sample as well ^^

I tested two versions of leidenalg:

  • the commit just before the first merge of my fixed-nodes branch (defe55a) (note there were two merges)
  • the commit just after the last merge of my fixed-nodes branch (1aa7efc)

Both were linked against current igraph 0.8.2.

Results below, as cumulative distributions for 30 runs of each version (NOTE: the y axis should say "fraction of runs slower than x"):
[image: leidenalg_profiling]

The machinery of fixed nodes slows it down by 5% on average. As a comparison, the stochastic nature of Leiden causes a 3% standard deviation on runtime.

This indicates that the fixed nodes do not have a strong impact on performance, which was already my impression back then when I was developing this.

Here the edge list (tab separated, each line an edge) for your own tests:

example_edges.txt

What's the conclusion? I think it's possible that one of the following happened:

  1. igraph 0.7 -> 0.8 caused the slowdown (if so, we should find it and fix it)
  2. another commit on leidenalg caused the slowdown (if so, which one?)
  3. it only manifests on extremely large graphs (if so, why? 100k edges is already pretty large)
  4. typos on the user side, e.g. comparing two slightly different workflows

@ShobiStassen could you check how long it takes to cluster the attached graph with CPM? My runs are on a low-voltage laptop CPU, so yours should take around that time or faster. This is how I do the timing:

[image: timing code]
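The timing code itself was only posted as a screenshot; a minimal sketch of the same idea follows. The `timed` helper is illustrative, and in the real benchmark the workload would be the leidenalg calls quoted in the next comment:

```python
import time

def timed(fn, *args, **kwargs):
    # Run fn once and report wall-clock seconds from a monotonic clock.
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# Stand-in workload; in the benchmark it would be something like:
#   partition = leidenalg.CPMVertexPartition(g, resolution_parameter=0.01)
#   opt = leidenalg.Optimiser()
#   _, dt = timed(opt.optimise_partition, partition)
result, dt = timed(sum, range(1_000_000))
```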

@ShobiStassen
Author

Hi again,
I ran your code for the kidney graph as well as a larger graph
70K_LC_edgelist.txt
of lung cancer cells, and I get the following runtimes. I only ran each dataset 10 times for each version of leidenalg (0.8.0 and 0.7.0). I used the same conda environment and just pip uninstalled and reinstalled the required version of leidenalg, leaving everything else the same (the version of python-igraph in this environment is 0.8.2).
The larger graph has 70,000 nodes and 2.1 million edges. As you can see from the output, the runtime is about 2x for the newer leidenalg on this graph.

[image: runtime output]

Here is how I read the file:
kidney_g = ig.Graph.Read_Ncol('/home/shobi/Downloads/example_edges.txt', directed=False)

I use the same function calls as you:

```python
t0 = time.time()
partition = leidenalg.CPMVertexPartition(kidney_g, resolution_parameter=0.01)
opt = leidenalg.Optimiser()
opt.optimise_partition(partition)

# partition = leidenalg.find_partition(kidney_g, leidenalg.ModularityVertexPartition)
t1 = time.time()
dt = t1 - t0
```

@iosonofabio
Contributor

Thanks for that. It's interesting because:

  1. The larger graph seems to suffer more
  2. Even the smaller graph has a 60% runtime increase for you, but only a 5% increase for me
  3. Your runtimes are generally slower than mine, which is unexpected because I'm on a slow laptop CPU.

@vtraag any ideas? Notice that the comparison is a little tricky because @ShobiStassen compared 0.7.0 versus 0.8.0, whereas I surgically extracted the exact commits around the fixed nodes, which is a slightly different thing.

@iosonofabio
Contributor

iosonofabio commented Jun 3, 2020

I added the commits tagged 0.7.0 and 0.8.0 to the benchmark:

[image: leidenalg_profiling2]

Clearly something happened between the fixed-nodes merge and the 0.8.0 release.

@vtraag since you are more familiar with the repo, would you mind doing a bit of bisecting? Otherwise I can do it if you tell me the hashes of the commits you would like to test.

As a side note, recent versions don't compile here because line 498 of setup.py uses `detected` before it's ever defined. I set it to False since the header path and solib name were already correct for Arch Linux.

@vtraag
Owner

vtraag commented Jun 3, 2020

Thanks for looking into this, both of you. I will try to see what other things could be causing this.

Possibly it's an issue with optimization flags; I'm not sure. @iosonofabio, are you compiling each version yourself, or not?

recent versions don't compile here because line 498 of the setup.py uses detected before it's ever defined.

That is something that should then also affect python-igraph. Could you open an issue for that so that we don't lose track?

I set it to false since the header path and solib name were already correct for Arch Linux.

I'm not sure I understand: you mean you are compiling & linking against an external version of igraph instead of the vendored source?

@iosonofabio
Contributor

iosonofabio commented Jun 3, 2020

I think that I'm compiling against the external shared library of igraph, since a typical call during the python setup.py build is:

gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fno-semantic-interposition -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fPIC -Iinclude -I/usr/include/igraph -I/usr/local/include/igraph -I/usr/include/python3.8 -c src/python_optimiser_interface.cpp -o build/temp.linux-x86_64-3.8/src/python_optimiser_interface.o

(notice the -I/usr/include/igraph). It also finishes in a few seconds, which I doubt would be the case if the whole of igraph were compiled.

Yes I'm compiling everything (except igraph I think). I do this:

git checkout <commit hash>
python setup.py build
mv build/lib.linux-x86_64-3.8/leidenalg build/lib.linux-x86_64-3.8/leidenalg_before

and similar for the other ones. Then in the script:

sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))+'/build/lib.linux-x86_64-3.8')
import leidenalg_before
import leidenalg_after
import leidenalg_070
import leidenalg_080
modules = {
        'before': leidenalg_before,
        'after': leidenalg_after,
        '0.7.0': leidenalg_070,
        '0.8.0': leidenalg_080,
        }

so I can test them all in the same go.
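The loop over those builds can then be a simple repeated-timing harness; here is a sketch with stub workloads standing in for the four leidenalg builds (the dictionary values and run count are illustrative, not the actual benchmark script):

```python
import time

# Stub workloads; in the real benchmark each entry would run
# module.Optimiser().optimise_partition(...) for one of the builds
# imported above (leidenalg_before, leidenalg_after, 0.7.0, 0.8.0).
modules = {
    'before': lambda: sum(range(200_000)),
    'after': lambda: sum(range(400_000)),
}

timings = {}
for name, workload in modules.items():
    runs = []
    for _ in range(5):                      # repeat to smooth out noise
        t0 = time.perf_counter()
        workload()
        runs.append(time.perf_counter() - t0)
    timings[name] = sum(runs) / len(runs)   # mean runtime per build
```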

@vtraag
Owner

vtraag commented Jun 3, 2020

OK, thanks for sharing your setup @iosonofabio. In fact, a couple of additional commits were necessary to make everything run correctly beyond commit 1aa7efc that you tested. I think the implementation of the fixed nodes should be complete after commit 248cce7. Could you perhaps test that?

@iosonofabio
Contributor

[image: leidenalg_profiling3]
Couldn't be more excruciating ;-)

@vtraag
Owner

vtraag commented Jun 3, 2020

OK, thanks for checking! I have an idea what might be causing the slowdown, I'm checking it now.

@ntamas

ntamas commented Jun 3, 2020

Just wanted to chime in to say I'm following things - this is better than investigative journalism :) Keep up the good work!

@vtraag
Owner

vtraag commented Jun 3, 2020

@iosonofabio, @ShobiStassen, I think I found the problem. At some point I thought something was wrong (when in fact it was not), and my correction for it is what I believe is responsible for the large difference in runtimes. Could you please check the solution in PR #36? It will not be exactly on par with version 0.7.0, because some things actually did need to be corrected, but the runtimes should be back in line, only a few % slower.

@ragibson
Contributor

ragibson commented Jun 13, 2020

I ran into this while testing another issue.

For what it's worth, I found that 75% (!) of the optimization time on a 100 layer temporal network was coming from a single C++ iterator in MutableVertexPartition::cache_neigh_communities, so #36 is definitely on the right track.

@ragibson
Contributor

Yes, the fix in #36 is more than twice as fast as the code in the master branch for this multilayer network.

@vtraag
Owner

vtraag commented Jun 13, 2020

Thanks @ragibson! Could you also confirm the improvement, @iosonofabio and @ShobiStassen? I will then merge and release a new version.

@iosonofabio
Contributor

yep

[image: leidenalg_profiling4]
[image: leidenalg_profiling4_density]
