Running UMAP hangs #53

Closed
ts387 opened this issue Apr 21, 2021 · 10 comments

@ts387

ts387 commented Apr 21, 2021

Hi Ellen,

I have had this issue from the start but have just been making do with PCA. Unless I skip UMAP, all my cryodrgn analyze runs appear to hang at this stage (i.e. until the job is killed on hitting the cluster time limit). Our system admin tells me the job isn't doing anything, and no errors appear in the log either:

(cryodrgn) -bash-4.2$ tail -f CryoDRGN-01_vae128_big_z8-analyze_umap.out
2021-04-21 18:36:24 Saving results to /home/ts387/CryoDRGN/01_vae128_big_z8/analyze.24
2021-04-21 18:36:24 Perfoming principal component analysis...
2021-04-21 18:36:24 Explained variance ratio:
2021-04-21 18:36:24 [0.14331131 0.13593155 0.13252919 0.12811412 0.12108949 0.11471545
0.11343906 0.11086983]
2021-04-21 18:36:24 Generating volumes...
2021-04-21 18:36:24 K-means clustering...
2021-04-21 18:36:32 Generating volumes...
2021-04-21 18:36:32 Running UMAP...

I am looking at ~150,000 particles. Can you tell what might be going on, and if there's a solution or workaround? Just in case, I am attaching my slurm script as a text file. I apologise if the question is too tangential to core cryodrgn functionality.

Thanks a million!

Taha

cryodrgn analyze (--skip-vol).txt

@zhonge
Collaborator

zhonge commented Apr 27, 2021

Very strange. Can you try running UMAP on a subset of your dataset? There is a helper script in the repository that can stride the dataset, so you can test on ~150 data points:

$ python /path/to/repo/analysis_scripts/run_umap.py z.pkl --stride 1000 -o test_umap.pkl
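Striding keeps every Nth latent vector, so a dataset of N particles shrinks to ceil(N / stride) rows; with ~150,000 particles and a stride of 1000 that is roughly 150 points. The sketch below shows that subsetting step on its own, assuming z.pkl holds a sliceable sequence of per-particle latent vectors (e.g. an (N, zdim) NumPy array). `stride_latents` is a hypothetical helper, not the code of the repository script.

```python
import pickle


def stride_latents(in_path, out_path, stride):
    """Write every `stride`-th latent vector from in_path to out_path.

    Hypothetical helper: assumes the pickle holds a sliceable sequence
    of per-particle latent vectors, such as an (N, zdim) NumPy array.
    """
    with open(in_path, "rb") as f:
        z = pickle.load(f)
    subset = z[::stride]  # rows 0, stride, 2*stride, ... -> ceil(N / stride) rows
    with open(out_path, "wb") as f:
        pickle.dump(subset, f)
    return subset
```

Because the subset is only about a thousandth of the data, a run that completes in seconds here but hangs on the full set points towards scale, while a hang on the subset too points towards the environment.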

@ts387
Author

ts387 commented Apr 28, 2021

Hi, that does seem to work...

(cryodrgn) -bash-4.2$ python run_umap.py z.pkl --stride 1000 -o test_umap.pkl
(152, 8)

...and I get a 1.4 kb UMAP pickle file as a result. I also ran it without --stride, and that works too.

Is there a way I can use it as a substitute for the UMAP step in cryodrgn analyze (for now)? That way I could see the UMAP PNGs marking the k-means cluster centres for the sampled densities, and also interact with the raw UMAP output in the Jupyter notebook. That would be super useful!

Thank you very much.

@ts387
Author

ts387 commented Apr 29, 2021

I copied the umap.pkl file from the full (unstrided) run into the analysis directory, and can now see the UMAP visualisations in the Jupyter notebook.
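Only a full, unstrided embedding can stand in for the one cryodrgn analyze would have written: a strided run has far fewer rows than the latent file, so its points would no longer line up with per-particle labels such as the k-means assignments. A minimal sketch of a guarded copy, assuming both pickles hold one row per particle in the same order and that the notebook reads a file named umap.pkl from the analysis directory (as described above). `install_umap` is a hypothetical helper, not part of cryodrgn.

```python
import pickle
import shutil
from pathlib import Path


def install_umap(umap_pkl, z_pkl, analyze_dir):
    """Copy a standalone UMAP embedding into an analysis directory.

    Hypothetical helper. Assumes umap_pkl holds one 2-D point per particle,
    in the same order as the latent vectors in z_pkl.
    """
    with open(umap_pkl, "rb") as f:
        emb = pickle.load(f)
    with open(z_pkl, "rb") as f:
        z = pickle.load(f)
    # A strided embedding (e.g. --stride 1000) cannot be matched to particles.
    if len(emb) != len(z):
        raise ValueError(
            f"UMAP has {len(emb)} rows but z has {len(z)}; rerun without --stride"
        )
    dest = Path(analyze_dir) / "umap.pkl"
    shutil.copyfile(umap_pkl, dest)
    return dest
```

Checking the row count before copying is cheap and catches the most likely mistake, copying test_umap.pkl from the strided run instead of the full one.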

Some of the widgets, in particular the latter interactive ones, don't seem to work. However, I gather this is a known issue and partially fixed in v0.3.2?

You can close the issue after this. Thanks for all your help so far; cryoDRGN is already proving to be a real asset to our projects!

@Guillawme
Contributor

> Some of the widgets, in particular the latter interactive ones, don't seem to work. However, I gather this is a known issue and partially fixed in v0.3.2?

This was #34, and it did get fixed in a way, but you would not have received the fix automatically by updating, because it was about which dependencies are installed in your conda environment. To benefit from the fix, you need to reinstall in a fresh environment, following the directions in the README.

@zhonge
Collaborator

zhonge commented May 1, 2021

Looking into the UMAP issue more: I was able to reproduce UMAP hanging in a different installation environment (python=3.7, pytorch=1.7, umap=0.4.2, numba=0.47.0, ...).

In fact, it hangs even when running the basic example:

import umap
from sklearn.datasets import load_digits

digits = load_digits()

embedding = umap.UMAP().fit_transform(digits.data)

which only takes a few seconds in my previous installation with python=3.6, pytorch=1.1, umap=0.4.1, numba=0.48.0...

I think this is related to an underlying dependency issue in the umap package, specifically with numba=0.47.0:
lmcinnes/umap#336

Can you check your numba version (conda list numba) and try installing a different version of numba?
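To compare environments across machines, it can help to print every version that came up in this thread at once rather than running conda list per package. A minimal sketch using importlib.metadata from the standard library (Python 3.8+); on Python 3.7, as in this thread, the importlib_metadata backport offers the same calls. `report_versions` is a hypothetical helper, and the distribution names are assumptions: a conda-installed package may register under a different name.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: pip install importlib_metadata
    import importlib_metadata as metadata


def report_versions(dists):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dists:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # pip-style distribution names: umap is "umap-learn", PyTorch is "torch".
    for name, ver in report_versions(["numba", "umap-learn", "torch", "numpy"]).items():
        print(f"{name:12s} {ver or 'not installed'}")
```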

@ts387
Author

ts387 commented May 2, 2021

Thanks @Guillawme.

@zhonge Our cryodrgn environment is running python 3.7.9, pytorch 1.0.0, umap-learn 0.5.1 and numba 0.52.0. Should we try downgrading numba, then?

@zhonge
Collaborator

zhonge commented May 2, 2021

Does the basic example work for you? You just need to copy the lines above into a Python session on the machine you're testing.

@ts387
Author

ts387 commented May 4, 2021

Do you mean copying and running each statement line by line? I did that: the first and last lines each take a few seconds, but there is no hang.

(cryodrgn) -bash-4.2$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import umap
>>> from sklearn.datasets import load_digits
>>> digits = load_digits()
>>> embedding = umap.UMAP().fit_transform(digits.data)
>>> exit()
(cryodrgn) -bash-4.2$

@zhonge
Collaborator

zhonge commented May 19, 2021

I was able to reproduce this on another machine, where UMAP runs fine on its own but hangs during cryodrgn analyze. There is some version/dependency incompatibility between pytorch, numpy and umap, and I can reproduce it in a standalone Python session if I import pytorch AND numpy before importing umap.

# runs fine
(cryodrgn) $ python
>>> from cryodrgn import analysis, utils
>>> z = utils.load_pkl('z.20.pkl')
>>> analysis.run_umap(z[::100])
array([[8.358947 , 9.226697 ],
       [4.5772686, 6.358181 ],
       [3.9687192, 6.013609 ],
       ...,
       [7.218935 , 9.058749 ],
       [3.5549293, 6.595441 ],
       [5.7769523, 8.985867 ]], dtype=float32)

# segfaults if you import torch first...
(cryodrgn) $ python
>>> import torch
>>> from cryodrgn import analysis, utils
>>> z = utils.load_pkl('z.20.pkl')
>>> analysis.run_umap(z[::100])
Segmentation fault (core dumped)

# hangs indefinitely if you import torch then numpy...
(cryodrgn) $ python
>>> import torch
>>> import numpy as np
>>> from cryodrgn import analysis, utils
>>> z = utils.load_pkl('z.20.pkl')
>>> analysis.run_umap(z[::100]) # hangs indefinitely

This particular environment has a very old version of pytorch (1.0.1), along with numpy 1.20.1, numba 0.51.2, and umap-learn 0.5.1.
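The three sessions above can also be rerun unattended, for example inside a batch job, so that a hang shows up as a timeout instead of consuming the whole allocation. A minimal sketch, assuming the workload and the file name z.20.pkl from those sessions; the prefixes mirror the import orders shown, and `run_snippet`, the labels and the 120 s timeout are hypothetical choices, not part of cryodrgn.

```python
import subprocess
import sys

# Import orders mirroring the three sessions above.
PREFIXES = {
    "cryodrgn only": "",
    "torch first": "import torch\n",
    "torch then numpy": "import torch\nimport numpy as np\n",
}

# Workload from the sessions above; 'z.20.pkl' is that example's file name.
WORKLOAD = (
    "from cryodrgn import analysis, utils\n"
    "z = utils.load_pkl('z.20.pkl')\n"
    "analysis.run_umap(z[::100])\n"
)


def run_snippet(code, timeout=120.0):
    """Run code in a fresh interpreter and classify the outcome.

    Returns 'ok', 'hang' (timed out; the child is killed), 'crash (signal N)'
    for a negative return code on POSIX, or 'error (exit N)' otherwise.
    """
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "hang"
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        return f"crash (signal {-proc.returncode})"
    return f"error (exit {proc.returncode})"


if __name__ == "__main__":
    for label, prefix in PREFIXES.items():
        print(f"{label:18s} {run_snippet(prefix + WORKLOAD)}")
```

Each variant gets its own process because import order only matters the first time a module is loaded: once torch is in `sys.modules`, reordering later imports in the same session changes nothing. On Linux, the segfault in the second session would be reported as `crash (signal 11)`.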

I am not yet sure what the underlying incompatibility is, but you can avoid the conflicting imports, and get cryodrgn analyze to complete successfully, by calling the analyze.py script directly:

# instead of cryodrgn analyze...
(cryodrgn) $ python /path/to/repo/cryodrgn/commands/analyze.py [workdir] [epoch]

@ts387
Author

ts387 commented May 24, 2021

That works - thank you very much!

zhonge closed this as completed May 26, 2021