PETSC error at cr.tl.initial_states #588

Closed
mehrankr opened this issue May 13, 2021 · 9 comments

Labels: bug (Something isn't working)

@mehrankr

mehrankr commented May 13, 2021

I installed cellrank in a new environment in python3.8 using

conda install -c conda-forge -c bioconda cellrank-krylov

I think the recipe needs to be updated to require the latest networkx; otherwise PAGA compatibility breaks with a matplotlib error.

This currently installs cellrank 1.3.1, and in some of the scvelo and cellrank functions, particularly

cr.tl.initial_states(adata, cluster_key='Cluster', n_jobs=1)

I get the following error:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:19<00:00, 948.84cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:16<00:00, 1134.85cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
WARNING: For 1 macrostate, stationary distribution is computed

This used to happen for:

cr.tl.terminal_states(
            adata, cluster_key='Cluster', weight_connectivities=0.2)

But after changing it to:

cr.tl.terminal_states(
            adata, cluster_key='Cluster', weight_connectivities=0.2,
            model="monte_carlo",
            n_jobs=1, method='brandts', n_states=2)

it didn't happen any more.

Very surprisingly, the same issue sometimes (but not always) arises when running:

scv.tl.recover_dynamics(adata, n_jobs=1, n_top_genes=1000)

and

scv.tl.velocity(adata, mode='dynamical')

Versions:

cellrank==1.3.1 scanpy==1.7.2 anndata==0.7.6 numpy==1.20.2 numba==0.53.1 scipy==1.6.3 pandas==1.2.4 pygpcca==1.0.2 scikit-learn==0.24.2 statsmodels==0.12.2 python-igraph==0.9.1 scvelo==0.2.3 pygam==0.8.0 matplotlib==3.4.2 seaborn==0.11.1

mehrankr added the bug (Something isn't working) label on May 13, 2021
@michalk8
Collaborator

Hi @mehrankr

I believe this is the same issue as in #473 (not sure why, but in some cases, PETSc parallelization doesn't play nicely with the way we parallelize [by default through processes]).
Usually, changing the backend to cr.tl.initial_states(adata, cluster_key='Cluster', n_jobs=1, backend='threading') worked, so I'd try this first.
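
Spelled out, that suggested workaround would look like this (just a sketch; adata and cluster_key='Cluster' are taken from your original post):

import cellrank as cr

# Workaround suggested above: keep a single worker and switch joblib from the
# default process-based backend to threads, which has usually avoided the
# PETSc messages in similar reports (see #473).
cr.tl.initial_states(
    adata,                  # your AnnData object with velocities computed
    cluster_key='Cluster',
    n_jobs=1,
    backend='threading',
)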

cr.tl.initial_states(adata, cluster_key='Cluster', n_jobs=1)

Hmm, this should not really happen, esp. for n_jobs=1 (based on #473, this should be fine).

scv.tl.velocity(adata, mode='dynamical')
scv.tl.recover_dynamics(adata, n_jobs=1, n_top_genes=1000)

Very strange, since scvelo doesn't use PETSc; the parallelization we rely on here was only added in 0.2.3 (I assume PETSc has been loaded through cellrank). I will take a closer look at this function for problematic parts.

But after changing it to: ...

This is expected, since method='brandts' uses scipy under the hood, not PETSc, to get the Schur vectors.
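
For completeness, here is the scipy-based variant from your post again (a sketch; all parameter values are copied from your call above):

import cellrank as cr

# method='brandts' computes the Schur decomposition with scipy instead of
# SLEPc/PETSc, which is why the PETSc messages disappear with this call.
cr.tl.terminal_states(
    adata,
    cluster_key='Cluster',
    weight_connectivities=0.2,
    model='monte_carlo',
    n_jobs=1,
    method='brandts',
    n_states=2,
)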

@Marius1311
Collaborator

Hi @mehrankr, did these tips help you with your problem already?

@mehrankr
Author

mehrankr commented May 17, 2021

Unfortunately no, I'm still getting the same error:

In [337]:         cr.tl.initial_states(
     ...:             adata, cluster_key='Cluster', n_jobs=1,
     ...:                 backend='threading')
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:22<00:00, 808.30cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18194/18194 [00:18<00:00, 959.66cell/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
WARNING: For 1 macrostate, stationary distribution is computed
WARNING: The following states could not be mapped uniquely: `['lup_1']`

@Marius1311
Collaborator

Mhm, @michalk8, could you look into this please?

@mehrankr
Author

Thanks for following up. Send me an email and we can arrange for passing you the loom file if needed: mkarimzadeh@vectorinstitute.ai

@michalk8
Collaborator

Hi @mehrankr,

just to be completely sure, does the code above (#588 (comment)) actually raise a Python exception (or crash the ipykernel), or does it simply print the error to the console?
Because it seems that it just prints the [0]PETSC ERROR right after the progress bar (where joblib does its parallelization), and it seems to have successfully computed the stationary distribution and mapped the cluster labels (the 2nd warning regarding lup_1 comes from this call, which is after the stationary dist. has been computed [and therefore after any PETSc usage]).

If it crashes/raises an exception, I will ping you over the email for the data. Lastly, could you please print the output of the following command?

python -c "import petsc4py; import slepc4py; print(petsc4py.__version__); print(slepc4py.__version__)"

@mehrankr
Author

Hi @michalk8,

It doesn't crash actually. It simply prints the message out.
As long as you can confirm this warning hasn't affected any of the processes and doesn't affect the results, I think we can close this.

The output is:

python -c "import petsc4py; import slepc4py; print(petsc4py.__version__); print(slepc4py.__version__)"
3.15.0
3.15.0

@michalk8
Collaborator

It doesn't crash actually. It simply prints the message out.

Thanks for confirming this. I can see the same error in our CI, as well as in jupyter's log, i.e. the code below:

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed()  # example dataset shipped with cellrank
cr.tl.terminal_states(adata)
cr.tl.lineages(adata, n_jobs=1, backend='threading')

produces the same PETSc broken-pipe message (screenshot: petsc_pipe), and the results are unaffected. I see it printed to the console if using just ipykernel (screenshot: petsc_error_2).

As long as it doesn't throw an error/crash the kernel as in #473, it should be fine.

@Marius1311
Collaborator

I'm closing, as I think you guys figured out that this is not critical.
