error about mcell_coclust_from_graph_resamp #15

Closed
aviezerl opened this issue Jan 1, 2019 · 8 comments
Labels
bug Something isn't working major

Comments

aviezerl commented Jan 1, 2019

Original report by JPLau (Bitbucket: 5c2729a816ac1e4f7cbbda0b).


Hi,
An error was produced when I ran mcell_coclust_from_graph_resamp (on about 130k cells): "Too many child processes". However, my workstation has 2 CPUs with 48 cores and 512 GB RAM, and I didn't notice any memory leaks. The error also doesn't change when I add options(tgs_max.processes=16L).

mcell_coclust_from_graph_resamp(
coc_id="fb_raw_filter_coc5000",
graph_id="fb_raw_filter_graph",
min_mc_size=20,
p_resamp=0.75, n_resamp=5000)

aviezerl commented Jan 1, 2019

Original comment by Yaniv Lubling (Bitbucket: 557058:11761e0c-f009-41f9-a29c-6f6993e0539c).


Hi,

The other parameter that sets the number of processes metacell uses is mc_cores. It is 16 by default, which should be fine on your machine. Did you change it?
You can read it with: n_cores = tgconfig::get_param("mc_cores", "metacell")

As a sanity check, let's see if it works when you set both mc_cores and tgs_max.processes to 4.
To set mc_cores, do: tgconfig::set_param("mc_cores", 4, "metacell")
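Put together, the two settings might look like this (a minimal sketch combining the options() call from the original report with the tgconfig call above):

```r
# Cap both parallelism knobs at 4 before rerunning the clustering
options(tgs_max.processes = 4L)                 # limit for the tgstat layer
tgconfig::set_param("mc_cores", 4, "metacell")  # limit for metacell itself
```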

Yaniv

aviezerl commented Jan 1, 2019

Original comment by JPLau (Bitbucket: 5c2729a816ac1e4f7cbbda0b).


Hi,
I have followed your suggestion, but it runs slowly with 4 threads. The metacell paper says metacell can handle a 160k-cell PBMC dataset. Could you upload the R script used to process that dataset when it is convenient for you? I'd like to rerun it on my workstation to check whether the error reproduces. Thank you.

aviezerl commented Jan 1, 2019

Original comment by Amos Tanay (Bitbucket: 557058:5a3503bb-91a6-4af0-972c-ab1d0c3bb50f, GitHub: amostanay).


So if it runs with 4 threads, it is hard to understand why it fails with more, unless there are some machine-specific restrictions. The system should indeed run with 160K cells and more... and we usually use something like 16-24 cores on nodes that look very much like what you described (dual CPU, 12x2 physical cores, 512G RAM).

Can you try to increase the number of cores and see where it gets blocked? And make sure your machine is not loaded with other jobs?

(And thanks for reporting these issues)

aviezerl commented Jan 1, 2019

Original comment by JPLau (Bitbucket: 5c2729a816ac1e4f7cbbda0b).


Without changing mc_cores, I tried tgs_max.processes at 16, 24, 48, and 96, but all produce the same error. I then set mc_cores to 4 and tgs_max.processes to 16, and the error appeared again. Now I have set both parameters to 4; it is slow, so I will check it tomorrow and report the result.

aviezerl commented Jan 1, 2019

Original comment by Amos Tanay (Bitbucket: 557058:5a3503bb-91a6-4af0-972c-ab1d0c3bb50f, GitHub: amostanay).


BTW - why are you using n_resamp=5000? For a start, take it down to 100 or 50 - assuming the resampling is the bottleneck, you'll know how things look 100 times faster..

We do not recommend more than 500 bootstraps - even 200 should typically be enough
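Following this advice, the call from the original report could be rerun with a much smaller resampling count (a sketch reusing the same IDs from the first comment above):

```r
# Same call as in the original report, but with only 100 resamples
# for a quick first run to check whether resampling is the bottleneck
mcell_coclust_from_graph_resamp(
    coc_id = "fb_raw_filter_coc5000",
    graph_id = "fb_raw_filter_graph",
    min_mc_size = 20,
    p_resamp = 0.75, n_resamp = 100)
```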

Amos

aviezerl commented Jan 1, 2019

Original comment by JPLau (Bitbucket: 5c2729a816ac1e4f7cbbda0b).


Maybe I misunderstood this parameter; I thought it was related to the number of metacells. In the 8k PBMC tutorial n_resamp is 500, and in another tutorial it is 1000, so I selected a larger number. Maybe that is the reason the error was produced.

BTW, n_resamp=2500 with mc_cores and tgs_max.processes both set to 4 produces the same error. I have reduced the number of bootstraps and will update the result later.

aviezerl commented Jan 1, 2019

Original comment by JPLau (Bitbucket: 5c2729a816ac1e4f7cbbda0b).


Indeed, a larger n_resamp value leads to this error; setting it to 1000 works. Thank you~

Another question: if there are some small clusters in the dataset, does the n_resamp parameter affect metacell precision?

aviezerl commented Jan 1, 2019

Original comment by Amos Tanay (Bitbucket: 557058:5a3503bb-91a6-4af0-972c-ab1d0c3bb50f, GitHub: amostanay).


You will not gain more sensitivity from running more bootstraps - 500 are absolutely enough.

More bootstraps can help make the metacell cover of large clusters a bit more accurate. If you are worried about small clusters - these should be detected with high sensitivity even without any bootstrapping.

Paradoxically, it is easier to define small and tight rare cell types than to accurately describe the subtle gene expression variation within prevalent cell types.

@aviezerl aviezerl closed this as completed Apr 2, 2019
@aviezerl aviezerl added major bug Something isn't working labels Aug 23, 2019