Cross validation -- option to control # of threads #15

Open
illdopejake opened this issue Nov 5, 2020 · 5 comments

@illdopejake

Hiya Leon et al.,

I ran into an interesting issue when the sys admin of my cluster reached out about some problematic processes that were instantiated when running a parallelized version of the SuStaIn cross-validation. The wrapper script looked something like this:

from pySuStaIn import ZscoreSustain
import multiprocessing as mp

sustain_input = ZscoreSustain(args)
test_idxs = <a list of lists>

jobs = []
NFolds = 10
for fold in range(NFolds):
    # one process per cross-validation fold
    p = mp.Process(target=sustain_input.cross_validate_sustain_model,
                   args=(test_idxs, fold))
    jobs.append(p)
    p.start()

This script was then submitted to the cluster with a .sh script specifying parameters such as the number of nodes and cores (in this case I asked for 1 node and 32 cores). However, the individual jobs were themselves starting several other threads/processes, effectively overriding the specifications in my .sh script. The result was that I asked for 32 cores but ended up with roughly 32^2 threads running on the node, which caused a lot of context switching and inefficient use of the processors.
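
In case it helps anyone hitting the same thing, a quick way to confirm this kind of oversubscription from the wrapper script itself is something like the sketch below (it uses psutil, which is not part of pySuStaIn and would need to be installed separately; run it once the folds have had a moment to spin up):

import psutil

# Inspect the wrapper script's own children while the folds are running
parent = psutil.Process()
children = parent.children(recursive=True)

total_threads = sum(child.num_threads() for child in children)
print(f"{len(children)} worker processes, {total_threads} threads in total")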

I admit this is kind of a niche issue, and maybe folks don't care so much about how efficient the code is. But I think it could be addressed quite easily by adding an argument that lets the user control the internal parallelization to some degree, à la the n_jobs framework in sklearn. As it stands, the parallel behaviour does not seem to be controllable by the user.
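
For reference, the sklearn pattern I mean looks like the snippet below: the caller decides both how many folds run at once and how much parallelism the estimator may use internally (this is plain sklearn, not pySuStaIn):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_jobs on the estimator caps its internal parallelism;
# n_jobs on cross_val_score caps how many folds run at once
scores = cross_val_score(RandomForestClassifier(n_jobs=1),
                         X, y, cv=10, n_jobs=4)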

Forgive me if this isn't clear. Would be happy to provide greater detail!

As always, thanks for making such an amazing library!

@sea-shunned
Member

Hi Jake,

In your args, do you have use_parallel_startpoints = True? If so, this itself is doing some multiprocessing in the maximum likelihood part, which would explain the exploding threads! Disabling this (use_parallel_startpoints = False) should keep things consistent for your parallelization across folds. This may cause the individual runs to be slower, but doing 10 (in this example) simultaneously should be faster overall than internal parallelization.
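
For concreteness, a minimal sketch of what I mean: pass the flag when constructing the model, so the only parallelism left is your own per-fold processes. The other constructor arguments here are just placeholders in the style of the pySuStaIn example notebooks and may differ between versions:

sustain_input = ZscoreSustain(data, Z_vals, Z_max, biomarker_labels,
                              N_startpoints, N_S_max, N_iterations_MCMC,
                              output_folder, dataset_name,
                              use_parallel_startpoints=False)  # disable internal start-point parallelism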

If you did set use_parallel_startpoints = False, then...well, I'll have to have a think 😄

A possible route for the future may be, as you say, to allow finer control for each run (e.g. an n_cpus argument) rather than a binary serial/parallel switch, or simply to expose the options of pathos (which is used underneath) more fully, so thank you for bringing this up!

@illdopejake
Author

Hi Cameron,

Thanks so much for looking into this. I appreciate you bringing the use_parallel_startpoints argument to my attention; I hadn't really paid much attention to it, and I'm glad to know about it now. But in the instance where I encountered the behaviour described in this issue, I actually had use_parallel_startpoints set to False. So something else in the code must be spawning all of these processes.

@sea-shunned
Member

That is interesting!

Underneath, numpy does some parallelization that is controlled externally, depending on which libraries the cluster is using (BLAS/OpenBLAS, MKL, etc.). This StackOverflow post explains a possible way to address this, which may be easy or hard to do depending on the cluster setup!
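
The usual approach is to cap the thread pools of the numerical back-ends per process, and to do it before numpy is first imported, e.g.:

import os

# Limit the common BLAS/OpenMP back-ends to one thread per worker process.
# These variables are only read when the libraries are first loaded,
# so set them before importing numpy (or anything that imports numpy).
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS",
            "NUMEXPR_NUM_THREADS", "VECLIB_MAXIMUM_THREADS"):
    os.environ[var] = "1"

import numpy  # imported only after the limits are in place

With 32 per-fold processes and one BLAS thread each, the job should then stay within the 32 cores requested from the scheduler.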

@illdopejake
Author

Great, I had no idea numpy was doing that sort of thing; I guess it only becomes relevant for operations on really large arrays? I will give this a try next time and report back on whether it resolves the issue. Thanks again!

@sea-shunned
Member

Yep, and sometimes parallelization has more overhead cost than the time you save (if the arrays aren't big enough), so it can pay to adjust the settings.

I'll keep this issue open for now — if setting those environment variables or anything else does fix the issue please let us know!
