Efficiency on manycore/fabric CPUs #9
Comments
Right now, one can simply stop all workers and restart Lace to change the number of workers. That is not exactly on the fly; the problem is that the task queues would need to support this, and I imagine it is pretty difficult to untangle all the dependencies between tasks. Lace is not meant for general-purpose applications anyway: it is meant for research programs that basically run solo on a big machine to do some task, not to run concurrently with other user applications. If someone wants to reimagine the framework, or use the work-stealing deque in a framework that does support those things, be my guest.
Yep, as I mentioned above, this is a highly under-researched feature. But it is necessary even on big machines (where resources provisioned by a hypervisor can be changed fully on the fly): otherwise the running apps need to maintain persistent state themselves, which greatly increases their complexity. If you know of any references or people researching this, I would very much welcome talking to them.
I have not been active in parallel-processing communities for quite a while. I was briefly engaged when I wrote Lace back in 2013 and visited EuroPar, but I have not published or engaged since. I can imagine that someone (e.g. an MSc or PhD student) could adapt Lace for on-the-fly addition (trivial) and removal (slightly involved) of workers. There are roughly two ways to deal with worker death. The core issue is that local task variables live on the program stack of the worker, so if the worker is going to die, something has to be done with them. One option is to save the task state somehow, but that would require changes to tasks so significant that one might as well replace the framework entirely to allow this kind of synchronization. The other option is to simply "unsteal" the work: when dying, the worker just has to "unsteal" all the tasks it has stolen. This is relatively easy to implement, but it introduces a large cost when workers steal from each other extensively.
Thanks for the pointers! If I find someone interested, I will contact you. I will leave this thread open as an open topic.
This is a continuation of the discussion from #10 (comment), to avoid polluting that PR's thread: I think the two proposed points (i.e. only grow the number of threads, and let idle ones sleep) are good ones, as sleeping threads are IMHO efficient enough and should pose virtually no harm to the running system (power-wise, performance-wise, or latency-wise, even in the long run). So yes, I would be totally in favor of not really removing threads on the fly but just letting them sleep. Thoughts?
Let's say some work is being done by 6 out of 8 Lace workers, 2 are sleeping. How would they wake up if work becomes available? Should every worker that pushes new work wake up all potentially sleeping threads? Should they do that each time a task is pushed to the queue? |
I guess not if it is just one task (but would that scale well enough?). If it is a batch of tasks, then probably yes.
If Lace does have unshared per-worker queues, then I think all these decisions could be made heuristically, based on the fill percentage of those queues or, better, their fill rate. But again, this is just guesswork on my part; it needs real tests of both approaches to decide.
Would this help? 67eaa42 suspends all Lace workers if there is no work, and resumes all of them when work is offered. The assumption anyway is that Lace is used for fine-grained work, so if there is any work at all, it should be enough to keep all workers busy.
One of the unsolved questions in the realm of work stealing is the fixed (as opposed to dynamic/elastic) number of workers, determined ahead of time with no way to grow or shrink it arbitrarily at run time.
Most modern devices have multicore architectures, and there is a strong trend toward commissioning manycore systems. There are at least two distinguishable driving forces behind the desire to grow or shrink the number of workers at run time.
In both cases we have a non-negligible number of running workers (tens, hundreds, or even thousands), and we map them to cores. If the number of workers does not match the number of cores, then either context switching becomes dominant and the app's performance drops significantly (when multiple workers are mapped to one low-performance core in response to a power-saving demand), or the app fails to use the full power of the hardware (when cores are added to the server, or quickly turned on on a smartphone in response to the user starting a game).
What are the plans for Lace to embrace all this?