Conversation
…uential vs thread pool execution yields the same result
Codecov Report
@@            Coverage Diff             @@
##           master       #9      +/-   ##
==========================================
+ Coverage   73.68%   76.94%   +3.26%
==========================================
  Files           5        5
  Lines         285      334      +49
==========================================
+ Hits          210      257      +47
- Misses         75       77       +2
Continue to review full report at Codecov.
Codecov Report
@@            Coverage Diff             @@
##           master       #9      +/-   ##
==========================================
+ Coverage   73.68%   77.21%   +3.53%
==========================================
  Files           5        5
  Lines         285      338      +53
==========================================
+ Hits          210      261      +51
- Misses         75       77       +2
Continue to review full report at Codecov.
graphkit/base.py
Outdated
class NetworkOperation(Operation):
    def __init__(self, **kwargs):
        self.net = kwargs.pop('net')
        self.execmethod = None
maybe something like:

self._execmethod = kwargs.get('execmethod', None)

making execmethod a "private" attribute
        self.net = kwargs.pop('net')
        self.execmethod = None
        Operation.__init__(self, **kwargs)
and then also add a function to set execmethod explicitly?

def set_execmethod(self, method):
    options = ['parallel', 'sequential']
    assert method in options
    self._execmethod = method
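Putting the two suggestions together, a minimal runnable sketch (the real class inherits from graphkit's `Operation` and takes more kwargs; this stand-in only shows the private attribute plus explicit setter pattern):

```python
class NetworkOperation(object):
    """Simplified stand-in for graphkit's NetworkOperation."""

    def __init__(self, **kwargs):
        self.net = kwargs.pop('net', None)
        # keep the execution method "private", defaulting to None
        self._execmethod = kwargs.get('execmethod', None)

    def set_execmethod(self, method):
        # explicitly choose how the network is executed
        options = ['parallel', 'sequential']
        assert method in options
        self._execmethod = method

op = NetworkOperation(net=None)
op.set_execmethod('parallel')
```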
graphkit/network.py
Outdated
from multiprocessing.dummy import Pool

# if we have not already created a thread_pool, create one
if not hasattr(self, "_pool"):
where could the _pool attribute have been initialized? will this always be initialized on the first run?
Yeah, I was mainly thinking of doing this on the first run, because I couldn't think of a common use case where we'd want different behavior.
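The lazy-initialization pattern being discussed can be sketched in isolation like this (class name and worker count are made up here; the `_pool` attribute and `hasattr` check follow the diff):

```python
from multiprocessing.dummy import Pool  # thread-backed Pool, same API as multiprocessing.Pool

class Network(object):
    def _get_pool(self):
        # if we have not already created a thread pool, create one
        # on the first run; every later call reuses that same pool
        if not hasattr(self, "_pool"):
            self._pool = Pool(4)  # worker count is an arbitrary choice for this sketch
        return self._pool

net = Network()
assert net._get_pool() is net._get_pool()  # the pool is created exactly once
```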
graphkit/network.py
Outdated
    A boolean indicating whether the operation may be scheduled for
    execution based on what has already been executed.
    """
dependencies = set(v for v in nx.ancestors(graph, op) if isinstance(v, Operation))
same thing, but more explicit that it's only filtering:

dependencies = set(filter(lambda v: isinstance(v, Operation), nx.ancestors(graph, op)))
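Both spellings select the same elements. A small self-contained check, using a plain set as a stand-in for what `nx.ancestors` would return and a toy `Operation` class:

```python
class Operation(object):
    """Toy stand-in for graphkit's Operation."""
    pass

op_a = Operation()
# ancestors of a node mix Operation instances with plain data-node names
ancestors = {op_a, 'some_data_node'}

comprehension = set(v for v in ancestors if isinstance(v, Operation))
filtered = set(filter(lambda v: isinstance(v, Operation), ancestors))
assert comprehension == filtered == {op_a}
```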
if len(upnext) == 0:
    break

done_iterator = pool.imap_unordered(
Do I understand correctly that this is layerwise parallelism? As in: if at a certain depth in the graph we have a pool of 5 workers and 1 worker takes a long time, then the others have to wait?
Yes, your description is correct. The current scheduling method for the parallel thread pool is to "schedule everything that can be scheduled within the current iteration", wait within that same iteration for all answers to come back, and then repeat the process. This can sometimes be inefficient, for example when we have one task that takes very long and many tasks that take a short amount of time.
There is another way to do the scheduling that could potentially be more efficient, but I have not figured out how to implement it yet. The idea would be to schedule operations that are "schedulable" as soon as there is a free worker slot.
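The layerwise scheme described above can be sketched without any graphkit machinery: each round submits everything currently runnable to the thread pool via `imap_unordered` (as in the diff), then blocks until the whole layer is done before moving on. The layer contents here are made-up toy tasks:

```python
from multiprocessing.dummy import Pool  # thread pool, so lambdas need no pickling

def run_layerwise(layers, workers=3):
    """Run each layer in parallel, but wait for the entire layer
    to finish before starting the next one (layerwise parallelism)."""
    pool = Pool(workers)
    results = []
    for layer in layers:
        # everything schedulable *right now* is submitted together ...
        done_iterator = pool.imap_unordered(lambda task: task(), layer)
        # ... and we block until the slowest task of the layer returns
        results.extend(done_iterator)
    pool.close()
    return results

layers = [
    [lambda: 'a', lambda: 'b'],  # layer 1: independent ops
    [lambda: 'c'],               # layer 2: depends on layer 1 being done
]
print(sorted(run_layerwise(layers)))  # ['a', 'b', 'c']
```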
ok, thanks for clarifying :) you're right, no idea how that would work using threads from within Python. The pattern I'd use in other places would be to have the outermost loop iterate over threads: whenever there is a job available from the job queue, run it. I don't know whether you can address threads like that with multiprocessing.
Here's the updated usage string:
awesome :) looks good! I think the point further up about switching the parallelism mechanism can be pushed to the backlog for now?
cool yea, the approach you mention is what we want to get to. It was a little tricky figuring out how to write it in Python, but we can implement it as a more efficient parallelism method down the line.
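For the backlog: the "schedule as soon as a worker slot is free" idea can be sketched with a plain `queue.Queue` and worker threads. This is just the pattern, not graphkit code — tasks and their dependencies are reduced to dicts of names here, and real dependency tracking would come from the graph:

```python
import queue
import threading

def run_greedy(tasks, deps, workers=3):
    """tasks: {name: callable}; deps: {name: set of prerequisite names}.
    Each worker pulls a task the moment it becomes runnable, so one
    slow task never blocks unrelated ready tasks."""
    remaining = {n: set(deps.get(n, ())) for n in tasks}
    runnable = queue.Queue()
    lock = threading.Lock()
    results = {}
    scheduled = set()

    def schedule_ready():
        # enqueue every task whose prerequisites are all done (call with lock held)
        for name, d in remaining.items():
            if not d and name not in scheduled:
                scheduled.add(name)
                runnable.put(name)

    with lock:
        schedule_ready()

    def worker():
        while True:
            name = runnable.get()
            if name is None:  # poison pill: shut this worker down
                return
            value = tasks[name]()  # run outside the lock for parallelism
            with lock:
                results[name] = value
                for d in remaining.values():
                    d.discard(name)  # this task no longer blocks anyone
                schedule_ready()
                if len(results) == len(tasks):
                    for _ in range(workers):
                        runnable.put(None)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_greedy({'a': lambda: 1, 'b': lambda: 2, 'c': lambda: 3},
                 {'c': {'a', 'b'}}))
```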
@tobibaum
usage: