
Thread pool barrier executor #9

Merged

huyng merged 9 commits into master from thread_pool_barrier_executor on Dec 16, 2017
Conversation

@huyng (Contributor) commented Dec 14, 2017

@tobibaum

usage:

pipeline.execmethod = "parallel"
results = pipeline({"x": 10}, ["co", "go", "do"])

@codecov-io commented Dec 14, 2017

Codecov Report

Merging #9 into master will increase coverage by 3.26%.
The diff coverage is 94.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #9      +/-   ##
==========================================
+ Coverage   73.68%   76.94%   +3.26%     
==========================================
  Files           5        5              
  Lines         285      334      +49     
==========================================
+ Hits          210      257      +47     
- Misses         75       77       +2
Impacted Files Coverage Δ
graphkit/base.py 77.96% <100%> (+1.18%) ⬆️
graphkit/functional.py 93.75% <50%> (+0.16%) ⬆️
graphkit/network.py 68.78% <95.74%> (+8.09%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7039f8...1c71ed4. Read the comment docs.

@codecov-io commented Dec 14, 2017

Codecov Report

Merging #9 into master will increase coverage by 3.53%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #9      +/-   ##
==========================================
+ Coverage   73.68%   77.21%   +3.53%     
==========================================
  Files           5        5              
  Lines         285      338      +53     
==========================================
+ Hits          210      261      +51     
- Misses         75       77       +2
Impacted Files Coverage Δ
graphkit/base.py 79.36% <100%> (+2.57%) ⬆️
graphkit/functional.py 93.75% <50%> (+0.16%) ⬆️
graphkit/network.py 68.78% <95.74%> (+8.09%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7039f8...7d63219. Read the comment docs.

graphkit/base.py Outdated
class NetworkOperation(Operation):
def __init__(self, **kwargs):
self.net = kwargs.pop('net')
self.execmethod = None

maybe something like:

self._execmethod = kwargs.get('execmethod', None)

making execmethod a "private" attribute

self.net = kwargs.pop('net')
self.execmethod = None
Operation.__init__(self, **kwargs)


and then also add a function to set execmethod explicitly?

def set_execmethod(self, method):
    options = ['parallel', 'sequential']
    assert method in options
    self._execmethod = method

from multiprocessing.dummy import Pool

# if we have not already created a thread_pool, create one
if not hasattr(self, "_pool"):

where could the _pool attribute have been initialized? will this always be initialized on the first run?

huyng (Contributor, Author) replied:

Yeah, I was mainly thinking of doing this on the first run, because I couldn't think of a common use case where we'd want different behavior.
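A minimal sketch of the lazy-initialization pattern being discussed, assuming a hypothetical `Runner` class standing in for the class that owns the pool (graphkit's actual class lives in graphkit/network.py):

```python
from multiprocessing.dummy import Pool  # Pool backed by threads, not processes


class Runner:
    """Hypothetical stand-in for the class that owns the thread pool."""

    def _get_pool(self):
        # Create the pool lazily on first parallel run; instances that
        # only ever run sequentially never pay the thread startup cost.
        if not hasattr(self, "_pool"):
            self._pool = Pool()
        return self._pool
```

Because the attribute is only ever set here, the `hasattr` check is always False on the first run and True on every run after, which answers the question above: the pool is always created on the first parallel execution.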

A boolean indicating whether the operation may be scheduled for
execution based on what has already been executed.
"""
dependencies = set(v for v in nx.ancestors(graph, op) if isinstance(v, Operation))

Same thing, but more explicit that it's only filtering:

dependencies = set(filter(lambda v: isinstance(v, Operation), nx.ancestors(graph, op)))

huyng (Contributor, Author) replied:

okay i'll update.

if len(upnext) == 0:
break

done_iterator = pool.imap_unordered(

Do I understand correctly that this is layer-wise parallelism? As in: if at a certain depth in the graph we have a pool of 5 workers and 1 worker takes a long time, then the others have to wait?

@huyng (Contributor, Author) replied Dec 14, 2017:

Yes, your description is correct. The current scheduling method for the parallel thread pool is to schedule everything that is able to be scheduled within the current iteration, wait for all answers to come back, and then repeat the process. This may sometimes be inefficient when, for example, we have one task that takes very long and many tasks that finish quickly.

There is another way to do the scheduling which could potentially be more efficient, but I have not figured out how to implement it yet. The idea would be to schedule operations that are "schedulable" as soon as there is a free worker slot.
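The barrier scheduling described above can be sketched as follows. This is an illustrative simplification, not graphkit's actual code: `layers` is a hypothetical pre-computed list of layers, each a list of `(name, thunk)` pairs whose dependencies all live in earlier layers.

```python
from multiprocessing.dummy import Pool  # thread-backed pool


def run_layerwise(layers, n_workers=4):
    """Run a DAG layer by layer: every op in a layer runs in parallel,
    and the whole layer must finish before the next one is scheduled."""
    results = {}
    pool = Pool(n_workers)
    try:
        for layer in layers:
            # Everything in the layer is scheduled at once; fully
            # draining imap_unordered before moving on is the barrier,
            # so one slow op stalls the rest of its layer.
            for name, value in pool.imap_unordered(
                    lambda job: (job[0], job[1]()), layer):
                results[name] = value
    finally:
        pool.close()
        pool.join()
    return results
```

One slow thunk in a layer keeps all `n_workers` threads idle until it returns, which is exactly the inefficiency the comment describes.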


Ok, thanks for clarifying :) You're right, I have no idea how that would work using threads from within Python. The pattern I'd use in other places is to have the outermost loop iterate over threads: whenever a job is available from the job queue, run it. I don't know whether you can address threads like that with multiprocessing.

@huyng (Contributor, Author) commented Dec 15, 2017

Here's the updated usage string:

pipeline.set_execution_method("parallel")
results = pipeline({"x": 10}, ["co", "go", "do"])

@tobibaum

Awesome :) Looks good! I think the point further up about switching the parallelism mechanism can be pushed to the backlog for now?
👍

@huyng (Contributor, Author) commented Dec 16, 2017

Cool, yeah, the approach you mention is what we want to get to. It was a little tricky figuring out how to write it in Python, but we can implement it as a more efficient parallelism method down the line.

@huyng huyng changed the title [wip] thread pool barrier executor Thread pool barrier executor Dec 16, 2017
@huyng huyng merged commit d3f8a8a into master Dec 16, 2017