Parallelism safety (thread, multiprocessing) #329
Note that there is a minor display glitch: when the first bar reaches 100% and... This is because of this line of code, and because the... I did not want to remove this here without a discussion; this should be done in another PR. What do you think @casperdcl? If we remove this line of code, users will have to...
I can't get a unit test to work, simply because I can't find a way to get the printed result. @casperdcl, if you have any idea how to get sys.stderr output from forked children (maybe you can reuse the tricks you used on the CLI tests?)... Here is my unit test so far (yes, progresser() and init_child() need to stay outside of the test):
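For what it's worth, one way to get at that output (a sketch of my own, not part of this PR): run the progress code in a child interpreter via `subprocess` and capture its stderr. The stand-in writer below mimics a bar with `\r` writes so the sketch stays dependency-free:

```python
import subprocess
import sys

# Child script: a stand-in for a tqdm bar, writing "\r"-prefixed progress
# frames to stderr the way tqdm does by default.
child = (
    "import sys\n"
    "for i in range(5):\n"
    "    sys.stderr.write('\\rbar0: {}/5'.format(i + 1))\n"
)
res = subprocess.run([sys.executable, "-c", child],
                     stderr=subprocess.PIPE, timeout=30)
out = res.stderr.decode()
# The rendered frames are now inspectable from a unit test
assert "bar0: 5/5" in out
```

The same trick works for a script that spawns its own children, since their stderr is inherited from the child interpreter.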
```python
th_lock = th.Lock()  # thread lock


class TqdmDefaultWriteLock(object):
```
Is the `Tqdm` prefix needed?
It's just to follow the nomenclature we already have (e.g. `TqdmDeprecationWarning`).
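For readers following along, here is a minimal model of what a default write lock combining a thread lock with an optional multiprocessing lock might look like (my own sketch of the pattern under discussion, not tqdm's actual code; the name `DefaultWriteLock` is illustrative):

```python
import threading

try:
    import multiprocessing as mp
    _mp_lock = mp.Lock()  # unavailable on some platforms/sandboxes
except (ImportError, OSError):
    _mp_lock = None


class DefaultWriteLock(object):
    """Acquire a thread lock and, when available, a process lock too."""
    th_lock = threading.Lock()  # thread lock

    def __init__(self):
        self.locks = [lk for lk in (_mp_lock, self.th_lock) if lk is not None]

    def acquire(self):
        for lk in self.locks:
            lk.acquire()

    def release(self):
        for lk in self.locks[::-1]:  # release in reverse order
            lk.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()


# Usage: serialise terminal writes from multiple workers
with DefaultWriteLock():
    pass  # e.g. write one progress frame here
```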
@@ -43,6 +46,34 @@ def __init__(self, msg, fp_write=None, *a, **k):

```python
        super(TqdmDeprecationWarning, self).__init__(msg, *a, **k)


# Create global parallelism locks to avoid racing issues with parallel bars
# works only if fork available (Linux, MacOSX, but not on Windows)
```
macOS now
I don't want to sound pushy, but I think the PyCharm editor is superior to Notepad++. I can't vouch for the other IntelliJ IDEs, but PyCharm is good. For example, flake8 is automatically run on files and cmd-shift-l fixes errors; there's also automatic variable renaming.
tqdm/_tqdm.py (outdated)

```python
            1 / self.avg_time if self.avg_time else None,
            self.bar_format))
        # Print bar's update
        self.sp(self.format_meter(
```
The spacing here is not so good.
Thanks @CrazyPython, I will implement your fixes soon. About PyCharm: I need something very lightweight, as I have hundreds of tabs open at once!
Is this good to go apart from the stylistic issues? I can fix those.
Yes, it works well. Sorry about the stylistic issues, I can't fix them right now, so if someone else wants to do it, please feel free. Thanks a lot!
@lrq3000 check out The Great Suspender then. It lowers the memory usage of tabs by suspending them.
Ah, I already use it, it's an amazing plugin! But I meant that I don't have much time right now :)
a few thoughts (all assuming `from time import sleep`, `from multiprocessing import Pool, Process, freeze_support, Lock`, and `from tqdm import tqdm`):

(1)

```python
def progresser(n):
    text = "bar{}".format(n)
    for i in tqdm(range(5000), desc=text, position=n, leave=True):
        sleep(0.001)

def init_child(write_lock):
    """
    Provide tqdm with the lock from the parent app.
    This is necessary on Windows to avoid racing conditions.
    """
    tqdm.set_lock(write_lock)

if __name__ == '__main__':
    freeze_support()
    write_lock = Lock()
    L = list(range(10))
    Pool(len(L), initializer=init_child, initargs=(write_lock,)).map(progresser, L)
```

could be replaced with:

(2)

```python
def progresser(write_lock, n):
    tqdm.set_lock(write_lock)
    text = "bar{}".format(n)
    for i in tqdm(range(5000), desc=text, position=n, leave=True):
        sleep(0.001)

if __name__ == '__main__':
    freeze_support()
    write_lock = Lock()
    L = list(range(10))
    procs = [Process(target=progresser, args=(write_lock, i)) for i in L]
    [p.start() for p in procs]
    [p.join() for p in procs]
```

which in turn becomes:

(3)

```python
def progresser(lk, n):
    lk.acquire()
    with tqdm(total=5000, desc="bar " + str(n), position=n) as t:
        lk.release()

        def update():
            lk.acquire()
            t.update()
            lk.release()

        for i in range(5000):
            sleep(0.001)
            update()

if __name__ == '__main__':
    freeze_support()
    write_lock = Lock()
    L = list(range(10))
    procs = [Process(target=progresser, args=(write_lock, i)) for i in L]
    [p.start() for p in procs]
    [p.join() for p in procs]
```

Incidentally, the last example is how I've been using `tqdm`.
@casperdcl I agree it can be done from outside, but then we're outsourcing the complexity of using locks to the user. So yes, the PR provides a simplified and unified API to support parallelism, although it does not provide all functionalities (notably `tqdm.write()`).
PS: also with solution 3, we lose the speed advantage, since we cannot support iterable-based `tqdm`. Just my two cents anyway; I don't use parallelism much.
I guess. Maybe we need more feedback from people who do use parallel code a lot, as to whether they have any suggestions/preferences. BTW, the random places tqdm finds itself... https://media.readthedocs.org/pdf/bigartm/latest/bigartm.pdf (dependencies: boost, numpy, pandas, protobuf... and tqdm).
Agreed, let's wait for more feedback, or see if anyone has any better idea (or if someone can finish the centralized manager).

Thanks for the share, this seems to be a very interesting library, I might use it! Glad that tqdm could be useful to them :)
i'm a bit of a novice so this is just my 2¢... but I usually use multiprocessing in python concurrency b/c my tasks are usually cpu bound. normally I either subclass `multiprocessing.Process` and override the `run()` method:

```python
from multiprocessing import Process, Queue

class my_worker(Process):
    def __init__(self, output_queue, chunk_to_do):
        super().__init__()
        self.output_queue = output_queue
        self.chunk_to_do = chunk_to_do

    def run(self):
        for thing in self.chunk_to_do:
            self.output_queue.put(hard_math(thing))

if __name__ == "__main__":
    q = Queue()

    def chunker(longlist):
        yield chunk

    workers = []
    for chunk in chunker(longlist):
        p = my_worker(q, chunk)
        p.start()
        workers.append(p)
    for worker in workers:
        worker.join()
    # do stuff with data...
```

or lately i've been fooling around with the high-level wrapper library `concurrent.futures`. the way i've been using it is along the lines of:

```python
import functools
from concurrent.futures import ProcessPoolExecutor

def hard_math(arg1, arg2):
    return result

if __name__ == "__main__":
    to_proc = [x for x in things_to_do_math_on]
    CHUNKSIZE = len(to_proc) // N
    with ProcessPoolExecutor(max_workers=N) as e:
        emap = e.map(functools.partial(hard_math, arg2=arg2),
                     to_proc, chunksize=CHUNKSIZE)
        output = list(emap)
    # do stuff with data
```
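To make that second pattern self-contained, here is a runnable sketch of my own (`hard_math`, the inputs and the worker count are placeholders; I use `ThreadPoolExecutor` so it runs even where spawning processes is restricted — for CPU-bound work you would swap in `ProcessPoolExecutor`, since the `map` API is the same):

```python
import functools
from concurrent.futures import ThreadPoolExecutor

def hard_math(x, power=2):
    # placeholder for a CPU-bound computation
    return x ** power

def run(n_workers=4):
    to_proc = list(range(10))  # placeholder inputs
    with ThreadPoolExecutor(max_workers=n_workers) as e:
        # functools.partial fixes the extra argument, as in the snippet above
        emap = e.map(functools.partial(hard_math, power=3), to_proc)
        return list(emap)

if __name__ == "__main__":
    print(run()[:4])  # -> [0, 1, 8, 27]
```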
Can we get this merged in?
Thanks a lot @casperdcl :-) Maybe you should double-check the two code blocks I commented on in this commit; I am not sure what the best thing to do about them is. If you already tested the cases I mention, then it should be fine :-)
force-pushed from 94cba8f to d4b942d (partially reverts 9b7043c, info: #329 (review))

force-pushed from ab5d63e to 4278974 (partially reverts 9b7043c, info: #329 (review))
```python
        pass
    cls.monitor = None
else:
    cls.monitor = None
```
@lrq3000 happy with this?
yes :-) thanks!
@lrq3000 it would be great if you could post a quick unit test as you described; examples/parallel_bars.py should be a decent start. Coverage is still currently 100%, but that doesn't prove it always works.
ok!
@casperdcl I would love to, but unfortunately I don't have the time right now, sorry :-( Also I'm not sure that...
@Illuminae Which OS? Are you using a specific terminal? (e.g. PyCharm has problems)

Everything which follows was done on Win10 64-bit, and I will dig deeper at the weekend; I don't have the time currently. So here is just a small statement. I think they are not the result of this PR. (But I don't know, so I will point it out here.)

```python
from time import sleep
from tqdm import tqdm, trange
from multiprocessing import Pool, freeze_support, Lock

L = list(range(9))

def progresser(n):
    interval = 0.001 / (n + 2)
    total = 5000
    text = "#{}, est. {:<04.2}s".format(n, interval * total)
    for i in trange(total, desc=text, position=n):
        sleep(interval)

if __name__ == '__main__':
    freeze_support()  # for Windows support
    p = Pool(len(L),
             # again, for Windows support
             initializer=tqdm.set_lock, initargs=(Lock(),))
    p.map(progresser, L)
    print("\n" * (len(L) - 2))
```

With Windows PowerShell I get those results or similar [screenshot omitted], and on the command line [screenshot omitted]. I replaced the "tofu" with a question mark. Interesting here is that one bar gets doubled, the second bar does not have a full bar, and inside PowerShell the working directory gets printed into the last bar.

EDIT: I just saw it could be the issue reported in #445?

PS: pressing Ctrl+C leads to a KeyboardInterrupt and stays there. Spamming it results in a...
@IceflowRE I was using the regular Mac OS X Terminal; the server I was executing on runs Debian 8.9.
@Illuminae @IceflowRE see #454, maybe you just need...
with a patched examples/parallel_bars.py (make the top bar run faster and thus finish earlier):

```diff
diff --git a/examples/parallel_bars.py b/examples/parallel_bars.py
index 98eb06e..04698a8 100644
--- a/examples/parallel_bars.py
+++ b/examples/parallel_bars.py
@@ -8,7 +8,7 @@ L = list(range(9))
 def progresser(n):
-    interval = 0.001 / (n + 2)
+    interval = 0.001 / (len(L) - n + 2)
     total = 5000
     text = "#{}, est. {:<04.2}s".format(n, interval * total)
     for _ in tqdm(range(total), desc=text, position=n):
```

I still see the same problem as @IceflowRE. And here is my running env: [details omitted]
Implement parallelism safety in `tqdm` by providing a new `set_lock()` class method. This is a follow-up on #291.

For Linux (and any platform supporting `fork`), no action is required from the user. For Windows, here is the canonical example usage: [example omitted]

Todo:

- use `mp.set_start_method('spawn')` to force mimicking Windows' "no-fork" spawning of processes, else it's impossible to test on Linux.
- `tqdm.write()` won't work, because there is no way to implement it without a centralized manager that is aware of all bars. See Multi tqdm #143 for a possible solution. Or maybe we can change the `_instances` type to a `multiprocessing.Queue()` (or another type that is shareable across multiple processes)?
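To illustrate the `set_lock()` idea without depending on tqdm itself, here is a stand-in sketch (the `Bar` class is my own invention modelling the pattern, not tqdm's real implementation: the parent creates one lock, installs it on the class, and every worker's writes are serialised through it; threads stand in for processes so the sketch runs anywhere):

```python
import io
import threading

class Bar(object):
    """Hypothetical stand-in for a tqdm-like bar with a class-level lock."""
    _lock = threading.Lock()

    @classmethod
    def set_lock(cls, lock):
        cls._lock = lock  # the role tqdm.set_lock(write_lock) plays

    def __init__(self, desc, stream):
        self.desc, self.stream, self.n = desc, stream, 0

    def update(self):
        self.n += 1
        with self._lock:  # one intact "\r<frame>" write at a time
            self.stream.write("\r{}: {}".format(self.desc, self.n))

stream = io.StringIO()
Bar.set_lock(threading.Lock())  # parent installs a single shared lock

def worker(i):
    bar = Bar("bar{}".format(i), stream)
    for _ in range(50):
        bar.update()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every frame is intact: 3 workers x 50 updates = 150 "\r"-delimited writes
frames = stream.getvalue().split("\r")[1:]
assert len(frames) == 150 and all(f.startswith("bar") for f in frames)
```

The real PR does the same hand-off via `Pool(..., initializer=tqdm.set_lock, initargs=(Lock(),))`, which is what makes the Windows (no-fork) case work.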