per-worker contrib.concurrent progress (#973)
I'm sorry, but I don't understand. Is this a question? A statement? A bug report? A feature request? It would be helpful if you could say something like:

code:

```python
from tqdm.contrib.concurrent import process_map

process_map(foo, L, max_workers=16, chunksize=len(L) // 16)
```

current output:

expected output:
Oh, sorry! If I do this:

the progress bar stays at 0% and then jumps to 100%. This happens because the wrapper (and the method in the documentation) wraps the returned futures in the tqdm progress bar. However, some workloads work best when n futures are spawned and each works on a large chunk. If the workload is fairly uniform across workers, they all terminate at roughly the same time, so the progress bar jumps straight from 0% to 100%. I would like to update the progress bar from within a future, to show how the per-worker progress is going.
I'm running into this same thing, to the extent that I thought tqdm was broken. I have a chunksize of 1,000, so the progress bar only updates when a worker completes 1,000 items. That takes a while, and in the meantime it looks like tqdm is doing nothing, even though I know the system is doing work (thanks!).
Ah well, if the number of running processes/threads equals the number of tasks, then yes, all tasks will finish simultaneously. Nested progress (each process/thread reporting its own progress) needs to be implemented manually by users.
Is there an example of how to implement this manually? I'm running into the same chunksize problem.
I have tried this:

(screen recording: 2023-01-26.11.36.23.mov)

```python
import time

from tqdm import tqdm
from tqdm.contrib.concurrent import thread_map

def sleep(duration: int) -> None:
    for _ in tqdm(range(duration), leave=False):
        time.sleep(1)

if __name__ == "__main__":
    thread_map(sleep, list(range(10)), max_workers=4, leave=False, desc="[ROOT]")
```

(screen recording: 2023-01-26.11.38.58.mov)

```python
import time

from tqdm import tqdm
from tqdm.contrib.concurrent import process_map

def sleep(duration: int) -> None:
    for _ in tqdm(range(duration), leave=False):
        time.sleep(1)

if __name__ == "__main__":
    process_map(sleep, list(range(10)), max_workers=4, leave=False, desc="[ROOT]")
```
Actually, one can simply monkeypatch the internal `_executor_map` function and use ...
There is a gap in the documentation regarding how to use tqdm with `concurrent.futures` process pools.

The documentation and `contrib.concurrent` cover the case where many workers chip away at a large problem with a small chunksize; as workers finish items, the progress bar can be updated. However, for the case with a few workers and a large chunksize, it is unclear how to update a progress bar from within the future. Specifically, I have a workload with perhaps 3 million elements to process, and the processing is compute- and IO-heavy (open a file and do work on it), so I use large chunksizes, O(10,000), for efficiency (this limits how many times each file gets opened).
For example, the documentation suggests the following within the process call:

However, this doesn't work: if `sleep(interval)` is replaced with an actual workload, it just does the same thing `total` times too many.