How do you profile applications that use multiprocessing? #179

Open
fake-name opened this issue Jul 8, 2018 · 5 comments

Comments

@fake-name

I have a heavily parallelized project that uses Python's multiprocessing extensively.

Is there a canonical way to use vmprof for profiling applications that have more than one process? Right now, you only get profile data for the main process.

I wonder if there's a way to monkey patch multiprocessing to install the profiler in all created processes.

Also, is there a way to join multiple profile output files? Ideally, a single file-writer process would aggregate samples from all the profiled processes at runtime, but I could also see each process dumping its own profile log file, with the files aggregated once the application has completed.
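
One way to get the "each process dumps its own log file" variant is a multiprocessing.Process subclass whose run() wraps the target in the profiler. This is only a minimal sketch, not something vmprof provides: ProfiledProcess, profile_dir, and the file-name pattern are hypothetical names made up for illustration. It assumes vmprof is installed, that vmprof.enable() is given a file descriptor, and that the profiler is not already enabled in the parent when the child is created.

```python
import multiprocessing
import os


def _vmprof_start(fileno):
    # Assumption: vmprof is installed in the child's environment.
    import vmprof
    vmprof.enable(fileno)


def _vmprof_stop():
    import vmprof
    vmprof.disable()


class ProfiledProcess(multiprocessing.Process):
    """Hypothetical helper: profile each child into its own file."""

    # Kept as class attributes so another profiler can be swapped in.
    start_profiler = staticmethod(_vmprof_start)
    stop_profiler = staticmethod(_vmprof_stop)
    profile_dir = "."

    def run(self):
        # run() executes inside the child, so getpid() is the child's pid.
        name = "profile-%d.vmprof" % os.getpid()
        path = os.path.join(self.profile_dir, name)
        # The file has to stay open for as long as the profiler is running.
        with open(path, "w+b") as fd:
            self.start_profiler(fd.fileno())
            try:
                super().run()
            finally:
                self.stop_profiler()
```

Code that creates its workers with ProfiledProcess(target=...) instead of multiprocessing.Process(target=...) would then leave one profile-<pid>.vmprof file per child. Simply assigning multiprocessing.Process = ProfiledProcess is unlikely to cover everything, since, as far as I know, multiprocessing.Pool creates its workers through its context object rather than through that name. Merging the resulting files is a separate problem that this sketch does not address.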

fake-name added a commit to fake-name/ReadableWebProxy that referenced this issue Jul 8, 2018
@fake-name
Author

fake-name commented Jul 8, 2018

Oh, additionally: I think that if you attach vmprof to a process, fork using multiprocessing, and then try to attach vmprof in the newly forked process (which is now outside the profiler, I believe), the vmprof.enable() call in the child process fails with _vmprof.VMProfError: vmprof is already enabled.

This is easy enough to work around (fork, then attach the profiler, rather than attach and then fork), but it might be a decent idea to have the multiple-enable check also look at whether the value of multiprocessing.current_process() has changed.

(I'm guessing a bit here, since I don't have a stand-alone test case to validate my assumptions, and how vmprof actually behaves under multiprocessing is undocumented.)
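
The kind of check suggested above can be illustrated outside vmprof with a small wrapper that remembers which process enabled the profiler. This is a sketch of the idea only: PidGuard is a hypothetical name, it compares os.getpid() rather than multiprocessing.current_process(), and it does not establish what vmprof itself needs in a forked child (for instance, whether vmprof.disable() has to be called there first).

```python
import os


class PidGuard:
    """Hypothetical wrapper: allow one enable per process, not per program."""

    def __init__(self):
        # pid of the process that last enabled the profiler, if any.
        self.enabled_pid = None

    def enable(self, start_profiler):
        """Call start_profiler() unless this same process already did.

        Returns True if the profiler was started, False if it was skipped.
        """
        pid = os.getpid()
        if self.enabled_pid == pid:
            return False
        # Either nothing was enabled yet, or the recorded pid belongs to
        # a parent process and this object was inherited across a fork.
        start_profiler()
        self.enabled_pid = pid
        return True
```

After a fork, the child inherits enabled_pid holding the parent's pid, so the comparison fails and the child gets a fresh start instead of an "already enabled" style rejection.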

@PierreRustOrange

I'd be interested if you have any further code on this; I'm facing the same issue when using multiprocessing.
From my tests, new processes created by multiprocessing are indeed not monitored.

@fake-name
Author

fake-name commented Feb 20, 2019

I wound up finding https://github.com/benfred/py-spy to be pretty handy, although it only works with CPython (I mostly use pypy3 these days). You can attach it to any running Python process, so I'd just attach it to the Python process using the most CPU and have a look-see.

It's not ideal, but it's enough to get a rough idea of where the time goes.

Writing an extension that hooks atfork() or something is yet another thing I'd like to do if I can ever find the time.
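
On CPython 3.7+ (Unix only), a fork hook does not necessarily need a C extension, because the standard library exposes os.register_at_fork(). The following is a rough sketch under assumptions: install_child_profiler, the directory parameter, and the file-name pattern are hypothetical, and start_profiler stands in for whatever call starts the profiler on a file descriptor. Given the "already enabled" error described earlier in this thread, a child forked from a parent that is already being profiled may need extra handling that this sketch does not attempt.

```python
import os


def install_child_profiler(start_profiler, directory="."):
    """Register a hook that runs in every child created via os.fork()."""

    def _after_fork_in_child():
        # Runs in the child right after the fork, so this is the child's pid.
        name = "profile-%d.vmprof" % os.getpid()
        path = os.path.join(directory, name)
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
        # The descriptor is deliberately left open: the profiler is
        # expected to write to it for the rest of the child's life.
        start_profiler(fd)

    # Hooks registered this way cannot be removed again.
    os.register_at_fork(after_in_child=_after_fork_in_child)
```

Something like install_child_profiler(vmprof.enable) would be the intended use. The hook only fires for children created with os.fork(), which includes multiprocessing's fork start method; processes started with the spawn start method are not covered.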


Actually, https://github.com/uber/pyflame claims it supports multithreading, so that's another avenue to investigate.

@PierreRust

Thanks for the feedback, I've also been using py-spy, with the same issue concerning pypy ;-)

I've tried pyflame, and although it supports multithreading (like vmprof and py-spy), it does not help with multiprocessing: forked processes are completely invisible to it.

@mushan09

@PierreRust So how did you solve this problem in the end? I have no idea how to approach it.


4 participants