-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional real time profiling support. #134
Conversation
6b0d5cb
to
c4bc027
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for your contribution! I like the changes. I will help with Mac OS X and see if I can find some API to the same on Mac.
src/vmprof.h
Outdated
@@ -26,6 +26,7 @@ | |||
#define PROFILE_LINES '\x02' | |||
#define PROFILE_NATIVE '\x04' | |||
#define PROFILE_RPYTHON '\x08' | |||
#define PROFILE_REAL_TIME '\x0f' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"\x0f" is binary "0000 1111", I think it should be "\x10" for binary "0001 0000". If I remember correctly those macros specify bits for flags...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! Yes, this is wrong, and should be '\x10'
.
src/vmprof_main.h
Outdated
@@ -175,11 +179,43 @@ static PY_THREAD_STATE_T * _get_pystate_for_this_thread(void) { | |||
} | |||
#endif | |||
|
|||
#ifdef VMPROF_LINUX | |||
static void broadcast_signal_for_threads(pid_t pid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see opendir in the list of signal safe functions. Neither is readdir_r and atoi. I might be wrong, but we need to double check if they are signal safe! Maybe there is another API that enumerates the thread ids of the process without asking the disk!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good call. In turn:
- What can we do if they're not signal safe?
- I've actually tried to find another API for thread enumeration and failed, I don't like the readdir solution, either. (Although technically it's asking the kernel, not the disk.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, true the kernel delivers that info. If they calls are not signal safe, the program most likely freezes or crashes.
We could think about hooking into thread creation, that would register/unregister the thread id. Though a solution asking the kernel would be much better.
src/vmprof_main.h
Outdated
// For the real timer, the signal gets delivered to the main thread, seemingly always. | ||
// Consequently if we want to sample all threads, we need to forward the signal with tgkill. | ||
if ((signal_type == SIGALRM) && (pid == tid)) { | ||
broadcast_signal_for_threads(pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this hack, I need to test it a little though. One question I have:
- signal A (to main thread tid == pid == 1) arrives and uses tgkill to send a new signal to thread 2-3... (assuming there are 3 threads (1-3))
- signal B to thread 2... OK
- signal C to thread 1 (issuing more signals to thread 2 and 3)
- signal D to thread 3 ...
Will thread 3 get another signal initiated by signal C? Is it swallowed?
If the signal dispatching takes to long I think that should be recoginized...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure these signals are allowed to be coalesced / swallowed, yes.
I don't think there's a good way to detect this, but I might be wrong?
Possible workarounds:
- Ignore it, when we read the profile later we can check for lost signals / stack traces if we know the threads were present for the whole profile duration. If we don't know that, it gets tricky.
- Trigger a real-time signal from the main thread instead, and set other threads to catch that real-time signal instead of the SIGALRM from the timer?
- Other?
Hmm. Instead of waiting for signals and re-broadcasting them from the signal handler, can we instead spawn a designated signalling thread, whose only function is to act as a timer and deliver the signals to other threads in the process? Then we could use thread-unsafe functions, but we'd need to handle forks correctly, spawning another signalling thread per-process, I guess. |
Here's a simplified verion that's maybe more suitable to start with - we pin ourselves to the thread that requested the profile, and make sure all profiling is done on that thread, ignoring others. A better option for thread handling could be the next version of the feature, maybe? Ideally, in the fullness of time, we'd go from supporting one real-time thread, to multiple threads, to multiple threads and/or processes, but it's useful to start small. I also added some simple tests to show that it's working, and how. |
(Note: even with obvious small fixes, this still fails the tests on RPython, I'll probably need a hand there.) |
Aaaand here's a version where we perform no signal-unsafe functions in the handler. We can register and unregister threads from sampling, and in real time mode it's up to the caller. |
Here's a small patch for PyPy, now all the tests pass over there, too. |
Any progress on this? I'll be free to do some open-source stuff within the next week, so if I can make some changes to push it forward, I'll do just that. In particular, if we're fine with spawning a designated signalling thread, I already know how to support both Linux and MacOS, without relying on signal delivery details on Linux. ;) |
Looks good to me. Though the osx support is missing right? I'll merge it if we both support linux & osx. |
Seems to work on OSX now. 🎉 In other words - I think it's merge-ready? Probably want to squash it, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have two minor questions/comments otherwise I'm happy to merge it
src/vmprof_main.h
Outdated
pthread_t tid; | ||
while (i < thread_count) { | ||
tid = threads[i]; | ||
if (pthread_equal(tid, self)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add curly brackets for if & else here? it is really hard to read and potential risk for errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
// Consequently if we want to sample multiple threads, we need to forward this signal. | ||
if (signal_type == SIGALRM) { | ||
if (is_main_thread() && broadcast_signal_for_threads()) { | ||
__sync_lock_release(&spinlock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure why the spinlock is released here. It occurs to me as if the spinlock is only used to protect the threads array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no - it's used to protect whatever it was normally protecting and the threads array.
If we don't need to sample the main thread (when it's not registered for real-time sampling in the first place), we can bail out early here, because we just need to broadcast the signal - but we still need to release the spinlock before the immediate return.
Rebased against upstream master and addressed the comments. Cheers! ;) |
Thanks again! |
Cool! Could you maybe make a .dev version with this available on pip? I'd like to test some stuff about it later, and it's much simpler to set up if I can just pip install. ;) |
Yes I did, there is 0.4.6.dev0 |
I'm currently rewriting some C code in Python. However I noticed that real time profiling is now some times not working. see: https://travis-ci.org/vmprof/vmprof-python/jobs/239723456 Could you maybe have a look at that? What is the reason for that? |
Interesting... I can have a proper look over the weekend, will ping you earlier if it's something obvious. Did you only notice it failing in tests, or did something weird happen with some real-world case as well? |
Thanks. Currently I disabled one test with xfail...
…On Tue, Jun 6, 2017, 11:18 AM Bartosz Wróblewski ***@***.***> wrote:
Interesting... I can have a proper look over the weekend, will ping you
earlier if it's something obvious.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
<#134 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAcL2gcfsbOTYiSZ-3v-422QaxjJem2lks5sBW3egaJpZM4M-WTQ>
.
|
TL;DR:
time.sleep()
or waiting on IO can be detected now.Rationale: vmprof is an excellent low-overhead profiler, but sometimes what we really want is to measure real time usage of the process, to detect situations when we're waiting for something else to happen.
That's especially handy when it comes to optimizing your code at a high-level, when the bottleneck is not in the Python code itself, but in the way it interacts with the rest of the system. Similarly, for that use case we want to sample all the threads, not just the thread that's currently running.
(Exact analysis of thread behaviour can be done downstream, because vmprof is already saving the thread ID for every stack snapshot.)
I'd love to support OSX, but I don't have a machine to test its signal behaviour.
Cheers!