Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional real time profiling support. #134

Merged
merged 5 commits into from
May 1, 2017
Merged

Conversation

bawr
Copy link
Contributor

@bawr bawr commented Apr 15, 2017

TL;DR: time.sleep() or waiting on IO can be detected now.

Rationale: vmprof is an excellent low-overhead profiler, but sometimes what we really want is to measure real time usage of the process, to detect situations when we're waiting for something else to happen.

That's especially handy when it comes to optimizing your code at a high-level, when the bottleneck is not in the Python code itself, but in the way it interacts with the rest of the system. Similarly, for that use case we want to sample all the threads, not just the thread that's currently running.

(Exact analysis of thread behaviour can be done downstream, because vmprof is already saving the thread ID for every stack snapshot.)

I'd love to support OSX, but I don't have a machine to test its signal behaviour.

Cheers!

@bawr bawr force-pushed the real-time branch 3 times, most recently from 6b0d5cb to c4bc027 Compare April 15, 2017 14:04
Copy link
Contributor

@planrich planrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for your contribution! I like the changes. I will help with Mac OS X and see if I can find some API to the same on Mac.

src/vmprof.h Outdated
@@ -26,6 +26,7 @@
#define PROFILE_LINES '\x02'
#define PROFILE_NATIVE '\x04'
#define PROFILE_RPYTHON '\x08'
#define PROFILE_REAL_TIME '\x0f'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"\x0f" is binary "0000 1111", I think it should be "\x10" for binary "0001 0000". If I remember correctly those macros specify bits for flags...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh! Yes, this is wrong, and should be '\x10'.

@@ -175,11 +179,43 @@ static PY_THREAD_STATE_T * _get_pystate_for_this_thread(void) {
}
#endif

#ifdef VMPROF_LINUX
static void broadcast_signal_for_threads(pid_t pid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see opendir in the list of signal safe functions. Neither is readdir_r and atoi. I might be wrong, but we need to double check if they are signal safe! Maybe there is another API that enumerates the thread ids of the process without asking the disk!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good call. In turn:

  1. What can we do if they're not signal safe?
  2. I've actually tried to find another API for thread enumeration and failed, I don't like the readdir solution, either. (Although technically it's asking the kernel, not the disk.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, true the kernel delivers that info. If they calls are not signal safe, the program most likely freezes or crashes.
We could think about hooking into thread creation, that would register/unregister the thread id. Though a solution asking the kernel would be much better.

// For the real timer, the signal gets delivered to the main thread, seemingly always.
// Consequently if we want to sample all threads, we need to forward the signal with tgkill.
if ((signal_type == SIGALRM) && (pid == tid)) {
broadcast_signal_for_threads(pid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this hack, I need to test it a little though. One question I have:

  1. signal A (to main thread tid == pid == 1) arrives and uses tgkill to send a new signal to thread 2-3... (assuming there are 3 threads (1-3))
  2. signal B to thread 2... OK
  3. signal C to thread 1 (issuing more signals to thread 2 and 3)
  4. signal D to thread 3 ...

Will thread 3 get another signal initiated by signal C? Is it swallowed?
If the signal dispatching takes to long I think that should be recoginized...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure these signals are allowed to be coalesced / swallowed, yes.
I don't think there's a good way to detect this, but I might be wrong?

Possible workarounds:

  1. Ignore it, when we read the profile later we can check for lost signals / stack traces if we know the threads were present for the whole profile duration. If we don't know that, it gets tricky.
  2. Trigger a real-time signal from the main thread instead, and set other threads to catch that real-time signal instead of the SIGALRM from the timer?
  3. Other?

@bawr
Copy link
Contributor Author

bawr commented Apr 15, 2017

Hmm. Instead of waiting for signals and re-broadcasting them from the signal handler, can we instead spawn a designated signalling thread, whose only function is to act as a timer and deliver the signals to other threads in the process?

Then we could use thread-unsafe functions, but we'd need to handle forks correctly, spawning another signalling thread per-process, I guess.

@bawr
Copy link
Contributor Author

bawr commented Apr 15, 2017

Here's a simplified verion that's maybe more suitable to start with - we pin ourselves to the thread that requested the profile, and make sure all profiling is done on that thread, ignoring others. A better option for thread handling could be the next version of the feature, maybe?

Ideally, in the fullness of time, we'd go from supporting one real-time thread, to multiple threads, to multiple threads and/or processes, but it's useful to start small. I also added some simple tests to show that it's working, and how.

@bawr
Copy link
Contributor Author

bawr commented Apr 15, 2017

(Note: even with obvious small fixes, this still fails the tests on RPython, I'll probably need a hand there.)

@bawr
Copy link
Contributor Author

bawr commented Apr 16, 2017

Aaaand here's a version where we perform no signal-unsafe functions in the handler.

We can register and unregister threads from sampling, and in real time mode it's up to the caller.

@bawr
Copy link
Contributor Author

bawr commented Apr 16, 2017

Here's a small patch for PyPy, now all the tests pass over there, too.
https://gist.github.com/bawr/d80d99278b3619a6646c80646fa0dcf5

@bawr
Copy link
Contributor Author

bawr commented Apr 21, 2017

Any progress on this?

I'll be free to do some open-source stuff within the next week, so if I can make some changes to push it forward, I'll do just that. In particular, if we're fine with spawning a designated signalling thread, I already know how to support both Linux and MacOS, without relying on signal delivery details on Linux. ;)

@planrich
Copy link
Contributor

Looks good to me. Though the osx support is missing right? I'll merge it if we both support linux & osx.

@bawr
Copy link
Contributor Author

bawr commented Apr 28, 2017

Seems to work on OSX now. 🎉

In other words - I think it's merge-ready? Probably want to squash it, though.

Copy link
Contributor

@planrich planrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have two minor questions/comments otherwise I'm happy to merge it

pthread_t tid;
while (i < thread_count) {
tid = threads[i];
if (pthread_equal(tid, self))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add curly brackets for if & else here? it is really hard to read and potential risk for errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

// Consequently if we want to sample multiple threads, we need to forward this signal.
if (signal_type == SIGALRM) {
if (is_main_thread() && broadcast_signal_for_threads()) {
__sync_lock_release(&spinlock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure why the spinlock is released here. It occurs to me as if the spinlock is only used to protect the threads array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no - it's used to protect whatever it was normally protecting and the threads array.

If we don't need to sample the main thread (when it's not registered for real-time sampling in the first place), we can bail out early here, because we just need to broadcast the signal - but we still need to release the spinlock before the immediate return.

@bawr
Copy link
Contributor Author

bawr commented Apr 29, 2017

Rebased against upstream master and addressed the comments. Cheers! ;)

@planrich planrich merged commit ccae9f7 into vmprof:master May 1, 2017
@planrich
Copy link
Contributor

planrich commented May 1, 2017

Thanks again!

@bawr
Copy link
Contributor Author

bawr commented May 2, 2017

Cool! Could you maybe make a .dev version with this available on pip?

I'd like to test some stuff about it later, and it's much simpler to set up if I can just pip install. ;)

@planrich
Copy link
Contributor

planrich commented May 2, 2017

Yes I did, there is 0.4.6.dev0

@planrich
Copy link
Contributor

planrich commented Jun 5, 2017

I'm currently rewriting some C code in Python. However I noticed that real time profiling is now some times not working. see: https://travis-ci.org/vmprof/vmprof-python/jobs/239723456

Could you maybe have a look at that? What is the reason for that?

@bawr
Copy link
Contributor Author

bawr commented Jun 6, 2017

Interesting... I can have a proper look over the weekend, will ping you earlier if it's something obvious.

Did you only notice it failing in tests, or did something weird happen with some real-world case as well?

@planrich
Copy link
Contributor

planrich commented Jun 6, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants