
PauseChildThreads() sometimes hangs #55

Closed
eklitzke opened this Issue Mar 4, 2017 · 17 comments

@eklitzke
Collaborator

eklitzke commented Mar 4, 2017

I think this is related to #53 (cc @jamespic). I am consistently seeing Pyflame hang when profiling threads on my Chromebook. This is running Debian Jessie in a chroot, and the host kernel is 3.14.0. IIRC I did test this branch on a regular Jessie host, so I think it might be related to the ChromeOS kernel? I'll have to check.

What I am seeing is that Pyflame goes to pause both threads in a Python process running test_threaded_sleeper.py, and gets stuck on the second thread. By "stuck" I mean the call to wait() that is made after the PTRACE_ATTACH never completes. The full stack looks like this:

(gdb) bt
#0  0x00007f0358495ab2 in __libc_wait (stat_loc=0x0) at ../sysdeps/unix/sysv/linux/wait.c:30
#1  0x00000000004065eb in pyflame::PtraceAttach (pid=pid@entry=26554) at ptrace.cc:42
#2  0x00000000004079a1 in PauseChildThreads (pid=26553) at ptrace.cc:209
#3  pyflame::PtraceCallFunction (pid=26553, addr=4794543) at ptrace.cc:222
#4  0x000000000040a5f1 in pyflame::PyFrob::DetectPython (this=this@entry=0x7fffd588a130) at pyfrob.cc:135
#5  0x0000000000403845 in main (argc=<optimized out>, argv=<optimized out>) at pyflame.cc:268

This is a blocker for tagging a new release.

@jamespic

Contributor

jamespic commented Mar 4, 2017

Hmm. I don't have a Chromebook, so it'll be difficult to reproduce. I think the key question is what state the second thread is in. Is it already stopped, or otherwise not allowed to run (in which case the fix is to not bother attaching)? Is it being stopped by the attach call, but for some reason Pyflame isn't notified (a race, maybe? perhaps there's some way we can check whether it's already stopped before we wait)? Or is it just failing to attach?

@eklitzke

Collaborator

eklitzke commented Mar 4, 2017

Seems to work on regular Jessie (with kernel 3.16.0-4-amd64). I'll poke around on my Chromebook some more and see if I can figure out what's going on.

@eklitzke

Collaborator

eklitzke commented Mar 4, 2017

I created a branch called "wait" that prints some debugging info.

On my chromebook exhibiting the bug I see: https://gist.github.com/eklitzke/c7bdc74b162b5a9eb3c3ff70a141d84a

On a regular jessie (without the bug) I see: https://gist.github.com/eklitzke/e84dc1993fc0a81bb8311268357d335f

On regular Jessie, each thread/process starts out in the sleeping state (S), moves to the running state (R) after the PTRACE_ATTACH, and then moves to the traced state (t) after the call to wait().

On my Chromebook, the main process goes S -> t -> t, and then the next thread goes S -> t -> and the wait fails there.

I'm guessing this is some kind of kernel change, either between 3.14 and 3.16, or perhaps some patches Google applies on the Chromebook kernels. If I can't reproduce with a stock 3.14 kernel then we can ignore this I guess.

@lrewega


lrewega commented Mar 5, 2017

Can you check this setting on your Chromebook? cat /proc/sys/kernel/yama/ptrace_scope

See also https://www.kernel.org/doc/Documentation/security/Yama.txt

@jamespic

Contributor

jamespic commented Mar 5, 2017

It would certainly make sense for this to be ChromeOS specific. I know they're big on sandboxing, so they enable a lot of the kernel's security controls, as well as adding a few of their own. I'll see if I can get a Chromium OS VM running, and try to get to the bottom of this.

@eklitzke

Collaborator

eklitzke commented Mar 6, 2017

I spent some time building kernels today, and it looks like something changed between kernel 3.15.10 and kernel 3.16.0. Here's specifically how I built a kernel to reproduce, after you've installed the build dependencies in the previous link:

tar -xvJf linux-3.15.10.tar.xz
cd linux-3.15.10
cp /boot/config-3.16.0-4-amd64 .config  # or whatever config you have in /boot
make oldconfig   # hit ENTER for all questions, there should only be a few of them
make-kpkg clean
fakeroot make-kpkg --initrd --revision=1.0.custom kernel_image
sudo dpkg -i ../*.deb

I don't think this is high priority because there aren't any widely used distros with kernels this old (maybe RHEL/CentOS?). But there's probably a simple fix here, related to the semantics of ptrace/wait changing. I looked a bit at the kernel changelogs and saw a number of ptrace changes, but none that looked relevant.

@eklitzke

Collaborator

eklitzke commented Mar 6, 2017

Interestingly the bug happens with a stock 3.16.0 kernel. The kernel that comes with Jessie (which does not have this bug) purports to be 3.16.0-4, which means it's 3.16.0 + a bunch of patches that Debian has likely backported.

Here's a stock kernel.org 3.16.0 kernel that I built for amd64 if you want to try to repro the issue: http://eklitzke.org/linux-image-3.16.0_1.0.custom_amd64.deb

@jamespic

Contributor

jamespic commented Mar 6, 2017

If it affects 3.16.0, but not 3.16.0-4, that should help us track it down. Bisecting those Debian fixes should pin it down to a small number of changes. I'll try to find some time to take a look later today.

@jamespic

Contributor

jamespic commented Mar 6, 2017

The Debian changelogs show that it includes https://patchwork.kernel.org/patch/8593281/, which seems like it could be the culprit. It looks like a workaround for lazy developers (in the original bug, the developers of init, or in this case me) failing to use the __WALL flag when waiting on child threads. It should just be a question of adding the __WALL flag to the waitpid call options. I'll try it out when I get home.

@eklitzke

Collaborator

eklitzke commented Mar 6, 2017

I'm not so sure, because if that were the bug, I'd expect the waitid branch (which I just pushed) to not have this bug. The relevant code there:

static void do_wait(pid_t pid) {
  std::ostringstream ss;
  siginfo_t infop;
  if (waitid(P_ALL, pid, &infop, __WALL) == -1) {
    ss << "Failed to waitid() on pid " << pid << ": " << strerror(errno);
    throw PtraceException(ss.str());
  }
  if (infop.si_status != CLD_STOPPED) {
    ss << "Expected waitid() for pid " << pid << " to yield CLD_STOPPED";
    throw PtraceException(ss.str());
  }
  if (infop.si_pid != pid) {
    ss << "Expected waitid() to return pid " << pid << ", instead got " << infop.si_pid;
    throw PtraceException(ss.str());
  }
}

I'm trying to bisect the kernel, and it's something relatively recent (e.g. I saw a 4.5 kernel with the issue). It could also be that I misunderstand the waitid bug.

@eklitzke

Collaborator

eklitzke commented Mar 6, 2017

So, it is definitely the patch you linked to earlier. I built the kernel at that git revision, and the one before, and once that commit is in the kernel the problem no longer surfaces.

What I don't understand is that according to the bug report, the code should work on old kernels when using waitid() with __WALL. So I'm not sure what error I made in my waitid branch.

@eklitzke

Collaborator

eklitzke commented Mar 6, 2017

Some interesting commentary on this change here: https://lwn.net/Articles/688624/

@jamespic

Contributor

jamespic commented Mar 6, 2017

The article mentions a subtlety of __WALL that I missed when reading the docs: it can't be used with waitid (the docs are less clear about what it can be used with, although waitpid seems the most likely guess).

So it seems like the docs are suggesting that a similar call with waitpid may work on the older kernels, although I'm becoming increasingly confused by the semantics of these calls. I fear I may have to actually look at the implementation in the kernel source.

@eklitzke

Collaborator

eklitzke commented Mar 7, 2017

The relevant code in the kernel is in the patch you already linked to. From what I can see, the old code does check for __WALL (see the red lines from the diff). However, the LWN post says the opposite:

and, besides, the waitid() system call, unlike wait4(), does not even accept the __WALL flag

GDB has been able to debug multi-threaded programs forever, so there must be a way to do this: whatever GDB has done historically. I'll experiment with using wait4() instead of waitpid() or waitid().

@eklitzke

Collaborator

eklitzke commented Mar 7, 2017

It looks like what people did in the past (before the ptrace patch was merged) was to literally send SIGSTOP to the main process, which stops all of its threads. Then you can go in and PTRACE_SEIZE the already-stopped threads without waiting for them. This is really annoying, especially given that the much saner code you already wrote works on patched kernels.

I already added a --no-threads option in another recent commit. I'm a little worried about enabling by default a feature that breaks on kernels that are only a year old, since a lot of people run older kernels in production and don't necessarily apply security updates regularly. I think the right thing to do here is to invert the option (so that by default multiple threads aren't traced), and then add an option like --threads for people who want multi-threaded profiling, with a note that it may not work on older kernels.

What do you think?

@jamespic

Contributor

jamespic commented Mar 7, 2017

Yeah, that sounds reasonable. Multithreaded Python code is not all that common, and unpatched systems are terrifyingly common. Plus, inverting the option means that the new version doesn't break anyone's expectations (there'll be someone relying on (idle) who'll raise a bug if it's not the default).

And I agree that explicitly supporting unpatched kernels is not worthwhile. The old approach is messy and racy, and it seems like most distros are backporting security fixes like this one (although it's interesting that Chrome OS doesn't, given their usually laser focus on security - maybe their init is already immune to the issue this fixes).

@jamespic

Contributor

jamespic commented Mar 7, 2017

Oh, and one other thing I gleaned from the docs: if you specify P_ALL, the process id is ignored. Just mentioning it since I spotted that we're passing an explicit process id.
