New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add code to walk the list of thread states #13

Closed
eklitzke opened this Issue Aug 25, 2016 · 13 comments

Comments

Projects
None yet
4 participants
@eklitzke
Copy link
Collaborator

eklitzke commented Aug 25, 2016

This is how you do it:

  • each interpreter has a field called tstate_head which is the head of a linked list of thread states
  • each thread state has a field called next which is a pointer to the next thread state
  • the last thread state has a null pointer for next

Likewise:

  • each thread state has a back reference to the interpreter state in a field called interp
  • there's a single linked list of interpreters whose head is the static symbol interp_head

In the single-threaded case there is one interpreter and one thread state.

In the generic case the way you enumerate the thread states is:

  • locate _PyThreadState_Current
  • follow interp up to the interpreter
  • follow tstate_head to the first thread state (which will always be the same as _PyThreadState_Current for a single-threaded program, but could be different for a multi-threaded program)
  • follow the next field until NULL is encountered

It should be fine to ignore the multiple interpreter thing, no one really uses that feature nowadays.

@ogrisel

This comment has been minimized.

Copy link

ogrisel commented Oct 18, 2016

Related: it would be great to have a way to generate one flamegraph svg file per active thread in a multithreaded python program.

@eklitzke

This comment has been minimized.

Copy link
Collaborator

eklitzke commented Oct 31, 2016

I gave a few more details about this in #41 .

This is probably the next major feature I'll add to Pyflame (after fixing #34, which has a PR by me out). However, I'm going to spend most of November traveling. If someone wants to surprise me with a PR to fix this in the interim that would be great, otherwise I'll probably get some time in December.

@eklitzke eklitzke added this to the Pyflame2 milestone Oct 31, 2016

@batterseapower

This comment has been minimized.

Copy link
Contributor

batterseapower commented Nov 15, 2016

Did some initial work on this at batterseapower@36ba9e1

Right now it doesn't seem to be able to find interp_head, which is necessary if we want to do anything interesting (PyThreadState_Current is NULL if the GIL is not held). This should be fixable because e.g. nm -a knows about interp_head.

@batterseapower

This comment has been minimized.

Copy link
Contributor

batterseapower commented Nov 15, 2016

Fixed that, but PyThreadState.frame seems to always be null in the cases I looked at.

@jamespic

This comment has been minimized.

Copy link
Contributor

jamespic commented Feb 13, 2017

@batterseapower That makes sense. PyThreadState_Current is only non-null when something holds the GIL. Threads set it when they acquire the GIL, based on a value held in a thread-local. If I get some time, I'll look at the feasibility of traversing thread-locals.

@jamespic

This comment has been minimized.

Copy link
Contributor

jamespic commented Feb 19, 2017

I've got some further work on jamespic@db3feaa.

@batterseapower I think you were actually on the right lines. What you were doing should have worked. I thought I was getting null frames, until I realised that I was running against a debug build (since normal builds don't expose interp_head), which had a different PyObject layout. There was also a bug in your "is current" code that was throwing away perfectly good frames. With a couple of tweaks I was able to get your code profiling non-GIL frames.

Meanwhile, I also managed to get it profiling non-GIL frames on non-debug builds (builds without interp_head exposed), at least on AMD64. Whilst the variable itself isn't exposed, the PyInterpreterState_Head function is. Using some evil hacks stolen from https://github.com/eklitzke/ptrace-call-userspace (thanks @eklitzke) I was able to get it calling this function via ptrace.

My apologies for the scrappiness of the code - it's literally the first thing I've ever written in C++.

@eklitzke

This comment has been minimized.

Copy link
Collaborator

eklitzke commented Feb 28, 2017

I have a version that copies the code into a new page: 2ac20de . This still crashes the Python process, but it seems like it might crash less frequently? This version also leaks a page per pyflame invocation (it's just a POC). Hopefully I can take a look again tomorrow, if not you can try this approach if you like.

@jamespic

This comment has been minimized.

Copy link
Contributor

jamespic commented Feb 28, 2017

@jamespic

This comment has been minimized.

Copy link
Contributor

jamespic commented Feb 28, 2017

I've managed to reproduce locally. It's still not thread safe. The initial memory overwrite for mmap is the issue now - there's nothing to stop another thread hitting the dirty code, so if code gets overwritten in a hot library (when I reproduced the issue, it was in libc, in do_futex_wait), there's a high probability of something else hitting it.

I'll look at stopping threads whilst code is in an unsafe state.

@eklitzke

This comment has been minimized.

Copy link
Collaborator

eklitzke commented Feb 28, 2017

There are a couple of answers here on how to pause all threads: http://stackoverflow.com/questions/18577956/how-to-use-ptrace-to-get-a-consistent-view-of-multiple-threads

Not leaking a page is pretty easy -- you just need to use the munmap(2) syscall. Actually there are some tricks to try to keep the page allocated and then find it later, but for an initial implementation mmap(2) followed by munmap(2) later on is probably fine.

@jamespic

This comment has been minimized.

Copy link
Contributor

jamespic commented Mar 1, 2017

I added thread pausing in jamespic@e038c99, which seems to have solved the crashing issue for me.

@eklitzke

This comment has been minimized.

Copy link
Collaborator

eklitzke commented Mar 1, 2017

This is looking good to me, and seems stable on my system. You might want to rung clang-format again, there was one style violation I saw in that commit which is that the style for range-based for loops should be for (auto foo : blah) not for (auto foo: blah). (Assuming clang-format knows about that.)

If you submit another PR I'll do another look over the whole diff.

Before merging:

  • Consider adding a note to the README about threading support. (If not, I can add it.)
  • Ideally the commits should be squashed to be somewhat logical. Since I first saw the PR, I already cherry-picked 189ce18 onto master. Preferably all of @batterseapower's commits would be squashed to one (or a few) logical changes, and same with yours. Basically, the branch should at least be bisectable.

eklitzke added a commit that referenced this issue Mar 3, 2017

Merge pull request #53 from jamespic/rebase
Walk threads and handle non-GIL code - Implements #13
@eklitzke

This comment has been minimized.

Copy link
Collaborator

eklitzke commented Mar 3, 2017

Thanks @jamespic for fixing this!

@eklitzke eklitzke closed this Mar 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment