Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows support #9

Open
cretz opened this issue Jan 14, 2020 · 21 comments
Open

Windows support #9

cretz opened this issue Jan 14, 2020 · 21 comments

Comments

@cretz
Copy link

cretz commented Jan 14, 2020

Just tossing this issue out here for others that may reach this lib. Build fails since pthread_self and pthread_getname_np are, predictably, not present. Can std::thread::current and std::thread::Thread.name be used instead? Would it even matter with how signals work here?

@YangKeao
Copy link
Member

We cannot use std::thread::current here. Because a thread_local RefCell is used in std::thread::current to get current THREAD_INFO, it's not safe to use in signal handler.

For compiling and using this lib in windows, I will remove the usage of thread name (when it is used in windows).

I know little about windows programming. Do you have any suggestion on how to identify threads on windows? Do it have thread name or something similar?

@YangKeao
Copy link
Member

I will remove the usage of thread name (when it is used in windows).

This has been done in #15 . Can you help me test on windows?

@cretz
Copy link
Author

cretz commented Jan 31, 2020

Awesome. I may not be able to test for a while (moved off of Rust project for a bit), but will try to soon...

@YangKeao
Copy link
Member

YangKeao commented Feb 3, 2020

@cretz Sorry, it still cannot work on windows, because we use setitimer syscall to send signal, interrupt and sample. (And the signal handler is also set by nix crate which definitely cannot work on windows)

@LK4D4
Copy link

LK4D4 commented Aug 15, 2020

Hello, I can help with testing (and maybe development) on Windows and interested in decent rust profiler :)
Right now from what I see one major problem is nix crate and second problem is lack of pthread stuff in libc crate:

error[E0412]: cannot find type `pthread_t` in crate `libc`
  --> C:\Users\lk4d4\.cargo\git\checkouts\pprof-rs-e0fcf6c5ee7a42e1\de4c986\src\profiler.rs:79:53
   |
79 | fn write_thread_name_fallback(current_thread: libc::pthread_t, name: &mut [libc::c_char]) {
   |                                                     ^^^^^^^^^ not found in `libc`

error[E0412]: cannot find type `pthread_t` in crate `libc`
   --> C:\Users\lk4d4\.cargo\git\checkouts\pprof-rs-e0fcf6c5ee7a42e1\de4c986\src\profiler.rs:105:44
    |
105 | fn write_thread_name(current_thread: libc::pthread_t, name: &mut [libc::c_char]) {
    |                                            ^^^^^^^^^ not found in `libc`

error[E0425]: cannot find function `pthread_self` in crate `libc`
   --> C:\Users\lk4d4\.cargo\git\checkouts\pprof-rs-e0fcf6c5ee7a42e1\de4c986\src\profiler.rs:139:49
    |
139 |             let current_thread = unsafe { libc::pthread_self() };
    |                                                 ^^^^^^^^^^^^ not found in `libc`

Let me know if I can help.
Thanks!

@LK4D4
Copy link

LK4D4 commented Aug 15, 2020

So, I did a little research and looks like it isn't that easy, but doable. Gonna leave it here so I won't forget or someone has more time than me.
Windows doesn't support SIGPROF and I didn't find any analog. There is go implementation of a similar thing https://github.com/golang/go/blob/master/src/runtime/os_windows.go#L1016-L1112
It just starts a separate thread that wakes up from time to time and iterates all other threads in the process one by one, suspends them, gets context, process, resumes.
In theory, we could implement the same thing with https://docs.rs/winapi/0.3.9/winapi/um/synchapi/fn.SetWaitableTimer.html and https://docs.rs/winproc/0.6.4/winproc/struct.Process.html#method.thread_ids.
Not sure about the details of getting backtraces, it probably won't be supported by backtrace-rs, but rather will have a custom implementation.

@YangKeao
Copy link
Member

@LK4D4 Cool! Thanks for your research 🍻 .

It's a good start for someone (like me) to consider about the implementation on windows 😺 .

@sunng87
Copy link

sunng87 commented Oct 13, 2020

Before we can get it work on Windows, is it possible to at least compile on Windows? I received report that my tests doesn't compile on Windows because I use pprof in my benchmark code.

@LK4D4
Copy link

LK4D4 commented Oct 13, 2020

FWIW I think compilation on windows should be easy, I actually got it to compile, but not working.
I'll try to find my code and send PR.

@YangKeao
Copy link
Member

@sunng87 I don't think "compile but not work" is a good choice. You could use conditional compilation and platform specific dependencies to avoid pprof-rs on windows target.

@sunng87
Copy link

sunng87 commented Oct 14, 2020

@YangKeao Thank you! That's better solution.

@Jardynq
Copy link

Jardynq commented Apr 24, 2022

Bump.
I have tried some of implementing @LK4D4 ideas, and it seems to work?
I haven't tried using the SetWaitableTimer, but instead i create a worker thread that enumerates all thread ids and uses a modified backtrace-rs, that allows backtracing of threads other than the calling thread, by passing a handle.

Though I don't know if the backtrace is correct.

The following code generates the following backtrace on windows 11 build 22489.1000
Built with nigthly-x86_64-pc-windows-msvc v. 1.61.0

fn main() {
    let guard = pprof::ProfilerGuardBuilder::default().frequency(1000).build().unwrap();

    for i in 0..1000000 {
        if i % 100 == 0 {
            println!("abebebb");
        }
    }

    if let Ok(report) = guard.report().build() {
        let file = std::fs::File::create("flamegraph.svg").unwrap();
        report.flamegraph(file).unwrap();
    };
}

flamegraph
flamegraph.svg.txt

I haven't bothered with naming the threads, so the name is just their id. 43dc is the main thread. The worker thread filters itself away, so it's not shown.
Can anyone verify that this stacktrace makes sense?

@aminya
Copy link

aminya commented Apr 25, 2022

@Jardynq Could you send a PR for us to try?

@Jardynq
Copy link

Jardynq commented May 1, 2022

Sorry for the late reply. Got a throat infection that knocked me out.

After tidying up the code for a PR, i encountered some bugs that made me experiment with a few other methods:

SetWaitableTimer
From my understanding this is effectively a wrapper around QueueUserAPC.

QueueUserAPC
This works by queuing a callback to be called from a specific thread by the scheduler when the thread is in an alertable state. We can have a callback that simply reads the backtrace, similar to how it's currently done on linux. The issue with this, is that the alertable state is only achieved in specific circumstances. Such as waiting for synchronous file IO. This is not good enough.

QueueUserAPC2
Luckily a second method exists that will try to execute as soon as possible. A small pitfall with this, is that it does not work on WoW64. I tried making an implementation, but the QueueUserAPC2 would add a couple of functions to the callstack. Of course, these could be filtered out, but sometimes the function would completely corrupt the callstack. This happened about 33% of the time (counted by eye). It has another issue in not being synchronized. This is the same issue that suspend thread has.

Manually: SuspendThread, GetThreadContext, ResumeThread
This was the original implementation as described in my first comment. The reason i went back and forth on this was for a couple of reasons. Firstly the issue with suspend thread. If a thread has access to a synchronization object (which windows does a lot under the hood) while being suspended. Then if the main thread tries to access that object, it will end in a deadlock. This happened rarely (maybe 1%), but often enough that it was an issue. Luckily this seems to be fixed by simply ensuring that the main thread doesn't do anything that would require synchronization. Currently the forked implementation of backtrace calls the three functions named above, directly after eachother. This means that the child thread is only suspended while main is calling GetThreadContext and ResumeThread. These should hopefully not end in a deadlock.
Another issue which can actually be seen in my comment above, is that a thread's backtrace does not always start at the same place. In other words, all backtraces should start with RtlUserThreadStart since that is the entry point. But sometimes the callback does not contain this. This happens a minority of the time and therefore the widest callstack is the correct one while the thin ones are "corrupted"?. But this is still a slight annoyance and could confuse someone trying to decipher the data.
A possible solution would be, to filter away all the backtraces that don't end in RtlUserThreadStart since something must've gone wrong with them. At the same time notify the user how many samples from a given thread where bad, so they can use it in their evaluation.

From my testing it seems that the manual method is the best and most consistent.

forked backtrace-rs
PR-issue

@aminya
Copy link

aminya commented Jul 10, 2022

I opened a pull request for backtrace-rs. This should make it possible to rework #122.
rust-lang/backtrace-rs#473

@maksymsur
Copy link

maksymsur commented Jul 19, 2022

support this initiative as well. having win support would be great! at the bare minimum to be able to compile at least... which is super useful for a cross-platform development

@aminya
Copy link

aminya commented Jul 19, 2022

See rust-lang/backtrace-rs#473
The Rust developers have asked me to copy-paste code to support profiling on Windows. I think that's against software reuse and maintainability fundamentals. Maybe you can review that pull request and add your opinion.

@Jardynq
Copy link

Jardynq commented Jul 23, 2022

@aminya I think the correct solution is to fork backtrace-rs. Luckily, it isn't updated often and the changes we need are so minor, that should be able to keep the fork up to date with upstream without any merge conflicts.
I have updated my fork to fix an issue pointed out by alexcrichton.

@YangKeao
Copy link
Member

I think the correct solution is to fork backtrace-rs

I think so. The backtrace-rs has a lot of helpful things which are not exported. Actually I can understand the developers of backtrace-rs don't want to export / abstract these logic, as it will increase the complexity and make it harder to keep compatibility. We can reuse these logic through forking backtrace-rs.

You'll also need to give the forked backtrace-rs a new name, and publish it to the crates.io, as it's not allowed to use a git dependency for a published crate (pprof-rs)

@aminya
Copy link

aminya commented Jul 25, 2022

Yeah, I think forking is a better solution than copy-pasting code. I can call the fork something like backtrace-thread.

@Jardynq I will merge the changes into my branch and publish it. I made some changes to your original fork as it was not ready for a PR and had some issues.

@maksymsur
Copy link

also support a forking solution 👍

NickDarvey added a commit to NickDarvey/lance that referenced this issue Jun 25, 2024
pprof doesn't work on Windows. It's disabled in most of the Lance libraries, except for two: lance-index and lance-io. This PR disables it for those two in the same manner.

See also tikv/pprof-rs#9
wjones127 pushed a commit to lancedb/lance that referenced this issue Jun 29, 2024
pprof doesn't work on Windows. It's disabled in most of the Lance
libraries, except for two: lance-index and lance-io. This PR disables it
for those two in the same manner.

See also tikv/pprof-rs#9

Co-authored-by: Lei Xu <lei@lancedb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants