Command .wait hanging on MacOS #6770

Closed
justin-elementlabs opened this issue Aug 13, 2024 · 10 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-process Module: tokio/process

Comments

@justin-elementlabs

Version
│ │ │ └── tokio v1.39.2
│ │ │ └── tokio-macros v2.4.0 (proc-macro)
│ │ │ │ ├── tokio v1.39.2 ()
│ │ │ │ └── tokio-util v0.7.11
│ │ │ │ └── tokio v1.39.2 (
)
│ │ │ ├── tokio v1.39.2 ()
│ │ │ ├── tokio v1.39.2 (
)
│ │ │ ├── tokio-util v0.7.11 ()
│ │ │ ├── tokio v1.39.2 (
)
│ │ │ ├── tokio v1.39.2 ()
│ │ │ └── tokio-rustls v0.24.1
│ │ │ └── tokio v1.39.2 (
)
│ │ ├── tokio v1.39.2 ()
│ │ │ ├── tokio v1.39.2 (
)
│ │ │ └── tokio-native-tls v0.3.1
│ │ │ └── tokio v1.39.2 ()
│ │ ├── tokio v1.39.2 (
)
│ │ ├── tokio-native-tls v0.3.1 ()
│ │ ├── tokio-socks v0.5.1
│ │ │ └── tokio v1.39.2 (
)
│ │ │ └── tokio v1.39.2 ()
│ │ ├── tokio v1.39.2 (
)
│ │ ├── tokio-io-utility v0.7.6
│ │ │ └── tokio v1.39.2 ()
│ ├── tokio v1.39.2 (
)
│ └── tokio-pipe v0.2.12
│ └── tokio v1.39.2 ()
│ │ │ └── tokio v1.39.2 (
)
│ │ ├── tokio v1.39.2 ()
│ │ └── tokio-io-utility v0.7.6 (
)
│ ├── tokio v1.39.2 ()
│ ├── tokio-io-utility v0.7.6 (
)
│ └── tokio-util v0.7.11 ()
├── tokio v1.39.2 (
)
└── tokio-process v0.2.5
├── tokio-io v0.1.13
├── tokio-reactor v0.1.12
│ ├── tokio-executor v0.1.10
│ ├── tokio-io v0.1.13 ()
│ └── tokio-sync v0.1.8
└── tokio-signal v0.2.9
├── tokio-executor v0.1.10 (
)
├── tokio-io v0.1.13 ()
└── tokio-reactor v0.1.12 (
)

Platform
Darwin Macusers-MBP.lan 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020 arm64

Description

It seems Tokio is not able to detect when a command returns on MacOS.

I tried this code:

use tokio::process::Command;

println!("START");

let mut child = Command::new("echo")
    .arg("hello")
    .arg("world")
    .spawn()
    .expect("failed to spawn");

println!("MIDDLE");

// Await until the command completes
let status = child.wait().await;
println!("the command exited with: {:?}", status);

println!("END");

I expected to see this happen:

START
MIDDLE
hello world
the command exited with: status
END

Instead, this happened:

The process hangs at await and never resolves.

START
MIDDLE
hello world
test test has been running for over 60 seconds

@justin-elementlabs justin-elementlabs added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Aug 13, 2024
@Darksonn Darksonn added the M-process Module: tokio/process label Aug 13, 2024
@Darksonn
Contributor

What happens when you run the equivalent code using std::process?

@justin-elementlabs
Author

justin-elementlabs commented Aug 14, 2024

@Darksonn std::process doesn't hang, please see below

Code:

    println!("START");

    let mut child = Command::new("echo")
        .arg("hello")
        .arg("world")
        .stdout(Stdio::piped())
        .spawn()
        .expect("Failed to start echo process");

    let echo_out = child.stdout.expect("Failed to open echo stdout");
    println!("the command exited with: {:?}", echo_out);

    println!("END");

Output:

START
210
the command exited with: ChildStdout { .. }
END

Code:

    println!("START");

    let output = Command::new("echo")
    .arg("Hello world")
    .output()
    .expect("Failed to execute command");

    assert_eq!(b"Hello world\n", output.stdout.as_slice());
    println!("the command exited with: {:?}", output);

    println!("END");

Output:

START
the command exited with: Output { status: ExitStatus(unix_wait_status(0)), stdout: "Hello world\n", stderr: "" }
END

@Darksonn
Contributor

You need to compare it where they do it in exactly the same way. Otherwise the comparison is useless. Your Tokio version uses Child::wait, but your std version does not.

Also, please make sure to use codeblocks:

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

I modified your latest post to do that, but please do it yourself in the future.

@justin-elementlabs
Author

justin-elementlabs commented Aug 14, 2024

@Darksonn , is this what you had in mind? If not, please let me know the code you'd like me to run.

Code:

    println!("START");

    let mut child = Command::new("echo")
        .arg("hello")
        .arg("world")
        .spawn()
        .expect("failed to spawn");

    println!("210");

    // Await until the command completes
    let status = child.wait();
    println!("the command exited with: {:?}", status);

    println!("END");

Output:

START
210
hello world
the command exited with: Ok(ExitStatus(unix_wait_status(0)))
END

@justin-elementlabs
Author

@Darksonn , what do you think of the example above? Thanks

@Darksonn
Contributor

Sorry, I should have replied, but I forgot. I had a few other mac users try your example, and none of them were able to reproduce it. However, yesterday someone asked about a similar sounding issue.

I'm thinking that there may have been some sort of change in the Rust standard library that's causing this. Are you able to try different rustc versions to see if it's fixed on older compilers?

@CGamesPlay

CGamesPlay commented Oct 19, 2024

I have seen a similar issue, but I was only able to reproduce it in very restricted circumstances (but consistently!). The process remained as a zombie and the wait future never resolved.

My case involved forking at the beginning of the program[1] (to start a daemon process), then receiving a command over a UnixStream to spawn the process (sh -c false). In this exact case, the wait won't resolve probably 95% of the time. However, I found several stupid workarounds:

  • replacing the command with sh -c "sleep 1; false" worked about 50% of the time
  • sleeping for 100ms before I execute the initial fork works around the problem about 50% of the time
  • sleeping for 10ms between the spawn and the wait works 100% of the time
  • moving my spawning/waiting into a current_thread runtime in a background thread

A comparison of syscalls (from dtruss) between the working and not-working calls gave nothing useful, though. My best guess is there's some kind of race condition in the internal Reaper implementation. The implementation on macos uses SIGCHLD, which as the module says is "super hairy and complicated". A wild guess: here we see the child is spawned, then later a signal handler is installed for SIGCHLD, which may have already missed delivery (except, I can reproduce the problem even if the child is sleep 1, so this doesn't fully explain things).
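
One way to check whether a child has actually exited, without relying on any SIGCHLD-based notification, is to poll it with the standard library's `std::process::Child::try_wait`. The sketch below is a diagnostic under stated assumptions, not a description of what tokio's reaper does: the helper name `wait_with_deadline`, the 10 ms poll interval, and the 5 s deadline are all made up for illustration.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of std or tokio): polls `try_wait` until the
/// child exits or `deadline` elapses.
/// Returns `Ok(Some(status))` if the child exited, `Ok(None)` on timeout.
fn wait_with_deadline(child: &mut Child, deadline: Duration) -> io::Result<Option<ExitStatus>> {
    let start = Instant::now();
    loop {
        // `try_wait` never blocks; it reaps the child if it has already exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if start.elapsed() >= deadline {
            return Ok(None);
        }
        // Assumed poll interval; chosen arbitrarily for this sketch.
        thread::sleep(Duration::from_millis(10));
    }
}

fn main() -> io::Result<()> {
    // Same command as in the comment above: `sh -c false` exits right away
    // with a non-zero status.
    let mut child = Command::new("sh").arg("-c").arg("false").spawn()?;
    match wait_with_deadline(&mut child, Duration::from_secs(5))? {
        Some(status) => println!("child exited: {status}"),
        None => println!("child still running after the deadline"),
    }
    Ok(())
}
```

Running the same command through a std-only path like this may help separate "the child never exits" from "the child exits but the exit notification is lost". It does not by itself explain why the tokio `wait` future stays pending.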

Footnotes

  1. Using nix::unistd::fork; I did create a tokio runtime before the fork; the daemon drops this runtime and uses a new one; std::sync::Once may still have run in the parent. It's suspicious, but since adding random sleeps works around the issue, I don't think it's related.

@Darksonn
Contributor

Darksonn commented Oct 19, 2024

> I did create a tokio runtime before the fork

I'm sorry, but this is completely unsupported. Forking in this way is well known to cause problems specifically with tokio::process and tokio::signal. I can't help you if you're doing that.

Fork before you create your first runtime.

@CGamesPlay

Ah, that's unfortunate. After you pointed it out I searched and found #4301. The latest comment is exactly what I am doing, so when it becomes supported my workaround won't be necessary 🙂

Shutting down all runtimes, calling fork, and then starting new runtimes afterwards seems possible to support.

@Darksonn
Contributor

@justin-elementlabs Did you find a solution to your issue?
