New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--run hides main process' exit code #886

Closed
LukeShu opened this Issue Jan 2, 2019 · 12 comments

Comments

3 participants
@LukeShu
Copy link
Contributor

LukeShu commented Jan 2, 2019

What happened:

$ kubernaut claims create --name=lukeshu --cluster-group main
…
$ export KUBECONFIG=/home/lukeshu/.kube/lukeshu.yaml
$ telepresence --run sh -c 'exit 7'
T: Invoking sudo. Please enter your sudo password.
[sudo] password for lukeshu: 
T: Starting proxy with method 'vpn-tcp', which has the following limitations: All processes are affected, only one telepresence can run per machine, and you can't use other VPNs. You may need to add cloud hosts and headless services with --also-proxy. For a full list of 
T: method limitations see https://telepresence.io/reference/methods.html
T: Volumes are rooted at $TELEPRESENCE_ROOT. See https://telepresence.io/howto/volumes.html for details.

T: No traffic is being forwarded from the remote Deployment to your local machine. You can use the --expose option to specify which ports you want to forward.

T: Exit cleanup in progress
$ echo $?
0

What I expected to happen: The exit status of the run command to be propagated out, or at least printed:

…
T: Main process (sh -c exit 7) exited with code 7.
$ echo $?
7
@LukeShu

This comment has been minimized.

Copy link
Contributor Author

LukeShu commented Jan 2, 2019

Note that reading runner.py, I'm not entirely sure why it doesn't print T: Main process (sh -c exit 7) exited with code 7.--it looks to me like it should.

@ark3

This comment has been minimized.

Copy link
Contributor

ark3 commented Jan 2, 2019

self.write(...) vs self.show(...).

I don't agree that Telepresence should appear to fail if the user process exits with a non-zero status. However, I think it makes sense to show a non-zero status to the user (and not just mention it in the log file).

@ark3

This comment has been minimized.

Copy link
Contributor

ark3 commented Jan 2, 2019

Okay, maybe Telepresence does subvert expectations by returning success. Telepresence behaves more like ssh and less like sudo or docker run.

Edit to add: As Luke points out below, Telepresence does not behave like ssh either.

@LukeShu LukeShu changed the title --run exit code is wrong --run hides main process' exit code Jan 2, 2019

@LukeShu

This comment has been minimized.

Copy link
Contributor Author

LukeShu commented Jan 2, 2019

"Wrong" might be a little strong, but propagating the exit code is definitely what users expect, especially if we're encouraging them to use Telepresence in CI.

And, FWIW:

$ ssh lukeshu.com 'sh -c "exit 7"'
$ echo $?
7

@ark3 ark3 added this to To do in Tel Tracker via automation Jan 2, 2019

@plombardi89

This comment has been minimized.

Copy link
Contributor

plombardi89 commented Jan 2, 2019

Seems like a good use for a flag --propogate-exitcode... Or an inverse of that depending on the preferred default.

@LukeShu

This comment has been minimized.

Copy link
Contributor Author

LukeShu commented Jan 2, 2019

If you want the exit status hidden, the user can say --run sh -c "YOUR_COMMAND || true"

@plombardi89

This comment has been minimized.

Copy link
Contributor

plombardi89 commented Jan 3, 2019

While they could do that it's not necessarily obvious or user friendly. While I'm sure many users use shell, I'm not convinced many users are nearly as familiar with shell idioms and usage patterns.

@LukeShu

This comment has been minimized.

Copy link
Contributor Author

LukeShu commented Jan 3, 2019

I figure the intersection of "users who care about whether an exit code came from telepresence itself or the child process" and "users who are not familiar with the || true trick" is pretty small.

@ark3

This comment has been minimized.

Copy link
Contributor

ark3 commented Jan 4, 2019

Telepresence's current behavior is to exit with status 1 on a detected error, status 3 if a session is successfully started and then fails ("lost connection"), and status 0 otherwise. I don't know how important it is to preserve this behavior. What is the likelihood that anyone would depend on it, or could depend on it in any useful way?

ark3 added a commit that referenced this issue Jan 4, 2019

@LukeShu

This comment has been minimized.

Copy link
Contributor Author

LukeShu commented Jan 4, 2019

I don't think that the current behavior is in the docs, so I'd say that the likely hood that anyone is depending on it is pretty low.

We could also take the flock(1) approach, and add --FOO-exit-code N flags to be able to adjust what statuses teleproxy itself emits. Eg. --failed-to-start-exit-code N (default 1) and --lost-connection-exit N (default 3).

@ark3

This comment has been minimized.

Copy link
Contributor

ark3 commented Jan 7, 2019

I'm reluctant to add more options to the Telepresence command line without very careful consideration. Telepresence is already too complicated to launch. See also #455.

I am coming around to the idea of having Telepresence exit status change to be more like that of ssh, which return 255 on error but otherwise returns the status of the executed command. Thoughts?

Users, if this is important to you, please up-vote this issue.

@LukeShu

This comment has been minimized.

Copy link
Contributor Author

LukeShu commented Jan 8, 2019

POSIX specifies (volume 3, chapter 2) that exit codes ≥126 (well, 126, 127, and ≥129) are to indicate that there was an error running a program, and that programs "should" avoid using them for other purposes. 126 is "command exists but is not chmod +x", 127 is "command not found", 128+n is "killed by a signal n". Because POSIX reserves the entire 129-255 range for "killed by signal" exit codes, but no system has that many signals, it is "safe" to use the higher end of that range for program-specific "I failed to create a child process" exit codes, as ssh does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment