Improve robustness of SSH connection to SUT #2890
Please check #2696; it seems to be related to your situation. If it does not help, it would be very useful if you could tell us what would need to be included to cover your case.
Yeah, that holds in your case, where you and your test are causing the changes on purpose. But telling the difference between "expected" and "the lab in the US is burning down" is the hard part, so I would very much dispute the "likely" bit :)
Re-running "from scratch" would be a possible solution, with or without dropping the test: restarting the plan, including provisioning a brand new guest to avoid running tests in a tainted environment. That is well beyond what tmt can do now, though.
IMO, #2696 solves (2) for me. As for (1), I am starting to realize that this might not be possible. What I am looking for is the ability to "resurrect" a closed SSH session. SSH cannot do this by design, unless some other tool handles it underneath the SSH connection, so that once the connection is closed and SSH reconnects, the work can continue where it left off (e.g. something like screen or tmux). To be more specific, I have the following plan:
It breaks the OpenSSL library, which causes the SSH connection to drop and the run to be aborted. In theory, SSH is able to reconnect even while the OpenSSL library is still broken. But obviously tmt won't do that because, if I understand it correctly, it would not be able to simply resume the test anyway. It just does not work that way. So (1) is basically not possible unless the test can resume itself (then it would work thanks to #2696). All in all, it seems that the aforementioned plan is simply incompatible with tmt.
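The screen/tmux idea mentioned above can be sketched as follows: start the test inside a detached tmux session on the guest, so it keeps running when the SSH connection drops, and query it again after reconnecting. This is not something tmt does today. It is a minimal sketch assuming tmux is installed on the guest; the helper names (`ssh_argv`, `start_detached`, `is_running`, `read_status`) and the status-file path are hypothetical. The helpers only build `ssh` command lines; actually running them (e.g. with `subprocess.run`) is left to the caller.

```python
import shlex


def ssh_argv(host: str, remote_command: str) -> list[str]:
    # ssh joins its trailing arguments into one string for the remote shell,
    # so pass a single, already-quoted command string.
    return ["ssh", host, remote_command]


def start_detached(host: str, session: str, test_command: str,
                   status_file: str) -> list[str]:
    # The tmux session closes when the command finishes, so record the
    # test's exit status in a file; otherwise it would be lost.
    wrapped = f"{test_command}; echo $? > {shlex.quote(status_file)}"
    # `tmux new-session -d -s NAME CMD` starts CMD in a new detached session
    # that keeps running on the guest even if this SSH connection dies.
    remote = (f"tmux new-session -d -s {shlex.quote(session)} "
              f"{shlex.quote(wrapped)}")
    return ssh_argv(host, remote)


def is_running(host: str, session: str) -> list[str]:
    # `tmux has-session -t NAME` exits 0 while the session still exists.
    return ssh_argv(host, f"tmux has-session -t {shlex.quote(session)}")


def read_status(host: str, status_file: str) -> list[str]:
    # Once the session is gone, the status file holds the test's exit code.
    return ssh_argv(host, f"cat {shlex.quote(status_file)}")
```

A reconnecting runner would call `start_detached(...)` once, then poll `is_running(...)` (retrying whenever SSH itself fails) and finally fetch the result with `read_status(...)`. This still depends on the guest accepting new SSH connections after the disruptive step.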
I would have to modify the 'crasher' test to detect the reboot and attempt to restore the library (and then use the options added in #2696) to get to the 'second test'.
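One way for the 'crasher' test to detect that the guest was rebooted underneath it is to record the kernel's boot ID on its first start and compare it on later starts. This is a minimal sketch, assuming a Linux guest (where `/proc/sys/kernel/random/boot_id` is regenerated on every boot) and a hypothetical state-file path on persistent storage. The function name is illustrative, and restoring the library is left as a placeholder.

```python
from pathlib import Path

# On Linux this file holds a random UUID that changes on every boot.
BOOT_ID = Path("/proc/sys/kernel/random/boot_id")
# Hypothetical location; it must be on persistent storage (not a tmpfs such
# as /tmp on many systems) so that it survives the reboot.
STATE = Path("/var/tmp/crasher.boot_id")


def rebooted_since_first_run(state: Path = STATE, source: Path = BOOT_ID) -> bool:
    """Return True if the guest has booted again since the boot ID was recorded."""
    current = source.read_text().strip()
    if not state.exists():
        # First run of the test: remember which boot we started in.
        state.write_text(current)
        return False
    return state.read_text().strip() != current


# Possible use at the top of the 'crasher' test (restore_library() is a
# hypothetical helper that undoes the damage, not an existing function):
#
#     if rebooted_since_first_run():
#         restore_library()
```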
tmt uses an SSH session to watch the SUT (the guest machine running the test). When anything happens to this session, the run is aborted completely. It should be acceptable for a test to break the connection temporarily (e.g. a test might affect networking for a short time, or mess with libraries that SSH relies on). I would like to propose:
(1) Make the SSH session robust enough to handle these situations: when the session breaks, retry the connection and resume the test (it is very likely that all disruptive test actions will have been completed and reverted by then).
(2) In the rare situations when the SSH session cannot be resumed even after several attempts, don't abort the run. Instead, attempt to salvage the results of the tests preceding the problem and run the remaining ones from scratch. Or perhaps disable the problematic test and create a new run without it.
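The retry part of the first proposal could look roughly like this: call an action (for example, reconnecting and checking on the test) and, if the connection fails, wait and try again with exponential backoff. Once all attempts are used up, the error is re-raised so the caller can fall back to salvaging results instead of aborting everything. This is a sketch of a generic retry-with-backoff technique, not tmt's API. `with_reconnect` and `check_ssh` are hypothetical names. Mapping ssh's exit status 255 to a connection error is a heuristic, because a remote command may also exit with 255.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def check_ssh(returncode: int) -> int:
    # ssh(1) exits with 255 when ssh itself fails (e.g. the connection
    # dropped), otherwise with the remote command's exit status.
    if returncode == 255:
        raise ConnectionError("ssh failed with exit status 255")
    return returncode


def with_reconnect(
    action: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up; the caller decides how to salvage the run
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function easy to test and lets a caller cap the total waiting time. The delays grow as 1, 2, 4, ... seconds, which gives a briefly disrupted network or library time to recover without hammering the guest.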