Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/test-driver: add backdoor based on systemd-ssh-proxy & AF_VSOCK #392030

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Mar 22, 2025

Note: Only the last commit is relevant right now.
Depends on #390996 & #372979.


With this it's possible to trivially SSH into running machines from the
test-driver. This is especially useful when running VM tests
interactively on a remote system.

This is based on systemd-ssh-proxy(1), so there's no need to configure
any additional networking on the host-side.

Suggested-by: Ryan Lahfa masterancpp@gmail.com

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

Ma27 and others added 14 commits March 20, 2025 12:30
Replaces / Closes NixOS#345948

I tried to integrate `pytest` assertions because I like the reporting,
but I only managed to get the very basic thing and even that was messing
around a lot with its internals.

The approach in NixOS#345948 shifts too much maintenance effort to us, so
it's not really desirable either.

After discussing with Benoit on Ocean Sprint about this, we decided that
it's probably the best compromise to integrate `unittest`: it also
provides good diffs when needed, but the downside is that existing tests
don't benefit from it.

This patch essentially does the following things:

* Add a new global `t` that is an instance of a `unittest.TestCase`
  class. I decided to just go for `t` given that e.g.
  `tester.assertEqual` (or any other longer name) seems quite verbose.

* Use a special class for errors that get special treatment:
  * The traceback is minimized to only include frames from the
    testScript: in this case I don't really care about anything else and
    IMHO that's just visual noise.

    This is not the case for other exceptions since these may indicate a
    bug and then people should be able to send the full traceback to the
    maintainers.
  * Display the error, but with `!!!` as prefix to make sure it's
    easier to spot in between other logs.

This looks e.g. like

    !!! Traceback (most recent call last):
    !!!   File "<string>", line 7, in <module>
    !!!     foo()
    !!!   File "<string>", line 5, in foo
    !!!     t.assertEqual({"foo":[1,2,{"foo":"bar"}]},{"foo":[1,2,{"bar":"foo"}],"bar":[1,2,3,4,"foo"]})
    !!!
    !!! NixOSAssertionError: {'foo': [1, 2, {'foo': 'bar'}]} != {'foo': [1, 2, {'bar': 'foo'}], 'bar': [1, 2, 3, 4, 'foo']}
    !!! - {'foo': [1, 2, {'foo': 'bar'}]}
    !!! + {'bar': [1, 2, 3, 4, 'foo'], 'foo': [1, 2, {'bar': 'foo'}]}
    cleanup
    kill machine (pid 9)
    qemu-system-x86_64: terminating on signal 15 from pid 6 (/nix/store/wz0j2zi02rvnjiz37nn28h3gfdq61svz-python3-3.12.9/bin/python3.12)
    kill vlan (pid 7)
    (finished: cleanup, in 0.00 seconds)

Co-authored-by: bew <bew@users.noreply.github.com>
…chine class

I think it's reasonable to also have this kind of visual distinction
here between test failures and actual errors from the test framework.

A failing `machine.require_unit_state` now lookgs like this for
instance:

    !!! Traceback (most recent call last):
    !!!   File "<string>", line 3, in <module>
    !!!     machine.require_unit_state("postgresql","active")
    !!!
    !!! RequestedAssertionFailed: Expected unit 'postgresql' to to be in state 'active' but it is in state 'inactive'

Co-authored-by: Benoit de Chezelles <bew@users.noreply.github.com>
When doing `machine.succeed(...)` or something similar, it's now clear
that the command `...` was issued on `machine`.

Essentially, this results in the following diff in the log:

    -(finished: waiting for unit default.target, in 13.47 seconds)
    +machine: (finished: waiting for unit default.target, in 13.47 seconds)
    (finished: subtest: foobar text lorem ipsum, in 13.47 seconds)
After a discussion with tfc, we agreed that we need a distinction
between errors where the user isn't at fault (e.g. OCR failing - now
called `MachineError`) and errors where the test actually failed (now
called `RequestedAssertionFailed`).

Both get special treatment from the error handler, i.e. a `!!!` prefix
to make it easier to spot visually.

However, only `RequestedAssertionFailed` gets the shortening of the
traceback, `MachineError` exceptions may be something to report and
maintainers usually want to see the full trace.

Suggested-by: Jacek Galowicz <jacek@galowicz.de>
Closes NixOS#180089

I realized that the previous commits relying on `sys.exit` for dealing
with `MachineError`/`RequestedAssertionFailed` exit the interactive
session which is kinda bad.

This patch uses the ipython driver: it seems to have equivalent features
such as auto-completion and doesn't stop on SystemExit being raised.

This also fixes other places where this happened such as things calling
`log.error` on the CompositeLogger.
This enables provisioning of root ssh keys with systemd credentials
(e.g. passed in via smbios strings or kernel params)
In afeb76d, sshd.service and
sshd@.service were switched to Type=notify. This apparently works for
sshd.service, but not for sshd@.service. Given that the reason for
this working with sshd.service isn't exactly clear, let's revert it
for both of them for now, and revisit Type=notify later.
This enables systemd-ssh-generator to find the sshd binary.
We need to patch systemd to make systemd-ssh-generator(8) work:

- systemd doesn't follow symlinks when checking for a packaged
  sshd@.service unit
- systemd tries to link the ssh config dropins by default with tmpfiles
  to /usr, that is not possible, so we include the snippet manually.
With this it's possible to trivially SSH into running machines from the
test-driver. This is especially useful when running VM tests
interactively on a remote system.

This is based on `systemd-ssh-proxy(1)`, so there's no need to configure
any additional networking on the host-side.

Suggested-by: Ryan Lahfa <masterancpp@gmail.com>
@Ma27 Ma27 requested review from tfc, RaitoBezarius and K900 March 22, 2025 08:22
@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 6.topic: testing Tooling for automated testing of packages and modules 6.topic: systemd 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 1-10 10.rebuild-linux: 5001+ 10.rebuild-linux: 501+ labels Mar 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: systemd 6.topic: testing Tooling for automated testing of packages and modules 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 1-10 10.rebuild-linux: 501+ 10.rebuild-linux: 5001+
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants