Crash on restore related to VM tags #25

Closed
UndeadDevel opened this issue May 23, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@UndeadDevel

I got this when trying to restore a bunch of qubes from backup (both wyng and wyng-util-qubes are on their main branches; for wyng, prog_date = "20240515"; the util file was updated the same day):

Traceback (most recent call last):
  File "/usr/local/bin/wyng-util-qubes", line 460, in <module>
    for tag in qube.tags:   vm.tags.add(tag)
                            ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/qubesadmin/tags.py", line 43, in add
    self.vm.qubesd_call(self.vm.name, 'admin.vm.tag.Set', elem)
  File "/usr/lib/python3.11/site-packages/qubesadmin/base.py", line 76, in qubesd_call
    return self.app.qubesd_call(dest, method, arg, payload,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/qubesadmin/app.py", line 787, in qubesd_call
    return self._parse_qubesd_response(return_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/qubesadmin/base.py", line 89, in _parse_qubesd_response
    raise qubesadmin.exc.QubesDaemonAccessError(
qubesadmin.exc.QubesDaemonAccessError: Got empty response from qubesd. See journalctl in dom0 for details.

The error code seems to be "1". The qube in question was a kind of GPG vault that still had "Split GPG" policies in place in the Qubes Global Config from before I deleted it (not sure if that's relevant). I do see messages in journalctl about disabling qubes-vm@gpg-vault.service, followed by wireplumber[19406]: GetManagedObjects() failed: org.freedesktop.DBus.Error.NameHasNoOwner errors and dom0 audit BPF messages; this sequence is repeated twice. The fatal error in journalctl comes a few seconds later and shows:

May 23 13:13:13 dom0 qubesd[2790]: Updating appmenus for 'gpg-vault' in 'dom0'
May 23 13:13:13 dom0 audit[20422]: CRED_ACQ pid=20422 uid=0 auid=xxxxxxxxxx ses=xxxxxxxxxx msg='op=PAM:setcred grantors=pam_rootok acct="my_dom0_user_acc" exe="/usr/sbin/runuser" hostname=? addr=? termina>
May 23 13:13:13 dom0 kernel: kauditd_printk_skb: 46 callbacks suppressed
May 23 13:13:13 dom0 kernel: audit: type=1103 audit(xxxxxxxxxx.xxx:xxx): pid=20422 uid=0 auid=xxxxxxxxxx ses=xxxxxxxxxx msg='op=PAM:setcred grantors=pam_rootok acct="my_dom0_user_acc" exe="/usr/sbin/runus>
May 23 13:13:13 dom0 runuser[20422]: pam_unix(runuser:session): session opened for user my_dom0_user_acc(uid=1000) by (uid=0)
May 23 13:13:13 dom0 kernel: audit: type=1105 audit(xxxxxxxxx.xxx:xxx): pid=20422 uid=0 auid=xxxxxxxxxx ses=xxxxxxxxxx msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct>
May 23 13:13:13 dom0 audit[20422]: USER_START pid=20422 uid=0 auid=xxxxxxxxxx ses=xxxxxxxxxx msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct="my_dom0_user_acc" exe="/usr/sbin/runuse>
May 23 13:13:14 dom0 qubesd[2790]: unhandled exception while calling src=b'dom0' meth=b'admin.vm.tag.Set' dest=b'gpg-vault' arg=b'created-by-disp-backup-restore' len(untrusted_payload)=0
May 23 13:13:14 dom0 qubesd[2790]: Traceback (most recent call last):
May 23 13:13:14 dom0 qubesd[2790]:   File "/usr/lib/python3.11/site-packages/qubes/api/__init__.py", line 297, in respond
May 23 13:13:14 dom0 qubesd[2790]:     response = await self.mgmt.execute(
May 23 13:13:14 dom0 qubesd[2790]:                ^^^^^^^^^^^^^^^^^^^^^^^^
May 23 13:13:14 dom0 qubesd[2790]:   File "/usr/lib/python3.11/site-packages/qubes/api/admin.py", line 561, in vm_tag_set
May 23 13:13:14 dom0 qubesd[2790]:     self.dest.tags.add(self.arg)
May 23 13:13:14 dom0 qubesd[2790]:   File "/usr/lib/python3.11/site-packages/qubes/vm/__init__.py", line 158, in add
May 23 13:13:14 dom0 qubesd[2790]:     self.vm.fire_event('domain-tag-add:' + elem, tag=elem)
May 23 13:13:14 dom0 qubesd[2790]:   File "/usr/lib/python3.11/site-packages/qubes/events.py", line 195, in fire_event
May 23 13:13:14 dom0 qubesd[2790]:     sync_effects, async_effects = self._fire_event(event, kwargs,
May 23 13:13:14 dom0 qubesd[2790]:                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 23 13:13:14 dom0 qubesd[2790]:   File "/usr/lib/python3.11/site-packages/qubes/events.py", line 164, in _fire_event
May 23 13:13:14 dom0 qubesd[2790]:     effect = func(self, event, **kwargs)
May 23 13:13:14 dom0 qubesd[2790]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 23 13:13:14 dom0 qubesd[2790]:   File "/usr/lib/python3.11/site-packages/qubes/ext/gui.py", line 149, in set_guivm_on_created_by
May 23 13:13:14 dom0 qubesd[2790]:     created_by = vm.app.domains[tag.partition('created-by-')[2]]
May 23 13:13:14 dom0 qubesd[2790]:                  ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 23 13:13:14 dom0 qubesd[2790]:   File "/usr/lib/python3.11/site-packages/qubes/app.py", line 512, in __getitem__
May 23 13:13:14 dom0 qubesd[2790]:     raise KeyError(key)
May 23 13:13:14 dom0 qubesd[2790]: KeyError: 'disp-backup-restore'
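If I read the traceback right, it isn't the tag itself that qubesd objects to, but the handler for created-by-* tags: set_guivm_on_created_by strips the created-by- prefix and looks the remainder up in app.domains, and the disposable named in the tag (presumably the one a paranoid-mode restore used at some point) no longer exists. A minimal sketch of that lookup, with a plain dict standing in for the real domains collection:

```python
# Simplified illustration of the failing lookup in qubes/ext/gui.py
# (not the actual code; `domains` is a stand-in for vm.app.domains).
tag = 'created-by-disp-backup-restore'

# partition() splits on the first 'created-by-'; index [2] is everything
# after it, i.e. the name of the VM that created this qube.
creator_name = tag.partition('created-by-')[2]    # 'disp-backup-restore'

domains = {'dom0': None, 'gpg-vault': None}       # the disposable isn't here anymore
try:
    creator = domains[creator_name]
except KeyError:
    # This is the unhandled KeyError that qubesd turns into an empty response,
    # which the client side then reports as QubesDaemonAccessError.
    print(f'no such domain: {creator_name!r}')
```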

As a side note, in the same restore session one other qube errored out with "Not matched", presumably because it had no changes in my last backup, so its "Latest" session differed from the "Latest" session of the other qubes I was restoring, even though I didn't specify a session for the restore command, only qube names. IMHO the expected behavior would be to pick the latest session available for each specified qube unless I specify a --session.

@tasket
Owner

tasket commented May 24, 2024

This does look like one of those "can't/won't do it" behaviors from the Qubes daemon due to extenuating circumstances / absent resources. It does seem odd that it would react that way to a simple tag, but I guess some tags are special. It would be interesting to see the result from qvm-backup-restore in this case, if it's any different. As it happens, I have a Split GPG setup too, so I can do some testing with it.

Incidentally, Marek's (of Qubes) advice on this kind of error-from-settings situation is to catch all the errors and notify the user, then continue working. I mostly took his advice, but I somehow thought tags wouldn't pose any problem, so that specific code doesn't catch exceptions. That will have to change.
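Roughly what I have in mind is something like the sketch below (illustrative only, not the actual util code): wrap the tag loop from your traceback, remember what failed, warn, and keep going.

```python
import qubesadmin.exc

def copy_tags(qube, vm):
    """Copy tags from the backup metadata object onto the newly created VM,
    collecting failures instead of letting one bad tag abort the restore."""
    failed = []
    for tag in qube.tags:
        try:
            vm.tags.add(tag)
        except qubesadmin.exc.QubesException as err:
            # e.g. the QubesDaemonAccessError above when qubesd chokes on a created-by-* tag
            failed.append((tag, err))
            print(f"Warning: could not set tag {tag!r} on {vm.name}: {err}")
    return failed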


The list command will now show the latest session for each VM, to give you an idea of what will get auto-selected if you let restore choose the latest session. That result is somewhat intentional, and the reason is that VM settings live together in a single xml file (it's also why the util tells you what didn't match, so you can do subsequent restores for the non-matching VMs). This includes named dispVMs, which exist only in the xml; Wyng can't map them out, so they don't show up at all unless a session is specified (where a single qubes.xml exists).

The util could pull every known xml version and interrogate them all, but I think that means a lot of additional code and introduces delays. Instead, I had Wyng record any specified-but-unchanged vols along with changed ones to make indexing in the util more straightforward; I also added a feature to Wyng that allows non-strict session specs, so that a next-latest session can fill a request for a volume not in the requested session. The latter feature hasn't yet surfaced in the util, though, partly because there is still a question of which xml from when to use where: a VM restored with settings from an older arrangement/config could result in, for example, a networked application connecting through sys-firewall, which is what was used before the user moved most of their comms to Tor. This can still happen with the util, but the user will be aware they are pulling from an older backup, instead of the difference possibly going unnoticed. So it's an automation and UI challenge, where automating it halfway would introduce a lot of foot-guns.

TL;DR: the current code treats Wyng sessions much like Qubes restore treats an individual archive; things could get really messy otherwise.

Of course, I'm open to ideas and POCs about how to make restores more convenient. One thing that could be done is to return to the practice of "no session means only one VM name is accepted", where the latest session for that VM is then chosen. But that older version didn't accept more than one VM name to restore in any case. An update of that approach could check that all requested VMs are from the same session, for instance.
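For illustration, that check could look roughly like this (hypothetical sketch; the `sessions` mapping and the helper are not Wyng's actual data structures):

```python
def check_same_latest_session(sessions: dict[str, set[str]], requested: list[str]) -> str:
    """Pick the latest session per requested VM; refuse to guess when the
    requested VMs don't all share the same latest session."""
    latest = {}
    for vm in requested:
        matching = [sid for sid, vms in sorted(sessions.items()) if vm in vms]
        if not matching:
            raise ValueError(f"{vm}: not found in any session")
        latest[vm] = matching[-1]          # newest session containing this VM

    if len(set(latest.values())) > 1:
        # VMs span different sessions; require an explicit --session instead of guessing.
        raise ValueError(f"requested VMs are not all in one session: {latest}")
    return next(iter(latest.values()))

# Example: gpg-vault last appeared in an older session than the other qubes,
# so requesting it together with them would raise instead of silently mixing sessions.
sessions = {"20240520-120000": {"gpg-vault", "work"},
            "20240523-090000": {"work", "personal"}}
print(check_same_latest_session(sessions, ["work", "personal"]))   # 20240523-090000
```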

@UndeadDevel
Author

It would be interesting to see the result from qvm-backup-restore in this case, if its any different.

Well, I did a large backup restore under very similar circumstances with almost the same qubes, also including that gpg-vault qube, using the Qubes GUI backup manager (which uses qvm-backup-restore under the hood) a few months ago, and it worked without issue (though I didn't check the logs).
I've now tested this case (deleted the qube again and restored from a regular Qubes backup with the Qubes tools), which confirms that the error also occurs with the regular Qubes tools, but they don't crash as a result of it; I've reported this on the Qubes issue tracker now (here). So there might be a QubesOS bug, but as far as wyng goes, handling this error so it doesn't crash the program would fix this issue.


Regarding wyng-util-qubes restore: I think what may have happened, IIRC, is that I made a wyng backup (and a regular Qubes backup just in case), started deleting qubes, but stopped after the first because I remembered a change I still wanted to make; I made the change, did another wyng backup, and then deleted all the relevant (remaining) qubes.
When I then restored, it didn't restore the qube I had already deleted before the last backup, since that one only existed in an earlier backup session. If that's what happened, then I suppose wyng's behavior was fine and doesn't need to be changed.
Thank you for the elaboration and explanations!

tasket added a commit that referenced this issue May 25, 2024
Add verbose option for restore and verify
@tasket
Owner

tasket commented May 30, 2024

The crashing shouldn't occur anymore. Maybe change the title and devote this issue to special handling of vault VMs?

@UndeadDevel
Author

The crashing shouldn't occur anymore. Maybe change the title and devote this issue to special handling of vault VMs?

Great! I think that means this issue can be closed as completed; what special handling did you have in mind?

@tasket
Owner

tasket commented May 30, 2024

It's fine to leave it open, since you reported two things; just change the title and maybe a bit of the main description to be about vault handling. It's up to you.

@UndeadDevel
Author

I'm confused about what you mean regarding "vault handling". As I see it, the issue was about the crash caused by an uncaught exception when the gpg-vault qube was restored; I reported that to the QubesOS issue tracker as well, since I noticed it occurs with the Qubes tools too. Removing one of the tags stops the error from occurring, and since you've now improved how wyng-util-qubes handles these kinds of situations, I see that part as resolved.

The other line of discussion was about how wyng selects VMs for restoration, but that is also resolved IMO, as I've remembered that I probably did two backups while deleting the qube in question in between, so wyng's behavior is actually fine.

Maybe you're referring to the fact that in the OP I was wondering whether the Split GPG policies had something to do with this? If so, I don't think so anymore; instead, the issue comes down to the QubesOS bug I reported and how wyng-util-qubes handles such situations (now resolved).

So I'll close this as completed. Thank you.
