Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xen: update to 4.14.2. #30599

Closed
wants to merge 2 commits into from
Closed

xen: update to 4.14.2. #30599

wants to merge 2 commits into from

Conversation

CMB
Copy link
Contributor

@CMB CMB commented Apr 30, 2021

  • xen: update to 4.14.2.

General

Have the results of the proposed changes been tested?

  • [ x] I use the packages affected by the proposed changes on a regular basis and confirm this PR works for me
  • I generally don't use the affected packages but briefly tested this PR

@ericonr
Copy link
Member

ericonr commented May 2, 2021

@void-linux/pkg-committers could someone help with review here?

1 similar comment
@ericonr
Copy link
Member

ericonr commented May 2, 2021

@void-linux/pkg-committers could someone help with review here?

sv check xenconsoled >/dev/null || exit 1
xenstore-write "/local/domain/0/domid" 0 || exit 1
xenstore-write "/local/domain/0/name" "Domain-0" || exit 1
if ! pgrep -f '^shim-xenconsoled' ; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When running a supervisor you shouldn't need to scrape the process table to find out a process id. A better solution is to ask runsv directly. Something like if [ $(realpath /proc/$(head -n1 /var/service/xenconsoled/supervise/pid)/exe = /usr/bin/pause ]; then will get you the same result without needing a magic name and risking collisions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary if the dummy domU support is dropped from the services. Just do the write unconditionally as it is now.

@@ -1,5 +1,8 @@
#!/bin/sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to change this to #!/bin/sh -e or add set -e, which will let you remove all of the || exit 1 markers.

# The best I can come up with is to look at the open FDs, and if
# hypervisor.log is open, then xenconsoled is almost certainly started
# and ready to use.
PID="$(pidof xenconsoled)" || exit 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the process ID that runsv maintains at /var/service/$SERVICE/supervise/pid instead of banging away with pidof.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a few run scripts that exit early if dbus is not up which you could use for inspiration

# and ready to use.
PID="$(pidof xenconsoled)" || exit 1
expr "x${PID}" ':' 'x[0-9]\+$' > /dev/null || exit 1
ls -l "/proc/${PID}/fd" 2>/dev/null |grep -q hypervisor.log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing this to lsof -p ${PID:-0} | grep -q hypervisor.log or rolling this and the PID= line into a single command. It still isn't perfect but should be less fiddly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this check really matters, and there's no reliable way to actually determine whether the service is running, might as well just use chpst -L in the run script to acquire a lock on some file like /run/xen/xenconsoled.lock. Then the check script can just try to acquire that lock and, if it succeeds, you know the service is not running.

Yes, this would be a hack. It would also be much less hacky than trying to scrape the process tree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, a lock works too. I don't particularly like them in shell scripts but chpst -L lockfile ... in the run script and flock -n lockfile in the check script works as well.

@@ -1,5 +1,8 @@
#!/bin/sh
sv start xenconsoled > /dev/null || exit 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sv start wraps sv check so you can probably remove the check line.

@@ -1,4 +1,12 @@
#!/bin/sh
exec 2>&1
sv start xenstored >/dev/null || exit 1
sv check xenstored >/dev/null || exit 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same commentary wrt removing check as xen/run

sv check xenstored >/dev/null || exit 1

if pgrep -f '^shim-xenstored' ; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same commentary as xen/run wrt pgrep vs looking in `supervise/pid.


# Check for faux xenstored first. If it is, we aren't running under
# a Xen dom0. Consider xenstored started.
pgrep -f '^shim-xenstored' && exit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

grep -q control_d /proc/xen/capabilities || exit 1
exec xenstored --verbose --no-fork

# shellcheck disable=SC1091
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style preference: don't leak shellcheck comments into run scripts (arguably, shellcheck complaining about a guarded file is a bug).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second, don't pollute the run scripts with spellcheck noise.

Fix build with recent argp-standalone.
I'll try and get this upstreamed so we don't have to carry it.

diff -Nur xen-4.14.1.orig/tools/configure.ac xen-4.14.1/tools/configure.ac
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change directory name to xen-4.14.2?

Copy link
Member

@ahesford ahesford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't exhaustively comment on the service files, but left a few points. In addition, consider these broad points:

  1. Never do no-op services. If the service only works in dom0, the user must be responsible enough to only enable it in dom0. Just fail in domU.
  2. If your no-op services aren't really no-op but actually do some upfront, one-shot work even in domU, either leave this to the user to put inrc.local or, if there is a really compelling reason, split into a "one-shot" that can be used in both domains. (Often there is no truly compelling reason for these one-shots.)
  3. Like heliocat says, don't do pgrep or other process hacks. Unlike heliocat, I think using /proc is just as ugly. Don't do it.
  4. Don't sv start from another service. Runit isn't good about dependencies, and these kind of hacky workarounds make the situation worse, not better. You can sv check if you really need to, but only if the service you check has a proper check script to meaningfully determine whether the service is sufficiently up. In that case, just fail if the dependency fails its check.

Rework the scripts in light of these and more specific comments and let's see where we stand.

grep -q control_d /proc/xen/capabilities || exit 1
exec xenstored --verbose --no-fork

# shellcheck disable=SC1091
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second, don't pollute the run scripts with spellcheck noise.

;;
esac

# shellcheck disable=SC2086
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no shell check noise

esac

# shellcheck disable=SC2086
exec "${XENSTORED}" ${XENSTORED_ARGS}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add --no-fork here rather than to the variable, because it will always be required and we don't want to make people who customize the args have to remember to include it.

# shellcheck disable=SC1091
[ -r conf ] && . ./conf
XENSTORED="${XENSTORED:-xenstored}"
case "$(basename "${XENSTORED}")" in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid the sub shell, ${XENSTORED##*/} works in place of base name.

Better yet, why does this case exist? Just drop the whole test, you already provide a mechanism for people to add --verbose In $XENSTORED_ARGS if they care.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look it up, it's picking between two different xenstored binaries, both of which are shipped with the package and I believe take different slightly different options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the only difference in the two cases is a default --verbose in one. Unless that is a required argument for oxenstored (which would shock me), it seems the case as used is superfluous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I made it as far as reading the documentation that noted that there were different binaries but didn't manage to find the option reference. I can't say I looked particularly hard so it could be that it's as minor a difference as that.

# and we can't start a real xenstored.
grep -q control_d /proc/xen/capabilities || \
exec chpst -b \
'shim-xenstored; see /etc/sv/xenstored/run to find out what this is' \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Don't make no-op services. If this isn't dom0, people shouldn't enable the service. Just fail here.
  2. Don't pollute the process name with documentation like this.

sv check xenconsoled >/dev/null || exit 1
xenstore-write "/local/domain/0/domid" 0 || exit 1
xenstore-write "/local/domain/0/name" "Domain-0" || exit 1
if ! pgrep -f '^shim-xenconsoled' ; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary if the dummy domU support is dropped from the services. Just do the write unconditionally as it is now.

# and ready to use.
PID="$(pidof xenconsoled)" || exit 1
expr "x${PID}" ':' 'x[0-9]\+$' > /dev/null || exit 1
ls -l "/proc/${PID}/fd" 2>/dev/null |grep -q hypervisor.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this check really matters, and there's no reliable way to actually determine whether the service is running, might as well just use chpst -L in the run script to acquire a lock on some file like /run/xen/xenconsoled.lock. Then the check script can just try to acquire that lock and, if it succeeds, you know the service is not running.

Yes, this would be a hack. It would also be much less hacky than trying to scrape the process tree.

@heliocat
Copy link
Contributor

heliocat commented May 3, 2021

3. Like heliocat says, don't do pgrep or other process hacks. Unlike heliocat, I think using `/proc` is just as ugly. Don't do it.

I didn't say I thought /proc wasn't ugly, more that it's less ugly to use the native pid following in runit and then check the specific process details than it is to walk the entire process tree. The chpst -L route is probably the least-bad since it'll be relatively TOCTOU-free.

@ahesford
Copy link
Member

ahesford commented May 3, 2021

I didn't say I thought /proc wasn't ugly, more that it's less ugly to use the native pid following in runit and then check the specific process details than it is to walk the entire process tree.

"[J]ust as ugly" is too strong; I agree using /proc isn't as bad as pgrep.

All of these are racy because no solutions guarantee that the daemon is in a usable state when the check is applied. I suppose a key question is what happens with xenstore-write if the necessary services aren't up. If the process just fails detectably, then drop the check script and the sv check and just monitor the return code on the xenstore-write invocations. Otherwise, if xenstore-write does unexpected things, the lock should accomplish what we need.

CMB added 2 commits June 20, 2021 18:35
This should fix issue 18676 (xen services spamming the user when not booted
into Xen.

Make the xenstored implementation selectable.  There are two
implementations shipped by Xen and in our Xen package:
the original one written in C (binary is just called xenstored) and
a newer one in ocaml (oxenstored).
oxenstored is -- IME -- more reliable.  Make it configurable in
./conf, but leave the default as-is.

Add a patch to fix failing to build from source on musl with newer
argp-standalone.
And apply patches for XSAs.

Also orphaning, since I haven't been a responsible maintainer and I
do not have a compelling reason to use this.
@CMB
Copy link
Contributor Author

CMB commented Jun 27, 2021

@heliocat @ahesford Thanks for reviewing this, and I'm sorry for taking
so long to circle back to it. I think all the issues you pointed out are now fixed.

@CMB
Copy link
Contributor Author

CMB commented Jun 27, 2021

Also orphaning. I do like the Xen tooling, but I want to get rid of another
moving part in my environment.

@github-actions
Copy link

Pull Requests become stale 90 days after last activity and are closed 14 days after that. If this pull request is still relevant bump it or assign it.

@github-actions github-actions bot added the Stale label May 18, 2022
@github-actions github-actions bot closed this Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants