
Requesting guide for (semi-properly) manually administering multiple boot environments #5809

Closed
evan-king opened this issue Feb 20, 2017 · 9 comments
Labels
Type: Documentation Indicates a requested change to the documentation

Comments

@evan-king

evan-king commented Feb 20, 2017

I've followed this repo's guide for installing Ubuntu 16.04 on root ZFS, and also successfully cloned the root filesystem and booted into it. However, the procedure I improvised for switching boot environments was neither coherent, reliable, nor ultimately correct, though it sort of got the job done.

For example, /boot/grub/grub.cfg is part of the root image, so there isn't a single boot menu that can be maintained for all boot environments, and splitting it into a separate filesystem rendered the system unbootable. I don't adequately understand the stages of booting with regard to filesystem setup. There appears to be a point at which the filesystem in play is the first one (in alphabetical order) designated for / that doesn't have mounting disabled, and a later step which relies on the details of the grub.cfg in whichever filesystem was used for the first pass. Further, I did not figure out how to mount and modify a root dataset without being booted into it (meaning part of setting up a new BE is using the GRUB console to correct the filesystem reference on a first, failing boot).

Even if I didn't worry about maintaining a single GRUB menu for all BEs, just having any one menu aware of all BEs would require being able to mount other roots and add an update-grub script similar to 10_linux but with considerably greater complexity - and that one is already quite daunting. Several unknowns are present, any one of which is troubling at my level of expertise:

  • mounting alternate roots to a different location than specified in the zfs property
  • enumerating the kernels on alternate roots
  • ensuring /boot contains the necessary support files for kernel bootstrapping (and updates don't remove them for one BE if another relies on them, which doesn't seem like a proper situation anyway)
  • understanding how/whether to maintain the grub menus on alternate roots (or have a master filesystem on which all BEs depend, or something else not occurring to me)
  • controlling which filesystem's /boot applies to first pass mounting
  • preserving the last BE choice and/or overriding the next/default BE choice

The long-term solution is obviously getting our own beadm, but in the meantime, would it be feasible to provide a manageable set of appropriate steps for manually administering multiple boot environments?

Minimally, we need:

  • instructions to create a new BE that will successfully boot without requiring steps which cannot be performed over SSH
  • instructions to set which BE will be used on next boot
  • explanation of any dangers regarding update-grub, or instructions that keep it in line with the BE
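For the "which BE on next boot" item, GRUB already has a generic mechanism a guide could build on: with GRUB_DEFAULT=saved in /etc/default/grub, grub-reboot and grub-set-default select a menu entry by title. A minimal sketch, assuming per-BE menu entries exist (the BE name and entry-title scheme here are hypothetical, and the privileged commands are shown commented since they modify the system):

```shell
BE="ubuntu-2"                          # hypothetical target boot environment
ENTRY="Ubuntu (zfs-be: ${BE})"         # hypothetical menu-entry title

# One-shot: boot this entry once, then revert to the default.
#   sudo grub-reboot "$ENTRY"
# Permanent: make this entry the default from now on.
#   sudo grub-set-default "$ENTRY"
```

Both commands only work if the generated grub.cfg actually contains an entry with that title, which circles back to the menu-generation problem below.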

And ideally, we'd also like:

  • instructions for (safely) deleting a BE (including any appropriate maintenance)
  • some kind of lightweight solution for maintaining a GRUB menu that successfully lists and boots all installed kernels for all BEs

Plus maybe some bells and whistles:

  • assistance with identifying the current BE - I can contribute a bit to that, with this snippet that's handy for use in prompts and SSH welcome messages: `CURRENT_BE=$(mount | grep -oP "(?<=syspool/ROOT/).*(?= on /)")` (assuming a pool name of syspool).
  • assistance with identifying the designated BE for next boot
  • fallback mechanism allowing recovery from a bad BE with only SSH access
  • appropriate hooks for overall system health maintenance as needed (such as a zfs property flagging BEs as having outdated grub menus/automated triggering of update-grub and clearing of the flag)
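A hedged sketch combining the first two bells-and-whistles items (pool name "syspool" is an assumption, as is the use of GRUB_DEFAULT=saved for the next-boot choice; grep needs PCRE support, i.e. GNU grep -P):

```shell
# Current BE: parse the dataset mounted at / out of `mount` output.
current_be() {
    grep -oP '(?<=syspool/ROOT/)[^ ]+(?= on / )'
}
CURRENT_BE=$(mount | current_be || true)

# Next/default BE on GRUB systems using GRUB_DEFAULT=saved: the saved
# entry title is whatever naming scheme your menu uses.
NEXT_BE=$(grub-editenv list 2>/dev/null \
    | grep -oP '(?<=^saved_entry=).*' || true)
```

The `|| true` guards keep the snippets safe to drop into a login script on machines where nothing matches or grub-editenv is absent.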

I could probably provide some support for this request with documentation patches, but lack sufficient understanding of even the high-level picture let alone what would constitute following best practices and forward compatibility with a beadm port. Some links to resources that would productively allow me to self-educate on these specific issues/what exactly beadm does would also be welcome. It's unlikely I'd have sufficient resources to come full circle on this if I have to spend a lot of up front time on research, but it would still be helpful to me at least (and perhaps to others who might be inspired to contribute to either short or long-term solutions).

Cheers.

@madwizard
Contributor

I'd be interested in helping out with this. I've been using ZFS on / for some time now, although I use a separate /boot partition to keep things simpler.

@behlendorf behlendorf added the Type: Documentation Indicates a requested change to the documentation label Feb 21, 2017
@behlendorf
Contributor

@rlaager may also be interested in this.

@rlaager
Member

rlaager commented Feb 21, 2017

This is definitely something I want, assuming we are talking about multiple clones of the installed image (not, for example, booting different distros). I think someone did a port of beadm to Linux, but I'm not sure of its status or how it functions.

Right now (ZFS or not), the GRUB menu has entries for multiple kernels, but one root filesystem. The goal is to support multiple root filesystems, right? And we presumably need to keep supporting multiple kernels. Are the kernels independent of the root filesystems, or are only some combinations acceptable? I think it's the latter, as the root filesystem contains kernel modules, for example.

I wonder if it would be reasonable to separate /boot/grub into its own dataset. Assuming that works, then /boot still lives inside the root filesystem, and thus we keep the association between kernels and root filesystems, without having to do complicated tracking. Then you're basically just building entries that look like:
linux /ROOT/ubuntu-2@/boot/vmlinuz-4.4.0-63-generic root=ZFS=rpool/ROOT/ubuntu-2 ro
linux /ROOT/ubuntu-2@/boot/vmlinuz-4.4.0-62-generic root=ZFS=rpool/ROOT/ubuntu-2 ro
linux /ROOT/ubuntu-1@/boot/vmlinuz-4.4.0-62-generic root=ZFS=rpool/ROOT/ubuntu-1 ro
linux /ROOT/ubuntu-1@/boot/vmlinuz-4.4.0-59-generic root=ZFS=rpool/ROOT/ubuntu-1 ro

How you build those is another question. The current situation is that 10_linux builds entries for the currently mounted root filesystem. If we left that alone, at least for the moment, a possible option would be to add 10_linux_zfs that gets a list of BEs, mounts them to temporary locations, sees what kernels they have, and builds them. I think we'd want to honor /etc/default/grub from inside the BE as well.

Seeing how complicated that code is would probably shape whether it should be merged into 10_linux.
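The proposed 10_linux_zfs could be sketched roughly as follows - a minimal outline, not a finished implementation: the pool layout (rpool/ROOT), mount options, and menu-line format are assumptions, and real grub.d scripts would also need to honor /etc/default/grub from inside each BE as noted above.

```shell
#!/bin/sh
# Sketch: enumerate BEs under rpool/ROOT, mount each read-only at a
# temporary location, and emit a GRUB "linux" line per kernel found.

# Helper: turn a dataset name + kernel path into a menu line of the
# form shown above ("linux /ROOT/<be>@/boot/vmlinuz-<ver> root=ZFS=...").
be_menu_line() {
    be=$1; kpath=$2
    ver=${kpath##*/vmlinuz-}                 # e.g. 4.4.0-63-generic
    printf 'linux /%s@/boot/vmlinuz-%s root=ZFS=%s ro\n' \
        "${be#*/}" "$ver" "$be"
}

# Guarded so the sketch degrades gracefully where zfs tools are absent.
if command -v zfs >/dev/null 2>&1; then
    zfs list -H -o name -r rpool/ROOT | while read -r be; do
        [ "$be" = "rpool/ROOT" ] && continue # skip the container dataset
        tmp=$(mktemp -d)
        mount -t zfs -o ro,zfsutil "$be" "$tmp"
        for k in "$tmp"/boot/vmlinuz-*; do
            [ -e "$k" ] && be_menu_line "$be" "$k"
        done
        umount "$tmp" && rmdir "$tmp"
    done
fi
```

The mount/enumerate/unmount loop is the "mounts them to temporary locations, sees what kernels they have" step; wrapping each line in a proper menuentry block is left out for brevity.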

@evan-king
Author

evan-king commented Feb 23, 2017

What @rlaager asks and suggests sounds exactly right.*

Splitting /boot/grub into a separate dataset rather than all of /boot makes a lot of sense to me, though I'm not sure what steps are needed to enable either that or splitting /boot entirely. I've now tried both unsuccessfully, though @madwizard's comment indicates the latter is possible at least.

In the former case, I was left with a grub recovery console (as opposed to initramfs in other cases, where I was able to figure out what to do). The error message indicated failure to find contents that should have been in /boot/grub, and I don't know where to go from there.

It should also be possible to hose the current BE, including /boot, and have next boot (possibly cycling through a failed boot) end up on some other working environment. Perhaps that will require an additional chainloader? This degree of failure recovery at least is of secondary concern.


* For it to be useful we also need special handling of default/current/next BE. The way kernel choices/fallback are handled likely cover much of it, but when cloning roots there isn't the same simple relationship where first working option = best choice. Instead there's a need to be able to say "reboot into [named] environment (from now on)."

@evan-king
Author

I've started some independent work to replace/augment the grub-mkconfig script with a new version. It is heavily refactored - actually built from scratch because I found the original script quite objectionable in structure and utterly daunting to modify at my skill level.

As such it's unlikely that my work will be of great interest to members of the zfsonlinux project or downstream packages. However if anyone is interested they're welcome to review or follow my progress at https://github.com/evan-king/grub2-zfs-be.

Licensing is currently unspecified but will be updated shortly. I'll happily set the licensing however is needed or would be most appropriate (current intent is to just slap on GPLv3 to match the script it's replacing).

@madwizard
Contributor

@evan-king I mostly followed the ZFS-on-root howto and added encryptfs underneath. http://completelyfake.eu/2017/zfsonroot.html is my howto. /boot is on ext4. I think the steps for a separate /boot are not distribution-specific.

@evan-king
Author

evan-king commented Mar 3, 2017

Thanks for the extra info.

My work actually depends on leaving /boot in the rootfs and only splitting out/sharing /boot/grub, as described by @rlaager (and which I later succeeded at setting up by repeating some steps from the guide). As far as I can tell, this approach is incompatible with currently achievable mechanisms of encrypting the rootfs, due to the loss of independent per-root maintenance of kernel booting support.
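For reference, the shared-/boot/grub layout described here needs only one extra dataset. A hedged sketch, assuming a pool named "rpool" (the dataset name is also an assumption, and the privileged commands are shown commented since they modify the system):

```shell
DATASET="rpool/grub"       # assumed pool-wide dataset for the menu
MOUNTPOINT="/boot/grub"

# Create once, from any BE; every BE then sees the same menu:
#   sudo zfs create -o mountpoint="$MOUNTPOINT" "$DATASET"
#   sudo update-grub
```

Each BE still carries its own /boot (kernels, initrds, config) inside its root dataset; only the menu directory is shared, which preserves the per-root kernel association @rlaager described.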

Until the need for an ext4 boot partition is eliminated, I'd rather not attempt including support for encrypted rootfs. And to be honest, I see little value in encrypting the rootfs when it needn't ever contain sensitive data. If my grub-mkconfig script rewrite generates more interest/participation than I anticipate, I'll certainly reconsider.

But as alluded above, it's a considerable deviation both from the original script and what seems to be the status-quo approach to system scripting, with heavy use of small pseudo-pure functions, environment kept at arm's length or encapsulated in non-pure functions, avoidance of shared state, and almost hostility toward optimization or inlining logic. I'd fully understand if others found my approach too opinionated or drastic a change to collaborate productively on it - it's just what I need to be able to tackle the problem myself.

@ghost

ghost commented Oct 7, 2017

This is definitely something I want, assuming we are talking about multiple clones of the installed image (not, for example, booting different distros)

Why not boot different distros?
