fix(nvmf): set netroot=nbft #10
Conversation
The logic added in 9b9dd99 ("35network-legacy: only skip waiting for interfaces if netroot is set") causes all NBFT interfaces to be waited for unless the "netroot" shell variable is set. Avoid this by setting "netroot=nbft": the boot will then proceed even if NBFT interfaces are missing, as long as the initrd root file system has been found.
This requires installing a netroot handler, /sbin/nbftroot, which the networking scripts call via /sbin/netroot once an interface has been brought up. Create a simple nbftroot script that just calls nvmf-autoconnect.sh. With this installed, we can skip calling nvmf-autoconnect.sh from the "online" initqueue hook.
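As a rough illustration of the handler described above, a minimal /sbin/nbftroot could look like the following. This is a sketch, not the PR's actual script; the argument convention (interface name as the first argument) follows dracut's network-legacy netroot dispatch, and the guard around nvmf-autoconnect.sh is added here only so the sketch is self-contained:

```shell
#!/bin/sh
# Hypothetical sketch of /sbin/nbftroot. dracut's /sbin/netroot dispatches to
# the handler named after ${netroot%%:*}, so "netroot=nbft" selects this
# script once the networking scripts have brought an interface up.

netif="$1"

# Attempt the NVMe-oF connections for the now-online interface. The helper
# is installed by the nvmf dracut module; the existence check is only for
# this standalone sketch.
if [ -x /sbin/nvmf-autoconnect.sh ]; then
    /sbin/nvmf-autoconnect.sh
fi

# Succeed unconditionally: the root-device check in the initqueue, not this
# handler, decides when boot may proceed, so missing NBFT interfaces no
# longer block the boot.
true
```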
Force-pushed 7af4d7f to c7731fc
As discussed in the last Timberland meeting, I double-checked this. Elaborating some more: NM doesn't use that mechanism. Therefore I think the "problem" that inactive interfaces will be waited for in the "NVMe/TCP multipath" case does not exist with NM. The second problem described in #9 (second interface not up after boot) might very well exist, though. @johnmeneghini, @tbzatek: could you discuss this with NM experts for confirmation?
I am not familiar with this topic, so I cannot give a qualified review. Only a comment about NetworkManager...
The network-manager module works by starting NetworkManager as a systemd service, and having a service that waits for the network to come online before the initqueue hooks run. I'm not sure how the logic implemented in 9b9dd99 is going to work with NM, because the synchronization mechanism used by NM is different.
@thom311, @bengal, thanks for your comments.
So this differs from the way network-legacy works.
For NBFT boot, the nvmf module generates
Yeah, it probably won't work this way. OTOH, you said NM waits until all interfaces are "activated or failed to activate". If an interface is unplugged, I suppose NM would wait for some time before giving up. I guess someone needs to just test this. @johnmeneghini, can you do this with the rh-poc?
The almost correct behavior in the multipath case would be to wait forever until at least one interface is up, and once this happens, stop waiting for any other interfaces. The problem with this is that if there are multiple interfaces, you don't know whether it's just multipath, or whether different devices are accessed via different network / NVMe connections. But I guess we can ignore that for the time being.
The really correct behavior (IMHO) would be to wait for connections and the root FS at the same time, and once all devices necessary to mount the root FS are detected, stop waiting for any other interfaces. This is basically how the legacy module behaves with this PR. I have no idea if, and how, that could be achieved with the dracut networkmanager module.
May I ask whether you have discussed this issue in the context of iSCSI/iBFT multipath boot, and whether you have found a solution for that?
Side note to @thom311: NM will also need support for NBFT-configured interfaces at run time (in the real root FS):
So far we have implemented this "feature set" in the SUSE tool "wicked". For wicked, I've written a shell-script plugin which reads the JSON-formatted HFI information from the NBFT and transforms it into XML that wicked understands. I suppose a similar approach would be possible for NM. NM has been on my todo list, but I haven't had time to actually work on it. I've also repeatedly mentioned in Timberland meetings that this is a necessary puzzle piece to make NVMe boot production-ready for NM-based systems. Some hints, or even better, someone else looking into this with my support, would be much appreciated.
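To make the JSON-to-network-config idea concrete, here is a very rough sketch. This is not the actual wicked plugin: the HFI field names below are illustrative (not the real NBFT JSON schema), and JSON output support in "nvme nbft show" should be verified against your nvme-cli version:

```shell
#!/bin/sh
# Hypothetical sketch: flatten NBFT HFI records into "MAC IP/PREFIX" lines
# that a NetworkManager (or wicked) plugin could translate into its native
# configuration format.

# Crude extraction for illustration only; a real plugin should use a proper
# JSON parser (jq, or libnvme bindings). Field names are assumptions.
hfi_lines() {
    sed -n 's/.*"mac":"\([^"]*\)".*"ipaddr":"\([^"]*\)".*"prefix":\([0-9]*\).*/\1 \2\/\3/p'
}

# On a live system this would be something like:
#   nvme nbft show -o json | hfi_lines
# Here, canned data stands in for the firmware table:
printf '%s\n' '{"hfi":[{"mac":"ea:eb:d3:58:89:58","ipaddr":"192.168.101.30","prefix":24}]}' | hfi_lines
# → ea:eb:d3:58:89:58 192.168.101.30/24
```

A second step would then map each line onto the tool's configuration (an NM keyfile, or wicked XML), which is where the per-tool plugin work lies.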
Right.
That's correct. There is a dracut PR (dracutdevs#2173) to change this a bit, and run the hooks as soon as each interface is activated; but that doesn't change the fact that the initqueue runs after all interfaces are activated.
I'm not sure if by "unplugged" you mean with the cable unplugged (i.e. without carrier), or that the device is physically unplugged from the system (i.e. not present at all). In the first case there is a carrier-timeout of 10 seconds, in the second case the timeout for the device to appear is 60 seconds (only when
I guess that would require:
I am not aware of any previous discussion about this or similar issues.
I meant "no carrier", or "down" for whatever other reason (e.g. no IP address obtained from DHCP). No hardware hot-plug discussion here :-)
Why did you make it work this way?
Hm. Strange. iSCSI multipath boot would have exactly the same problem. We found a solution with network-legacy only quite recently, too. Perhaps people just don't use this technology.
That is not different from other networking which is set up by NM in the initrd (iBFT). Interestingly, NetworkManager to this day doesn't support the equivalent feature that systemd-networkd has.
Right, it is not. But I guess someone needs to code the plugin :-) I'll have a look at NM's iBFT code and see to what extent it can be reused for NBFT support.
The iBFT code for NetworkManager is here.
@bengal, @johnmeneghini: acceptance of this PR is currently blocking the upstream dracut PR for timberland. Can we agree to merge this into |
There might have been other reasons that I don't remember, but I think the main one was to leave the hooks invocation in the initqueue, and only use unit dependencies as the synchronization mechanism to ensure hooks are invoked only after the network is configured. This way there is no need for custom scripts, and everything works similarly to the real root, using the network-online target. This can be revisited if there are issues not solvable with the current approach.
One problem in dracut is that there is no documentation of, or shared knowledge about, supported use cases, which makes it difficult to introduce new features or make changes. It would be great if every use case were covered by the test suite (see the test/ directory in the dracut tree). NetworkManager also tests different dracut scenarios in its integration test suite and tries to cover most of the known use cases.
This makes sense to me.
I've tested these changes with Fedora and everything works.
As observed in your review comments, NetworkManager doesn't appear to rely on these changes, and I am able to boot with multiple paths using multiple NBFT attempt files without a problem. Error insertion tests also pass, showing that dracut will use any of the available paths and continue to boot from the NBFT correctly. When both paths are working correctly, the system even boots from the NBFT and enables multipathing.
[root@host-vm ~]# nvme nbft show
/sys/firmware/acpi/tables/NBFT:
NBFT Subsystems:
Idx|NQN |Trsp|Address |SvcId|HFIs
---+--------------------------------------------------------------------+----+--------------+-----+----
1 |nqn.2014-08.org.nvmexpress:uuid:0c468c4d-a385-47e0-8299-6e95051277db|tcp |192.168.101.20|4420 |1
2 |nqn.2014-08.org.nvmexpress:uuid:0c468c4d-a385-47e0-8299-6e95051277db|tcp |192.168.110.20|4420 |1
NBFT HFIs:
Idx|Trsp|PCI Addr |MAC Addr |DHCP|IP Addr |Mask|Gateway |DNS
---+----+----------+-----------------+----+--------------+----+--------+--------
1 |tcp |0:0:4.0 |ea:eb:d3:58:89:58|no |192.168.101.30|24 |0.0.0.0 |0.0.0.0
2 |tcp |0:0:5.0 |ea:eb:d3:59:89:59|no |192.168.110.30|24 |0.0.0.0 |0.0.0.0
[root@host-vm ~]# nvme list-subsys
nvme-subsys0 - NQN=nqn.2014-08.org.nvmexpress:uuid:0c468c4d-a385-47e0-8299-6e95051277db
\
+- nvme0 tcp traddr=192.168.101.20,trsvcid=4420,host_traddr=192.168.101.30,src_addr=192.168.101.30 live
+- nvme1 tcp traddr=192.168.110.20,trsvcid=4420,host_traddr=192.168.101.30,src_addr=192.168.101.30 live
[root@host-vm ~]# ip -br addr
lo UNKNOWN 127.0.0.1/8 ::1/128
enp0s3 UP 192.168.0.216/24 2601:195:4000:62f:3467:102d:df16:84e7/64 fe80::875:9c79:c479:e6e4/64
nbft0 UP 192.168.101.30/24
nbft1 UP 192.168.110.30/24
This is a policy decision. We can't wait forever; that looks like a hung system. It is better to fail to boot and let the user intervene. The NBFT has a timeout field, which can be used by the user to set the timeout policy. If the user wants to wait forever during boot, they can use this timeout to set that policy.
I think we are ready to move forward with the upstream dracut pull request. Please go ahead and merge this change, and then move forward with the upstream pull request.
dracut's default is to wait forever for the root FS. You can question whether that makes sense, but I don't think we should use a different default. |
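For reference, that default can be overridden from the kernel command line; a hedged example (exact semantics and defaults per dracut.cmdline(7), and they vary between dracut versions):

```
rd.retry=30 rd.timeout=120
```

rd.retry caps how long the initqueue keeps retrying device setup, and rd.timeout caps how long dracut waits for devices to appear, so a site that prefers fail-and-intervene over wait-forever can encode that policy here.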
I've been testing this, and I see what you mean. I test things by toggling one or both of my NVMe/TCP target port networks up and down on the target machine, and then watching how the host reacts.
When booting for the first time, I see that UEFI uses the programmed timeout from the NBFT. After timing out, it returns to the boot menu. However, when I run the same test using a host reboot, it hangs forever. I assume this is because a warm reboot uses the initramfs, and dracut is simply waiting forever, until I bring the IP link up on the nvme-tcp target port it's waiting for. Then it connects and boots.
From what I can see, dracut will not try to use the alternate path in this situation. It always hangs on the first path. I can bring the second path up and down, and the host never sees it. It hangs trying to boot from the first path... forever. The firmware appears to do the same thing. So it looks like we still have some path-ordering issues in EDK2, and in dracut.
I think you mean the
So this was with both interfaces down?
Hm, I can't quite follow. Are you talking about a host reset from the BIOS menu? If yes, do you see the grub menu / the kernel booting? I would assume that a host reset goes through the BIOS, and would behave just like the first time boot.
If it's hanging in dracut with one interface up and one down, you're observing Problem 1 from #9.
Fixes #9, but only for the network-legacy networking backend. I think that with the network-manager backend, the issue doesn't exist in the first place.