Can't get iPXE to boot on AArch64 #70

Closed
rengolin opened this Issue Aug 29, 2017 · 10 comments

Contributor

rengolin commented Aug 29, 2017

I'm using the tip of the development branch and am trying to get TFTP to work, but when the node boots, it doesn't do anything after getting an IP from DHCP.

My steps:

  1. I have registered the node with wwnodescan, which found the correct machine/MAC and set the appropriate node name.
# wwsh node list
NAME                GROUPS              IPADDR              HWADDR             
================================================================================
node1               UNDEF               192.168.10.10       a0:8c:f8:62:5a:99  
  2. I have run wwsh provision and wwsh pxe update, and it all seems fine:
# wwsh provision list
NODE                VNFS            BOOTSTRAP             FILES                
================================================================================
node1               centos7.3       4.12.0-1.1.aarch64    dynamic_hosts,grou...
  3. I have updated the DHCP config as per OpenHPC 1.3.1's recipe:
# cat /etc/dhcp/dhcpd.conf 
ddns-update-style interim;
subnet 192.168.10.0 netmask 255.255.255.0 {
  option routers                  192.168.10.1;
  option subnet-mask              255.255.255.0;
  range 192.168.10.10 192.168.10.200;
}
  4. I've updated xinetd as per OpenHPC 1.3.1's recipe:
# cat /etc/xinetd.d/tftp
service tftp
{
	socket_type		= dgram
	protocol		= udp
	wait			= yes
	user			= root
	server			= /usr/sbin/in.tftpd
	server_args		= -s /var/lib/tftpboot
	disable			= yes
	per_source		= 11
	cps			= 100 2
	flags			= IPv4
}
  5. I'm using Linaro's ERP kernel, given that nothing else will boot on our machines. Would that be a problem for warewulf figuring out how to build the EFI image?

However, when the node PXE boots, I get the correct DHCPDISCOVER and DHCPOFFER, but the boot freezes. Inspecting the packets with tcpdump, I saw that the offers didn't have the TFTP payload. I also noticed that the HTTP configuration was set to default to IPv6 instead of IPv4.
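
A capture like the one described here can be narrowed to a single node's boot. The following is only a sketch: `pxe_filter` is a hypothetical helper, and the interface name `eth0` and the output path are assumptions. DHCP is matched by port because the node has no address yet when it sends DHCPDISCOVER; the `host` clause then picks up the TFTP and HTTP traffic once the node has its lease.

```shell
# Hypothetical helper: build a tcpdump filter for one node's PXE boot.
# Ports 67/68 catch DHCP (sent before the node has an IP address);
# "host" catches the later TFTP and HTTP exchanges with that node.
pxe_filter() {
    printf 'port 67 or port 68 or host %s\n' "$1"
}

# Show the filter for the node in this thread.
pxe_filter 192.168.10.10

# On the master, as root (interface name is an assumption):
#   tcpdump -n -vv -i eth0 "$(pxe_filter 192.168.10.10)"
# Add "-w /tmp/pxe-boot.pcap" to save the capture for sharing.
```

With `-vv`, tcpdump should decode the BOOTP fields and DHCP options of each offer, which is where a missing boot filename would show up.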

I then changed the DHCP configuration according to this page:

https://wiki.centos.org/HowTos/PXE/PXE_Setup

Which ended up as:

ddns-update-style interim;
subnet 192.168.10.0 netmask 255.255.255.0 {
  option routers                  192.168.10.1;
  option subnet-mask              255.255.255.0;
  range 192.168.10.10 192.168.10.200;
}
allow booting;
allow bootp;
filename "/var/lib/tftpboot/warewulf/ipxe/bin-arm64-efi/snp.efi";

I also changed the HTTP config to listen on 0.0.0.0:80 and restarted the services, after which I got PXE-E99 (can't boot the EFI image).

I'm not an expert in PXE booting, but I got a few lessons from this:

  1. Maybe OpenHPC should add "Listen 0.0.0.0:80" on their HTTP config in their recipe. It wasn't necessary before (I'm not sure why), but it may be required (or at least a sane default).
  2. Maybe OpenHPC should add the "bootp + EFI image" to the DHCP config. Again, maybe it has to do with the move to iPXE? Does warewulf store that EFI image elsewhere (MySQL?) and, if so, how does it retrieve it for DHCP?
  3. I seem to be using the wrong image, or the image wasn't built correctly. Or maybe the alternative kernel (installed via PXE boot of the master using Linaro's ERP CentOS 17.08) is the problem.

Am I doing something terribly wrong?

Refs:
ERP: https://platforms.linaro.org/documentation/Reference-Platform/Platforms/Enterprise/ReleaseNotes-17.08.md/
CentOS ERP: http://builds.96boards.org/snapshots/reference-platform/components/centos-installer-staging/97/

Member

bensallen commented Aug 29, 2017

Hi rengolin,

For any fixes in OpenHPC recipes, please open an issue with that project. We're not responsible for what they have put together.

See /etc/warewulf/dhcpd-template.conf (e.g. https://github.com/warewulf/warewulf3/blob/development/provision/etc/dhcpd-template.conf), which is populated by running wwsh dhcp update, or on any host change in Warewulf. Warewulf manages /etc/dhcp/dhcpd.conf.

  • The filename line needs to be relative to the root served by the TFTP server.
  • The iPXE config filename that dhcpd.conf should hand out after iPXE is loaded isn't specified.
  • For CentOS 7 we don't use xinetd to run tftpd; we're using systemd's tftpd.socket, although I don't think wwinit knows this.
  • Depending on when you built your RPMs, we moved the iPXE binaries to /var/warewulf/ipxe.

I suggest starting with a clean system and rebuilding the RPMs if they're older than a week or so. Ignore most instructions from OHPC about Warewulf install and config. Install the Warewulf development branch RPMs, go through the configs in /etc/warewulf, updating as needed, and run wwinit ALL.
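
To illustrate the point about the filename line: with in.tftpd started as `-s /var/lib/tftpboot`, the server treats that directory as its root, so the filename handed out by dhcpd should have that prefix removed. A small sketch follows; `tftp_relpath` is a hypothetical helper, not a Warewulf command, and the default root is taken from the xinetd config quoted earlier in this thread.

```shell
# Hypothetical helper: convert an absolute filesystem path into the
# path a TFTP client must request, given the server's root directory.
tftp_relpath() {
    root="${2:-/var/lib/tftpboot}"   # assumed default, from "-s /var/lib/tftpboot"
    root="${root%/}"                 # drop a trailing slash, if any
    case "$1" in
        "$root"/*)
            printf '/%s\n' "${1#"$root"/}"
            ;;
        *)
            printf 'error: %s is not under %s\n' "$1" "$root" >&2
            return 1
            ;;
    esac
}

# The absolute path from the original dhcpd.conf in this thread:
tftp_relpath /var/lib/tftpboot/warewulf/ipxe/bin-arm64-efi/snp.efi
```

The printed value is what belongs in the dhcpd `filename` statement; a path outside the TFTP root is rejected, since the server could not serve it.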

@bensallen bensallen added the question label Aug 29, 2017

Contributor

rengolin commented Aug 29, 2017

Regarding OpenHPC, don't worry, I was just asking for an opinion, so I can propose the right thing from warewulf's point of view.

I'll do as you propose, wipe and start warewulf directly via wwinit all instead of OpenHPC's recipes.

Some notes...

wwsh dhcp update is actually better than what the current OpenHPC document suggests, so I'll propose that. It also correctly picked my node config and will force the correct IP, which is great.

But I still get the same error, even when I leave only the right filename (no if/else block) with the path relative to the TFTP root:

filename "/warewulf/ipxe/bin-arm64-efi/snp.efi";

The rest looks correct:

# l /var/warewulf/bootstrap/aarch64/8/
total 45092
-rw-r--r--. 1 root root       32 Aug 29 12:28 cookie
-rw-r--r--. 1 root root 29170583 Aug 29 12:28 initfs.gz
-rw-r--r--. 1 root root 16996864 Aug 29 12:28 kernel
# cat /var/warewulf/ipxe/cfg/a0\:8c\:f8\:62\:5a\:99 
#!ipxe
# Configuration for Warewulf node: node1
# Warewulf data store ID: 15
echo Now booting node1 with Warewulf bootstrap (4.12.0-1.1.aarch64)
set base http://192.168.10.1/WW/bootstrap
initrd ${base}/aarch64/8/initfs.gz
kernel ${base}/aarch64/8/kernel ro initrd=initfs.gz wwhostname=node1 net.ifnames=0 biosdevname=0 quiet wwmaster=192.168.10.1 wwipaddr=192.168.10.10 wwnetmask=255.255.255.0 wwnetdev=eth0 wwhwaddr=a0:8c:f8:62:5a:99 
boot
# file /var/lib/tftpboot/warewulf/ipxe/bin-arm64-efi/snp.efi 
/var/lib/tftpboot/warewulf/ipxe/bin-arm64-efi/snp.efi: MS-DOS executable
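
`file` reporting "MS-DOS executable" is not by itself a sign of a bad image: EFI binaries are PE/COFF files, which begin with an MS-DOS (`MZ`) stub. One way to check that the binary really targets AArch64 is to read the machine field of the PE header, which is 0xaa64 for ARM64. The sketch below assumes a little-endian host (so that `od` decodes multi-byte values the way PE stores them), and `pe_machine` is a hypothetical helper name.

```shell
# Hypothetical helper: print the PE "Machine" field of an EFI binary in hex.
# Assumes a little-endian host, matching the PE on-disk byte order.
pe_machine() {
    # Offset 0x3c (60) of the MZ stub holds e_lfanew, the 4-byte
    # offset of the "PE\0\0" signature.
    off=$(od -An -tu4 -j60 -N4 "$1" | tr -d ' ')
    # The 2-byte Machine field follows the 4-byte signature.
    od -An -tx2 -j"$((off + 4))" -N2 "$1" | tr -d ' '
}

# On the master:
#   pe_machine /var/lib/tftpboot/warewulf/ipxe/bin-arm64-efi/snp.efi
# A value of aa64 indicates an AArch64 image; 8664 would be x86-64.
```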

I'll start fresh and update this ticket.

Thanks!

Member

bensallen commented Aug 29, 2017

Bah, sorry, I misspoke: the iPXE binaries are under the tftpboot path. The iPXE configs and bootstraps are under /var/warewulf, as you pointed out.

It's worth doing a sanity check:

  • curl via TFTP the iPXE binary using the path you expect
  • curl the HTTP URLs of the iPXE cfg and bootstrap files
  • ensure httpd, dhcpd, and tftp-server.socket (or xinetd) are all enabled and running
  • ensure firewall rules are set or disabled
  • if SELinux is enforcing, see the new warewulf-provision-server-selinux rpm, which sets up rules for httpd to access /var/warewulf/{ipxe,bootstrap}
  • it's always good to run 'wwsh dhcp update' and 'wwsh pxe update' to ensure dhcpd.conf and the iPXE cfgs are up to date

If all this fails, can you post a tcpdump of the dhcp, tftp, and http traffic during an attempted boot?
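
The first few checks above could be scripted roughly as follows. This is a sketch under assumptions: the addresses and paths are the ones quoted in this thread, the function names are hypothetical, and the service unit names may differ on your install. The functions are only defined here; nothing touches the network until you call them on the master.

```shell
# Values from this thread; treat them as placeholders for your own site.
MASTER=192.168.10.1
EFI_PATH=/warewulf/ipxe/bin-arm64-efi/snp.efi
KERNEL_URL="http://${MASTER}/WW/bootstrap/aarch64/8/kernel"

# Build the TFTP URL that corresponds to the dhcpd "filename" value.
tftp_url() {
    printf 'tftp://%s%s\n' "$1" "$2"
}

# Fetch the iPXE binary over TFTP, as the node firmware would.
check_tftp() {
    curl -sS -o /tmp/snp.efi "$(tftp_url "$MASTER" "$EFI_PATH")" && ls -l /tmp/snp.efi
}

# Request the bootstrap kernel over HTTP; -f turns a 403 or 404 into a failure.
check_http() {
    curl -sS -f -o /dev/null "$KERNEL_URL" && echo "HTTP OK: $KERNEL_URL"
}

# Report service state and the SELinux mode.
check_services() {
    systemctl is-active httpd dhcpd
    getenforce
}

# Run on the master:
#   check_tftp && check_http && check_services
```

Running the checks in this order mirrors the boot sequence, so the first failure points at the stage where the node gets stuck.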

Contributor

rengolin commented Aug 29, 2017

Bingo!

It seems wwinit all fixes almost all problems. EFI loads, configs try to work, etc., but I got a 403 from httpd, even when using wget.

As you suggested, SELinux was the problem and disabling it worked!

Thanks!! node1 is live! :)

@rengolin rengolin closed this Aug 29, 2017

Member

bensallen commented Aug 29, 2017

Excellent. If you want httpd to work with SELinux enforcing, you can run:

semanage fcontext -a -t httpd_sys_content_t '/var/warewulf/ipxe(/.*)?' 
semanage fcontext -a -t httpd_sys_content_t '/var/warewulf/bootstrap(/.*)?'
restorecon -Rv /var/warewulf/bootstrap
restorecon -Rv /var/warewulf/ipxe

The new warewulf-provision-server-selinux rpm does this on install via post-script.
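
To confirm that the relabel took effect, the SELinux type of each directory can be compared against httpd_sys_content_t. A minimal sketch, assuming GNU coreutils `stat` (whose `%C` format prints the SELinux context); `selinux_type` and `check_httpd_label` are hypothetical helper names.

```shell
# Hypothetical helper: extract the type field from a context string
# of the form user:role:type:level.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Succeeds if the given path is labelled httpd_sys_content_t.
check_httpd_label() {
    ctx=$(stat -c %C "$1") || return 1
    [ "$(selinux_type "$ctx")" = "httpd_sys_content_t" ]
}

# On the master:
#   for d in /var/warewulf/ipxe /var/warewulf/bootstrap; do
#       check_httpd_label "$d" || echo "needs relabel: $d"
#   done
```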

Contributor

rengolin commented Aug 29, 2017

Thanks Ben! We could probably add that package to the OpenHPC recipe or, even better, make it a dependency of the warewulf-provision-ohpc package, so that it always runs upon installation.

Adding it to the recipe would mean having to update the recipe whenever warewulf changes (or the docs become wrong), so the latter is definitely the better alternative, provided we can avoid any conflict (e.g. it running before the directories are created, if that matters).

Member

bensallen commented Aug 29, 2017

Alternatively, I can move the above into the post-scripts of the parent warewulf-provision-server RPM, but this will add policycoreutils-python as a dependency for all installs. I suppose, though, that we should be encouraging secure installs and expect SELinux to be enforcing, especially since it is enforcing by default in RHEL/CentOS installs.

Contributor

rengolin commented Aug 29, 2017

I agree you shouldn't push our own (OpenHPC's) choices into everyone else's installations.

But it would also be good to have a warning / installation message to make sure users are aware of the issues, or they'll be lost if they're not used to SELinux.

Member

bensallen commented Aug 29, 2017

Opened #72 to discuss further.

Contributor

rengolin commented Aug 31, 2017

FYI, warewulf development branch works "out-of-the-box" on AArch64 with OpenHPC all the way to PXE booting compute nodes, adding and synchronising files, etc.
