
Kdump_for_Linux_diskless_nodes

ligc edited this page Jul 30, 2015 · 8 revisions



Internal Code Changes of xCAT

The following sections are for the internal code changes.

Schema.pm

Add the dump attribute to the linuximage schema. Users can then set or change the dump attribute for an image with the chdef command.

"genimage" command

Disable the kdump service by default.

 chroot $rootimg_dir chkconfig kdump off

If "fsck.nfs" does not exist in the root image, create a fake "fsck.nfs" command that always returns true.
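The stub described above could be created along the following lines. This is a minimal sketch, not the code in genimage: the function name `ensure_fake_fsck_nfs` is hypothetical, and placing the stub under `sbin/` in the root image is an assumption.

```shell
# Hypothetical helper: create a stub fsck.nfs in the root image if none exists.
# Assumption: the stub belongs in <rootimg_dir>/sbin.
ensure_fake_fsck_nfs() {
    rootimg_dir=$1
    stub="$rootimg_dir/sbin/fsck.nfs"
    if [ ! -e "$stub" ]; then
        mkdir -p "$rootimg_dir/sbin"
        # The stub does nothing and reports success.
        printf '#!/bin/sh\n# Stub: nothing to check on an NFS root.\nexit 0\n' > "$stub"
        chmod 755 "$stub"
    fi
}
```

An existing fsck.nfs is left untouched, so a real implementation shipped by the distribution is never overwritten.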

anaconda.pm / sles.pm

Update the code that handles the following command:

 nodeset <noderange> osimage=<osimagename>

If the dump attribute is set for the corresponding image, add the kernel parameter

 crashkernel=128M@32M

to the boot config file. For platforms that use "yaboot", the config file is

 /tftpboot/etc/<nodename>

Then append a second kernel parameter:

 dump=<dump value>
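The parameter handling described above can be sketched in shell. The real logic lives in the Perl plugins (anaconda.pm and sles.pm), so this is only an illustration; `kdump_kernel_params` is a hypothetical name.

```shell
# Hypothetical helper: return the kernel parameters for a node,
# adding the kdump parameters only when the dump attribute is set.
kdump_kernel_params() {
    existing=$1   # kernel parameters already destined for the boot config file
    dump=$2       # value of the image's dump attribute; empty if unset
    if [ -z "$dump" ]; then
        printf '%s\n' "$existing"
    else
        printf '%s crashkernel=128M@32M dump=%s\n' "$existing" "$dump"
    fi
}
```

When the dump attribute is empty, the existing parameters are returned unchanged, so images without kdump are not affected.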

Postscript enablekdump

When the node boots, the enablekdump postscript starts the kdump service; on RHEL6 it also applies a workaround to generate the initial ramdisk for kdump. The postscript parses /proc/cmdline. If dump= is found, its value is parsed and used to update the "/etc/kdump.conf" file. After the /etc/kdump.conf file is updated, the kdump service is started with the command:

 /etc/init.d/kdump start
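The parsing step can be sketched as follows. This is not the code of the enablekdump postscript; both function names are hypothetical, and the `nfs://<server>/<path>` form of the dump value is an assumption, since this page does not specify the format.

```shell
# Hypothetical helper: print the value of the first dump= parameter
# on the kernel command line, or nothing if there is none.
get_dump_value() {
    cmdline_file=${1:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^dump=//p' | head -n 1
}

# Hypothetical helper: convert nfs://server/path into server:/path.
# Assumption: the dump value is an NFS URL of this form.
nfs_url_to_export() {
    url=$1
    case $url in
        nfs://*/*) ;;
        *) return 1 ;;
    esac
    rest=${url#nfs://}
    server=${rest%%/*}
    path=/${rest#*/}
    printf '%s:%s\n' "$server" "$path"
}
```

The resulting `server:/path` string is the kind of target that would then be written into the kdump configuration file.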

SLES11 also needs a workaround to generate the initial ramdisk for kdump. The enablekdump postscript parses /proc/cmdline. If dump= is found, its value is parsed and used to update the "/etc/sysconfig/kdump" file. After the "/etc/sysconfig/kdump" file is updated, the kdump service is started with the command:

 /etc/init.d/boot.kdump start
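Updating the SLES configuration file might look like the sketch below. This page does not say which variable the postscript changes; `KDUMP_SAVEDIR` is the dump-target variable in the SLES /etc/sysconfig/kdump file, and using it here is an assumption. The function name `set_kdump_savedir` is hypothetical.

```shell
# Hypothetical helper: set KDUMP_SAVEDIR in a sysconfig-style file,
# replacing any earlier assignment so the variable appears only once.
# Assumption: KDUMP_SAVEDIR is the variable the postscript needs to update.
set_kdump_savedir() {
    savedir=$1
    conf=${2:-/etc/sysconfig/kdump}
    tmp=$(mktemp)
    grep -v '^KDUMP_SAVEDIR=' "$conf" 2>/dev/null > "$tmp" || true
    printf 'KDUMP_SAVEDIR="%s"\n' "$savedir" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Rewriting the file through a temporary copy keeps the function idempotent when the postscript runs more than once.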

Workaround for RHEL6

Before the kdump service is started, an NFS directory is mounted on /var/tmp, which the mkdumprd command uses as a temporary directory when generating the initial ramdisk for kdump. The NFS directory is $xcatmaster:/install/kdump/tmp, which is created when the xCAT package is installed. Because the /install directory is exported by default, this directory is readable and writable. After the kdump service has started successfully, the NFS directory is unmounted from /var/tmp, so the workaround does not affect the running node.

For rhels6.1 the kdump service needs /tmp instead of /var/tmp for this workaround.
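The mount, start, and unmount sequence can be sketched as below. The function names and the `RUN` dry-run variable are hypothetical, no NFS mount options are shown, and matching the release by the string `rhels6.1` is an assumption about how the OS version is expressed.

```shell
# Hypothetical helper: pick the scratch directory used for the workaround.
kdump_scratch_dir() {
    case $1 in
        rhels6.1) echo /tmp ;;
        *)        echo /var/tmp ;;
    esac
}

# Hypothetical helper: mount the NFS scratch directory, start kdump,
# then unmount again. Set RUN=echo to print the commands instead of running them.
start_kdump_with_nfs_scratch() {
    osver=$1
    xcatmaster=$2
    scratch=$(kdump_scratch_dir "$osver")
    ${RUN:-} mount "$xcatmaster:/install/kdump/tmp" "$scratch" || return 1
    ${RUN:-} /etc/init.d/kdump start
    rc=$?
    # Unmount whether or not kdump started, so the node is left unchanged.
    ${RUN:-} umount "$scratch"
    return $rc
}
```

Running `RUN=echo; start_kdump_with_nfs_scratch rhels6.1 10.1.0.1` prints the three commands without touching the system, which is a convenient way to check the sequence.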

The enablekdump postscript adds a link_delay of 180 seconds to /etc/kdump.conf. Some network cards take a long time to initialize, and some networks with spanning tree enabled do not transmit user traffic for long periods after a link state change. This optional parameter defines how long the initramfs waits after a link is activated before it attempts to transmit user data.
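Adding the setting idempotently could look like this sketch. The function name `set_link_delay` is hypothetical, and the space-separated `link_delay <seconds>` form follows the usual "option value" layout of kdump.conf rather than anything stated on this page.

```shell
# Hypothetical helper: make sure kdump.conf contains exactly one link_delay line.
# Assumption: kdump.conf uses space-separated "option value" directives.
set_link_delay() {
    conf=${1:-/etc/kdump.conf}
    delay=${2:-180}
    tmp=$(mktemp)
    # Drop any earlier link_delay line, then append the new one.
    grep -v '^link_delay' "$conf" 2>/dev/null > "$tmp" || true
    printf 'link_delay %s\n' "$delay" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Because earlier link_delay lines are removed first, rerunning the postscript does not accumulate duplicate entries.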

Workaround for SLES11

On SLES, the boot.kdump service is configured through the /etc/sysconfig/kdump file. The boot.kdump script under /etc/init.d calls mkdumprd -K "$kdump_kernel" -I "$kdump_initrd" -q to create the initrd (called kdumpinit here) that kdump uses. mkdumprd in turn calls /sbin/mkinitrd to create the kdumpinit. (mkinitrd only works for diskful installs; it does not consider the diskless install scenario.) /sbin/mkinitrd runs all of the shell scripts under /lib/mkinitrd/setup to generate the kdumpinit, and packs all scripts under /lib/mkinitrd/boot into it. To simulate a crash, run:

  echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger

The kdumpinit generated by /sbin/mkinitrd contains all shell scripts under /lib/mkinitrd/boot, and all of these scripts are included in its init. Two of them are special: 83-mount.sh mounts and checks the root device, and 84-remount.sh mounts the root file system and runs the init under the root file system instead of the normal init binary. These two scripts cause the problem. For a diskless install, the root file system is tmpfs and there is no corresponding device, so the boot hangs when 83-mount.sh runs. When dumping to a remote server, the root file system is not needed and the initrd alone is enough, so there is no need to pack these two scripts into the initrd. The workaround is to rename the two scripts so that they are not packed into the initrd, and to change the names back once the initrd has been created. With no root device discovery and checking, the script 91-kdump.sh runs correctly and the dump succeeds.
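The rename-and-restore workaround can be sketched as a wrapper around the initrd build command. The function name, the `MKINITRD_BOOT_DIR` override, and the `.xcat-hidden` suffix are all hypothetical, and the sketch assumes mkinitrd only picks up files whose names end in `.sh`.

```shell
# Hypothetical helper: hide 83-mount.sh and 84-remount.sh while a command runs,
# then restore them. Assumption: mkinitrd ignores files not ending in ".sh".
run_with_root_scripts_hidden() {
    boot_dir=${MKINITRD_BOOT_DIR:-/lib/mkinitrd/boot}
    suffix=.xcat-hidden
    for s in 83-mount.sh 84-remount.sh; do
        if [ -e "$boot_dir/$s" ] || [ -L "$boot_dir/$s" ]; then
            mv "$boot_dir/$s" "$boot_dir/$s$suffix"
        fi
    done
    # Run the initrd build command passed as arguments.
    "$@"
    rc=$?
    # Change the names back whether or not the command succeeded.
    for s in 83-mount.sh 84-remount.sh; do
        if [ -e "$boot_dir/$s$suffix" ] || [ -L "$boot_dir/$s$suffix" ]; then
            mv "$boot_dir/$s$suffix" "$boot_dir/$s"
        fi
    done
    return $rc
}
```

A call such as `run_with_root_scripts_hidden mkdumprd -K "$kdump_kernel" -I "$kdump_initrd" -q` would then build the kdumpinit without the two root-device scripts and return the exit status of mkdumprd.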

Questions

For a hierarchical diskless environment, the /install directory of the Service Node is mounted from the Management Node. When the node starts up, the $xcatmaster:/install/kdump/tmp directory cannot be mounted because NFS denies the re-mount action. How can this scenario be handled?

Source Files involved

 xCAT/xCAT.spec 
 perl-xCAT/xCAT/Schema.pm 
 xCAT-server/share/xcat/netboot/rh/genimage 
 xCAT-server/share/xcat/netboot/add-on/statelite/rc.statelite 
 xCAT-server/lib/xcat/plugins/anaconda.pm 
 xCAT-server/lib/xcat/plugins/sles.pm 
 xCAT/postscripts/enablekdump 

Other Design Considerations

  • Required reviewers: Bruce Potter
  • Required approvers: Bruce Potter
  • Database schema changes: N/A
  • Effect on other components: N/A
  • External interface changes, documentation, and usability issues: N/A
  • Packaging, installation, dependencies: N/A
  • Portability and platforms (HW/SW) supported: N/A
  • Performance and scaling considerations: N/A
  • Migration and coexistence: N/A
  • Serviceability: N/A
  • Security: N/A
  • NLS and accessibility: N/A
  • Invention protection: N/A
