Failed installation on top of previous soft RAID Linux install #543

Open
AtaxyaNetwork opened this issue Mar 23, 2022 · 4 comments

@AtaxyaNetwork

Hello!

I regularly reinstall machines that previously ran Linux (mostly Debian 10) with soft RAID 1, replacing it with XCP-ng.
Since 8.2 (and I think in older versions too), when I recreate the soft RAID 1 from the installer, the installation finishes correctly, but I end up in grub rescue on reboot.
My guess is that the XCP-ng installer doesn't delete the old soft RAID correctly, and grub gets confused. I tried to boot from grub rescue, without success.
My workaround is to boot a live Debian, open a shell, and run the following for each disk I want to use in the soft RAID:

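DISK=sdx   # placeholder: replace with the actual disk name
LBAS=$(cat "/sys/block/$DISK/size")   # disk size in 512-byte sectors
# Zero the first 1024 sectors (512 KiB) of the disk
dd if=/dev/zero of="/dev/$DISK" bs=512 count=1024
# Zero the last 1024 sectors of the disk
dd if=/dev/zero of="/dev/$DISK" bs=512 seek=$((LBAS - 1024)) count=1024
# Clear any md superblock that mdadm still finds
mdadm --zero-superblock "/dev/$DISK"
sync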

Then I can relaunch the installer, and XCP-ng installs successfully!
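
Since these commands are destructive, it may help to preview them first. Below is a minimal dry-run sketch that only prints the commands from the workaround for each disk, with the tail offset already computed. The function name `print_wipe_cmds` and the disk names `sda` and `sdb` are made up for illustration; nothing here comes from the XCP-ng installer.

```shell
#!/bin/sh
# Dry run: print (do not execute) the wipe commands for one disk.
# $1 = device name without /dev/ (e.g. sdx)
# $2 = disk size in 512-byte sectors (the value in /sys/block/<disk>/size)
print_wipe_cmds() {
    disk=$1
    lbas=$2
    count=1024   # sectors cleared at each end, same value as in the workaround
    echo "dd if=/dev/zero of=/dev/$disk bs=512 count=$count"
    echo "dd if=/dev/zero of=/dev/$disk bs=512 seek=$((lbas - count)) count=$count"
    echo "mdadm --zero-superblock /dev/$disk"
}

# Hypothetical member disks: replace with the real ones.
for d in sda sdb; do
    if [ -r "/sys/block/$d/size" ]; then
        print_wipe_cmds "$d" "$(cat "/sys/block/$d/size")"
    fi
done
```

Once the printed commands look right for every disk, they can be run by hand, followed by `sync`.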

Let me know if I can help!

Cécile

@stormi
Member

stormi commented Mar 23, 2022

Thanks for the report. It was known that creating a soft RAID may fail on previously used disks due to stale metadata, but not that it may succeed and then fail only at grub install stage.

Do you see what the error is in the installer logs (/tmp/install-log from the installer before rebooting, or /var/log/installer/install-log from the installed system that doesn't boot)?
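
A rough sketch of how the log could be located and skimmed, assuming a POSIX shell is available. The two paths are the ones mentioned above; the helper name `find_install_log`, the optional root prefix (for example the mount point of the installed system when inspecting it from a live CD), and the grep pattern are only assumptions for illustration.

```shell
#!/bin/sh
# Print the path of the first installer log found.
# $1 = optional root prefix, e.g. where the installed system is mounted.
find_install_log() {
    root=${1:-}
    for f in "$root/tmp/install-log" "$root/var/log/installer/install-log"; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Show lines that mention grub, mdadm or an error (pattern is a guess).
if log=$(find_install_log); then
    grep -i -n -E 'grub|mdadm|error' "$log" || true
fi
```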

@stormi
Member

stormi commented Mar 23, 2022

Related to #107

@AtaxyaNetwork
Author

Hello!

Unfortunately, I didn't keep the logs, since I needed the machine urgently.
I will try to set up a test machine to reproduce this bug ASAP :)

@AtaxyaNetwork
Author

Hello!

I found the time to test installing XCP-ng on top of a Debian (11.3) soft RAID 1.
I tried the process on one of my lab machines (a Dell R610 with two 146G HDDs) and on a VM with two 80G disks.
I see the same behavior on both machines.
I attach the log from the VM.
installer.log

Here is what I did to test the soft RAID:

  • Install Debian on a soft RAID 1, with a single / ext4 partition
  • Boot Debian and make sure grub and the RAID work well
  • Then install XCP-ng (8.2.1)
  • On the disk selector, I see this:
    xvda
    xvdb
    md0
    I tried to recreate the RAID from the software RAID panel, but the md0 array was still present afterwards, and I did not get an md127 as usual. So I selected md0 and continued the install (nothing special, I just selected ext instead of lvm).
    As expected, once the installation finished and the server rebooted, I arrived at the grub rescue prompt.

I think the best workaround is to let the installer delete the old soft RAID, using the commands I provided in my first message.
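
As a rough illustration of what such a cleanup step might start with: an array that is still assembled (like the md0 in the disk selector) probably has to be stopped before its member disks can be wiped. The sketch below lists the md arrays from `/proc/mdstat` and prints the matching `mdadm --stop` commands without running them. The function names `list_md_arrays` and `print_stop_cmds` are hypothetical, and this is not how the XCP-ng installer currently works.

```shell
#!/bin/sh
# List md array names found in mdstat-formatted text.
# $1 = file to read (defaults to /proc/mdstat)
list_md_arrays() {
    awk '/^md[0-9]+ :/ { print $1 }' "${1:-/proc/mdstat}"
}

# Dry run: print the stop command for each array instead of executing it.
print_stop_cmds() {
    list_md_arrays "$1" | while read -r md; do
        echo "mdadm --stop /dev/$md"
    done
}

if [ -r /proc/mdstat ]; then
    print_stop_cmds /proc/mdstat
fi
```

After the arrays are stopped, the per-disk wipe from the first message could be applied to each former member.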

I can give you access to my lab machine and/or the VM I use for testing, if you want to dig into it directly.

Thanks again for looking into this, and sorry for the delay!
