Centos 7 UUID not updated during imaging - will not boot


  • SERVER
    FOG Version: 1.5.9
    OS: centos 7.9

    CLIENT
    OS: centos 7.9

    DESCRIPTION
    When we image our CentOS 7.9 system, which uses UEFI boot and UUIDs in /etc/fstab, the PC being imaged shows a kernel panic and reports that “/dev/disk/by-uuid/(uuid here) does not exist”. Does FOG not change the UUID to match the new system when it finishes imaging? I found an article from 2017 that shows this issue was fixed for a swap UUID.

  • Senior Developer

    @gerrit-anderson said in Centos 7 UUID not updated during imaging - will not boot:

    From what I can tell, FOG is handling the UUID’s correctly…

    Would say so too from what we discovered so far. It’s interesting you can capture/deploy from/to VM<->VM and machine<->machine but not “across”.

    For further debugging I suggest digging into the dracut emergency shell/mode more. First run ls -al /dev/disk/by-uuid/ on the dracut command prompt and post a picture of that here. You might also follow the instructions printed: run journalctl and look through the log for hints on why it fails booting. Also take a look at the file /run/initramfs/rdsosreport.txt mentioned. Feel free to share the file here and I’ll take a look as well.
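
    As a rough sketch of one more check in the emergency shell: if the kernel was booted with a root=UUID=... argument, that UUID can be read from /proc/cmdline and looked up under /dev/disk/by-uuid/. Whether this system uses root=UUID= at all is an assumption, and the function name root_uuid_from_cmdline is made up for illustration.

    ```shell
    # Hypothetical helper: print the UUID from a root=UUID=... kernel argument.
    # Reads /proc/cmdline unless a command line string is passed as $1.
    root_uuid_from_cmdline() {
        cmdline="${1:-$(cat /proc/cmdline)}"
        # Split the command line on whitespace and look at each argument
        for arg in $cmdline; do
            case "$arg" in
                root=UUID=*) echo "${arg#root=UUID=}" ;;
            esac
        done
    }

    # Example use in the emergency shell:
    #   ls -l "/dev/disk/by-uuid/$(root_uuid_from_cmdline)"
    ```

    If the function prints nothing, the root device is named some other way (for example by device path or LVM name) and this check does not apply.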

    Edit:
    Searching the web I found this: https://unix.stackexchange.com/questions/183859/initramfs-uuid-problems-after-cloning

    Sounds like the initramfs might be generated differently on your VM and bare metal install. Both are missing a driver for the other one. If that turns out to be true you need to find out which one is missing and manually regenerate initramfs with dracut e.g. on your VM before capturing the image to be deployed to hardware.
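
    A minimal sketch of that regeneration step, assuming a stock CentOS 7 master with the usual /boot/initramfs-<kernel version>.img naming. The helper name build_dracut_cmd is hypothetical, and it only prints the command so it can be reviewed before being run as root. The --no-hostonly option asks dracut for a generic image instead of one tailored to the current machine, which is one way to avoid hunting for the single missing driver.

    ```shell
    # Hypothetical helper: print (not run) a dracut command that rebuilds the
    # initramfs for a given kernel as a generic, non-host-specific image.
    build_dracut_cmd() {
        kver="${1:-$(uname -r)}"
        # --force overwrites the existing image; --no-hostonly includes drivers
        # for hardware other than the machine dracut is running on.
        echo "dracut --force --no-hostonly /boot/initramfs-${kver}.img ${kver}"
    }

    # Print the command for the running kernel; run the result as root on the master.
    build_dracut_cmd
    ```

    After rebuilding, lsinitrd can list the image contents to confirm that the storage driver for the other platform is included. Which driver is actually missing is not established in this thread.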


  • @sebastian-roth Looks like the mystery continues… Below are results!

    Creating the master image on a physical machine allows me to send it to any other physical machine; they don’t need to be the same model. Sending the master image built on the physical machine to a HyperV VM causes the same issue as sending a HyperV master image to a physical machine. The message is below, warning that the UUID doesn’t exist and the system cannot boot.

    Pimage_no_boot.png

    When running debug deploy tasks, below are the results for the physical machine and virtual machine UUIDs respectively. They look identical… This makes me wonder why CentOS doesn’t think the disk UUID exists…

    Physical Machine
    Pimage_Pmachine.png

    Virtual Machine
    Pimage_Vmachine.png

    The physical machine boots fine, virtual machine tries to boot and then goes to emergency mode.

    From what I can tell, FOG is handling the UUID’s correctly…

  • Senior Developer

    @Gerrit-Anderson I have not looked into the UUID stuff in a long time, obviously. The /dev/disk/by-uuid/... and /etc/fstab are both using the filesystem UUID. What we capture in d1.partitions are partition UUIDs - which are different to FS UUIDs!
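
    To make the distinction concrete, here is a small sketch that checks the filesystem UUIDs referenced in an fstab file against the entries in a by-uuid directory. The function name missing_fstab_uuids and its default paths are assumptions for illustration, not part of FOG.

    ```shell
    # Hypothetical helper: list UUID= entries from an fstab file that have no
    # matching entry in a by-uuid directory.
    # Usage: missing_fstab_uuids [fstab] [by-uuid dir]
    missing_fstab_uuids() {
        fstab="${1:-/etc/fstab}"
        byuuid="${2:-/dev/disk/by-uuid}"
        # Take the first field of non-comment lines that start with UUID=
        awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
        while read -r uuid; do
            # -e follows the symlink, so a dangling link also counts as missing
            [ -e "$byuuid/$uuid" ] || echo "$uuid"
        done
    }

    # For a single partition, the two kinds of UUID can be compared with:
    #   blkid -s UUID -o value /dev/sda5       (filesystem UUID, used by fstab)
    #   blkid -s PARTUUID -o value /dev/sda5   (partition UUID from the partition table)
    ```

    An empty result means every UUID in the fstab file resolves to a device node, which would point away from the UUIDs themselves as the cause.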

    So we need to look at partclone (used to clone the actual filesystem data) to see if it’s messing up the filesystem UUIDs or not. In general partclone is meant to clone everything including the filesystem UUID. So it really shouldn’t mess with it as far as I know (ref).

    Please schedule a debug deploy task on a physical machine (as you said this doesn’t happen when deploying to a VM). PXE boot that machine and when you get to the command shell start the deployment using the command fog. You step through the whole process by pressing ENTER every so often and when it’s all done you will get back to a command shell. Here I need you to run the command blkid, take a picture and post that here.

    If you are keen, do the same debug deploy but on a VM. Step through the process and run blkid at the end.


  • @sebastian-roth Here is my fstab

    centos4.png

  • Senior Developer

    @Gerrit-Anderson Please tell us which UUID do you see in the CentOS /etc/fstab for this master VM machine?


  • @sebastian-roth Results of blkid -po udev /dev/sda5 are below!

    centos3.png

  • Senior Developer

    @Gerrit-Anderson OK, those seem to match the ones we see in d1.partitions, so nothing is being messed up when capturing the UUIDs. It looks like the IDs read/set by sfdisk are not the ones used by CentOS.

    While in the FOS debug command mode, run blkid -po udev /dev/sda5, take a picture and post that here.


  • @sebastian-roth sfdisk version is 2.35.1 Below is output of sfdisk -d /dev/sda

    centos2.png

  • Senior Developer

    @gerrit-anderson said in Centos 7 UUID not updated during imaging - will not boot:

    The sfdisk that I am using must be what ships with CentOS 7… I also ran this command on my FOG server, and it’s the same version, but that is also running CentOS 7.

    Ahhh, now we are talking!! What I am asking you to do is, schedule a debug capture task and PXE boot the machine into it. Then you get to a command shell and run the sfdisk -d /dev/sda command there.


  • @sebastian-roth Ahhh, no. I will do that now. I did not do this in the FOG debug shell. This was all done within CentOS. I apologize… Doing this now!


  • @sebastian-roth I may not be understanding this fully, but I ran this command on my master image. That shouldn’t have any impact on anything related to FOG, right? The sfdisk that I am using must be what ships with CentOS 7… I also ran this command on my FOG server, and it’s the same version, but that is also running CentOS 7.

  • Senior Developer

    @Gerrit-Anderson Just checked on the official FOS init used with FOG 1.5.9. Version of sfdisk/util-linux is 2.35.1. No idea where you got yours from.

    Are you sure you ran this in the FOG debug command shell, that is, after scheduling a debug capture task and PXE booting the machine into it?

  • Senior Developer

    @gerrit-anderson said in Centos 7 UUID not updated during imaging - will not boot:

    The sfdisk version is sfdisk from util-linux 2.23.2

    Hmm, this version is really old. We are at 2.35.1 at the moment and 2.23.x was used in buildroot back in 2013. I don’t think we had our FOS image built with such an old version at any point really. For GPT support you need at least 2.26.x, according to the man page. I have no idea how you can have a FOS init with such an old sfdisk command.
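
    As a sketch, the version comparison can be scripted. The 2.26 minimum is taken from the man page remark above; the helper names version_at_least and sfdisk_version are made up for illustration, and sort -V is assumed to be available (GNU coreutils).

    ```shell
    # Hypothetical helper: succeed if version $1 is at least version $2.
    version_at_least() {
        # sort -V orders version strings; if the minimum sorts first (or the
        # two are equal), then $1 is new enough.
        [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
    }

    # Hypothetical helper: "sfdisk from util-linux 2.35.1" -> "2.35.1"
    sfdisk_version() {
        sfdisk --version | awk '{ print $NF }'
    }

    # Example use:
    #   version_at_least "$(sfdisk_version)" 2.26 || echo "sfdisk too old for GPT"
    ```

    With the versions mentioned in this thread, 2.35.1 passes the check and 2.23.2 does not.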

    I suggest you download the latest inits from our server and try with those:

    sudo -i
    cd /var/www/fog/service/ipxe/
    mv init.xz init.xz.old
    mv init_32.xz init_32.xz.old
    wget https://fogproject.org/inits/init.xz
    wget https://fogproject.org/inits/init_32.xz
    chown fogproject:apache init*
    

  • @sebastian-roth The sfdisk version is sfdisk from util-linux 2.23.2

    Yes, FOG 1.5.9 most recent download, CentOS 7.9 (latest version 7)

  • Senior Developer

    @gerrit-anderson said in Centos 7 UUID not updated during imaging - will not boot:

    These were run in the debug shell.

    Can’t get my head around this. You say your FOG server is 1.5.9. Do you use a custom FOS init?

    Please run sfdisk --version in the debug shell and post version here.


  • @sebastian-roth So it looks like sfdisk may not support GPT? I ran a couple more commands, not sure if this shows exactly what you are looking for… These were run in the debug shell.

    centos.png

  • Senior Developer

    @gerrit-anderson said in Centos 7 UUID not updated during imaging - will not boot:

    /dev/sda1 : start= 1, size=209715199, Id=ee

    Looks like the CentOS 7 sfdisk command is not able to read GPT partition layout. This is just the protective MBR entry.
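
    A rough way to spot this case automatically, sketched under the assumption that the dump looks like the sfdisk -d output shown in this thread. The function name only_protective_mbr is hypothetical.

    ```shell
    # Hypothetical helper: read an "sfdisk -d" dump on stdin and succeed if the
    # only entry with a non-zero size is the GPT protective MBR (type ee).
    only_protective_mbr() {
        awk '
            /^\/dev\// && /size= *[1-9]/ {      # partition lines with non-zero size
                used++
                if ($0 ~ /(Id|type)= *ee/) prot++
            }
            END { exit !(used == 1 && prot == 1) }
        '
    }

    # Example use:
    #   sfdisk -d /dev/sda | only_protective_mbr && echo "sfdisk only sees the protective MBR"
    ```

    A dump that shows the real GPT partitions has several non-empty entries and none of type ee, so the function returns a non-zero status.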

    Can you please schedule a debug capture task for the master, boot it up and when you get to the shell run sfdisk -d /dev/sda again. Take a picture and post here.


  • @sebastian-roth To add more troubleshooting steps:

    Master Image created on HyperV as a Gen 2 (UEFI) VM:
    HyperV to HyperV works fine, even to different hosts
    HyperV to AMD PC (UEFI) doesn’t boot, missing disk message
    HyperV to Intel PC (UEFI) doesn’t boot, missing disk message

    Master Image created on AMD PC (UEFI):
    AMD PC to AMD PC works fine
    AMD PC to Intel PC (UEFI) works fine
    AMD PC to HyperV doesn’t boot, missing disk message

    It seems to be an issue with the virtual machine from what I can tell… For the time being, I will keep two images, one for VM’s and one for physical machines… I would like to have an end result of one image working on HyperV and Intel/AMD based desktops. If you have any other troubleshooting steps I can try, I am willing to give them a shot.

    I appreciate the help so far!


  • @sebastian-roth The output from sfdisk -d /dev/sda is below:

    # partition table of /dev/sda
    unit: sectors
    
    /dev/sda1 : start=        1, size=209715199, Id=ee
    /dev/sda2 : start=        0, size=        0, Id= 0
    /dev/sda3 : start=        0, size=        0, Id= 0
    /dev/sda4 : start=        0, size=        0, Id= 0
    
