Error trying to restore GPT partition tables



  • I’m using FOG 1.4.4 to deploy OS X + Windows (Boot Camp) and caught a few bugs (and solutions). I’m going to open one ticket for each.
    First one.
    While restoring the image, the behavior was like https://forums.fogproject.org/topic/9272/error-trying-to-restore-gpt-partition-tables. It is a GPT issue, not an OS X one.

    I ran a debug deploy, and in restorePartitionTablesAndBootLoaders() it executes (edited):

            restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
            sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
            sgdiskexit="$?"
    

    In restoreGRUB():

        dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
    
    

    and running “sgdisk -gl $tmpMBR $disk >/dev/null 2>&1” fails with exit code 2:

    Caution! After loading partitions, the CRC doesn't check out!
    Warning! Main partition table CRC mismatch! Loaded backup partition table instead of main partition table!
    
    Warning! One or more CRCs don't match! You should repair the disk!
    
    Invalid partition data!
    Information: Loading backup partition table; will override earlier problems!
    The operation has completed successfully.
    

    And then the machine reboots.

    I modified restorePartitionTablesAndBootLoaders():

            restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
            sgdisk -z $disk
            sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
            sgdiskexit="$?"
    

    and it works.


  • Senior Developer

    @sebastian-roth All MBRs saved with dd are 1 MB when has_grub is true.
    If the layout is GPT, however, it will overwrite d1.mbr with the sgdisk backup.

    has_grub, from what I can tell, is not the problem at all here.

    @Enrico I have now re-added the sgdisk -z $disk after restoreGRUB runs. Would you be so kind as to test it?

    The file behaved the way it did because we keep a backup of the GPT secondary header. I didn’t know sgdisk would fail if a table was already laid on the disk, though; that’s why the -z wasn’t there. I don’t currently have any good GPT layouts to test with, so that was my oversight.

    I’ve updated the working branch to move the -z to the right place, rebuilt the inits with the most current changes, and uploaded them to FOGProject.

    If you installed RC-10, please cd to the project root, delete the binaries1.5.0-RC-10.zip file, and rerun the installer. You should see the latest changes automatically, and hopefully this will be fully fixed for you now.



  • @sebastian-roth
    My OS X/Windows environment does not use GRUB! I paused the original script and can confirm that saveGRUB creates d1.mbr and sgdisk then overwrites it.

    d1.mbr  d1.original.uuids  d1p1.img  d1p2.img  d1p3.img  d1p4.img  d1.partitions
    

    However, my GRUB environment (Linux/Windows) does not create d1.grub.mbr, but does touch d1.has_grub.

    d1.has_grub  d1.mbr  d1p1.img  d1p3.img  d1.partitions
    
        [[ -n $sgdisk && $hasGRUB -eq 1 ]] && mbr="$imagePath/d${disk_number}.grub.mbr" || mbr="$imagePath/d${disk_number}.mbr"
    

    In the GRUB/MBR environment (Linux/Windows), $sgdisk is empty -> d1.mbr
    In the no-GRUB/GPT environment (OS X/Windows), $hasGRUB is 0 -> d1.mbr
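    The quoted one-liner can be replayed outside FOG to see why both setups land on d1.mbr. This is a minimal sketch: mbr_file_for and the /images/test path are hypothetical, and the function only mirrors the single test quoted above, not FOG's full MBRFileName().

```shell
#!/bin/sh
# Hypothetical helper mirroring the quoted test: choose the MBR file name
# from the image path, disk number, $sgdisk flag and $hasGRUB value.
mbr_file_for() {
    local imagePath=$1 disk_number=$2 sgdisk=$3 hasGRUB=$4
    # An empty hasGRUB counts as 0, as it does inside bash's [[ ]].
    if [ -n "$sgdisk" ] && [ "${hasGRUB:-0}" -eq 1 ]; then
        echo "$imagePath/d${disk_number}.grub.mbr"
    else
        echo "$imagePath/d${disk_number}.mbr"
    fi
}

# GRUB/MBR (Linux/Windows): $sgdisk is empty -> d1.mbr
mbr_file_for /images/test 1 "" 1
# No GRUB/GPT (OS X/Windows): $hasGRUB is 0 -> d1.mbr
mbr_file_for /images/test 1 true 0
# Only GPT plus GRUB selects the separate file -> d1.grub.mbr
mbr_file_for /images/test 1 true 1
```

    The explicit if/else is only a readability choice: in the original `a && b || c` form the assignment cannot fail, so both forms pick the same file.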


  • Developer

    Here is a quick patch proposal:

    --- a/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh
    +++ b/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh
    @@ -1708,10 +1708,13 @@ restoreGRUB() {
         [[ -z $imagePath ]] && handleError "No image path passed (${FUNCNAME[0]})\n   Args Passed: $*"
         local tmpMBR=""
         MBRFileName "$imagePath" "$disk_number" "tmpMBR" "$sgdisk"
    -    local count=$(du -B 512 $tmpMBR | awk '{print $1}')
    -    [[ $count -eq 8 ]] && count=1
    -    dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
    -    runPartprobe "$disk"
    +    local has_grub=$(dd if=$tmpMBR 2>&1 | grep -i 'grub')
    +    if [[ -n $has_grub ]]; then
    +        local count=$(du -B 512 $tmpMBR | awk '{print $1}')
    +        [[ $count -eq 8 ]] && count=1
    +        dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
    +        runPartprobe "$disk"
    +    fi
     }
     # Waits for enter if system is debug type
     debugPause() {
    

    @Enrico This should also solve your issue without moving the function calls I mentioned earlier.


  • Developer

    @Tom-Elliott To properly fix this, I think we should add the same kind of GRUB check (something like has_grub=$(dd if=$disk bs=512 count=1 2>&1 | grep -i 'grub')) to the restoreGRUB function as well, so that we never restore that MBR twice if it’s not GRUB at all.
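    A GRUB check along those lines might look roughly like this. It is a sketch, not FOG code: mbr_has_grub is a hypothetical name, and it relies on the same assumption as the grep in the proposal, namely that a GRUB boot sector contains the ASCII string "GRUB". Reading only the first 512 bytes keeps data further into a 1 MB dd image or an sgdisk backup from producing a match.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first 512-byte sector of a file or
# block device contains "grub" (case-insensitive).
mbr_has_grub() {
    # dd reads exactly one sector; its transfer statistics go to /dev/null
    # so they never reach grep. -a treats the binary sector as text, -q
    # reports the result only through the exit status.
    dd if="$1" bs=512 count=1 2>/dev/null | grep -aqi 'grub'
}

# Usage against a throwaway file instead of a real disk.
tmp=$(mktemp)
printf 'xxGRUBxx' > "$tmp"
if mbr_has_grub "$tmp"; then echo "GRUB found"; else echo "no GRUB"; fi
rm -f "$tmp"
```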


  • Developer

    @Enrico This part of the code is really tricky. If the disk is detected to have GPT, saveGRUB uses a different filename (d1.grub.mbr) to store the GRUB MBR when it finds GRUB on the disk, so it does not actually overwrite the d1.mbr file. The same should be true for restoreGRUB, and that is where I think the culprit lies: restoreGRUB does not check whether d1.mbr is actually a dd'ed binary blob or output generated by sgdisk -b. So either we add a check for this, or, with some luck, we can simply run restoreGRUB after sgdisk -gl.

    Can you please try this code in your scenario:

    restorePartitionTablesAndBootLoaders() {
    ...
            dots "Restoring Partition Tables (GPT)"
            sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
            sgdiskexit="$?"
            if [[ ! $sgdiskexit -eq 0 ]]; then
                echo "Failed"
                debugPause
                handleError "Error trying to restore GPT partition tables (${FUNCNAME[0]})\n   Args Passed: $*\n    CMD Tried: sgdisk -gl $tmpMBR $disk\n    Exit returned code: $sgdiskexit"
            fi
            restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
            global_gptcheck="yes"
            echo "Done"
    ...
    }
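    The check on d1.mbr mentioned above could be sketched as follows, under assumptions that do not come from this thread: 512-byte sectors, and the sgdisk -b file layout described in the sgdisk man page (protective MBR, main GPT header, backup GPT header, then one copy of the partition table). Under that layout an sgdisk backup carries the GPT signature "EFI PART" at byte 512 and again at byte 1024, while a raw dd copy of a GPT disk normally has it only at byte 512, because LBA 2 starts the partition entries. has_gpt_sig_at and is_sgdisk_backup are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the 8 bytes at byte offset $2 of file $1
# are the GPT header signature "EFI PART".
has_gpt_sig_at() {
    dd if="$1" bs=1 skip="$2" count=8 2>/dev/null | grep -aqF 'EFI PART'
}

# Heuristic, assuming 512-byte sectors: sgdisk -b output has a header in
# sector 1 and another in sector 2; a raw dd of a GPT disk has one only in
# sector 1, since sector 2 normally holds partition entries.
is_sgdisk_backup() {
    has_gpt_sig_at "$1" 512 && has_gpt_sig_at "$1" 1024
}

# Usage on a synthetic file: zeroed MBR, then "EFI PART" at 512 and 1024.
demo=$(mktemp)
{ head -c 512 /dev/zero; printf 'EFI PART'; head -c 504 /dev/zero; printf 'EFI PART'; } > "$demo"
if is_sgdisk_backup "$demo"; then echo "looks like sgdisk -b output"; fi
rm -f "$demo"
```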
    


  • @sebastian-roth I totally agree! This is why I don’t like submitting a patch…
    I’m going to bore you with a few more details to help clarify the issue.

    The problem begins in the capture phase in savePartitionTablesAndBootLoaders():

            1)
                dots "Saving Partition Tables (GPT)"
                saveGRUB "$disk" "$disk_number" "$imagePath" "true"
                sgdisk -b "$imagePath/d${disk_number}.mbr" $disk >/dev/null 2>&1
    

    The sgdisk command overwrites the file that saveGRUB has just created ($imagePath/d${disk_number}.mbr).

    In the deploy phase, restorePartitionTablesAndBootLoaders():

            restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
            sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
    

    So restoreGRUB is restoring the file written by sgdisk.
    One more thing: restoreGRUB uses dd on the first sectors and does not restore the backup GPT at the end of the disk.
    So, when sgdisk is called to restore the data, it reports a CRC mismatch between the main partition table (restored by restoreGRUB) and the backup one (not restored): exit code 2.
    I think saveGRUB() and restoreGRUB() are in effect saving and restoring the MBR: I can’t see the usefulness of hasGRUB() with GPT.
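    To make the end-of-disk point concrete: in the GPT layout the backup header sits in the disk's last LBA, and the backup partition entry array is normally placed directly before it. The sketch below computes those LBAs, assuming 512-byte sectors and the common 128 entries of 128 bytes (32 sectors); gpt_backup_lbas is a hypothetical helper. A dd of the first sectors of the disk never touches these LBAs.

```shell
#!/bin/sh
# Hypothetical helper: given a disk size in 512-byte sectors, print the
# first LBA of the backup partition entry array and the backup header LBA.
# Assumes 128 entries x 128 bytes = 16384 bytes = 32 sectors.
gpt_backup_lbas() {
    local total_sectors=$1
    local entry_sectors=$(( 128 * 128 / 512 ))
    local header_lba=$(( total_sectors - 1 ))
    local entries_lba=$(( header_lba - entry_sectors ))
    echo "$entries_lba $header_lba"
}

# Usage with an illustrative size (what `blockdev --getsz` might report).
gpt_backup_lbas 500118192
```

    For that illustrative size it prints `500118159 500118191`: the entries occupy LBAs 500118159 to 500118190 and the header sits in the last sector.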

    So, my last proposal (for today) is to forget the previous patches and comment out:

    restorePartitionTablesAndBootLoaders() {
    ...
        if [[ $table_type == GPT ]]; then
            dots "Restoring Partition Tables (GPT)"
    #        restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
            sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
    

    and

    savePartitionTablesAndBootLoaders() {
    ...
        case $hasgpt in
            0)
                strdots="Saving Partition Tables (MBR)"
    ...
                dots "$strdots"
    #            saveGRUB "$disk" "$disk_number" "$imagePath"
                sfdisk -d $disk 2>/dev/null > $sfdiskfilename
    
    

    I tested capture/deploy of my Windows/Linux (MBR) and OS X/Windows (GPT) setups with no issues.


  • Developer

    @Tom-Elliott The issue described here is exactly the same as in the thread about not being able to restore images from an old FOG server. Back then I thought this was actually an issue with the old image, but seeing @Enrico’s issue now, I wonder if this is a more general problem, and I think we need to dig a little deeper here. Adding the sgdisk -z might mask the issue and could cause others in the future. I will look into this soon and keep you guys posted.


  • Senior Developer

    I’ve added the sgdisk -z statement before the dd occurs. I have not done any math regarding the “if count = 40, set count = 1” change, as normally the sizes are 63 or 8. Because the device truly believes it IS a GPT disk, chances are the “GPT table corrupt” error is 100% accurate. This is unsurprising and a common side effect of Windows installations, particularly AFTER the “first OS” is installed.
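    The 40 may have a mundane source. restoreGRUB derives count from `du -B 512`, which reports allocated space rather than file length. Assuming 512-byte sectors, the default 128 GPT entries, and a filesystem that allocates in 4 KiB blocks (all assumptions, not measured here), an sgdisk -b file of three header sectors plus a 32-sector entry array is 17,920 bytes, which du rounds up to 40 units; a single-sector dd MBR likewise shows up as 8, the case restoreGRUB already maps back to 1. du_count is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: the 512-byte units `du -B 512` would report for a
# file of $1 bytes on a filesystem allocating in $2-byte blocks
# (4096 assumed by default; real filesystems may differ).
du_count() {
    local bytes=$1 fs_block=${2:-4096}
    # du counts allocated blocks: round the length up to whole blocks,
    # then express the result in 512-byte units.
    local alloc=$(( (bytes + fs_block - 1) / fs_block * fs_block ))
    echo $(( alloc / 512 ))
}

# Assumed sgdisk -b layout with 512-byte sectors: protective MBR, main
# header and backup header (one sector each), then 128 entries x 128 bytes.
sgdisk_backup_bytes=$(( 3 * 512 + 128 * 128 ))

echo "sgdisk -b file: $sgdisk_backup_bytes bytes, du count $(du_count "$sgdisk_backup_bytes")"
echo "one-sector MBR: 512 bytes, du count $(du_count 512)"
```

    Under these assumptions it reports 17920 bytes with a du count of 40, and 512 bytes with a du count of 8.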


  • Senior Developer

    I’m not sure I follow.

    Can you provide a patch of the exact changes you made please?

    This can be done by copying your new file somewhere relatively local; let’s just say /opt.

    Run:

    diff -u /opt/fogproject/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh /opt/funcs.sh
    

    Post those changes here.

    I’m only confused because later you say that modifying the restoreGRUB function to change count=40 to count=1 also works.



  • Sorry for the misunderstanding! I’m not a developer. Thanks for pointing out my mistake. I’m going to edit the first snippets to remove

    sgdisk -z $disk
    

    (my only patch).
    I reported one solution, but I’m not sure it has no side effects because I’m not familiar with the code.
    Should I have submitted a pull request on GitHub? I’m not familiar with the process, but I can learn.
    About the https://forums.fogproject.org/topic/10998/problem-of-deployment-of-an-old-image thread:
    I don’t think it is related. My image was captured with the same 1.4.4 version used in the deploy phase.
    In my case, the problem is like https://github.com/ceph/ceph-ansible/issues/759
    My disk seems to use a hybrid MBR, and I don’t understand why dd's $count parameter is 40 (the MBR is the first 512 bytes, I think).
    I debugged further and saw that FOG runs

        dots "Erasing current MBR/GPT Tables";
        sgdisk -Z $disk >/dev/null;
    

    then

    dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
    

    with $count=40.
    If I run gdisk -l $disk, the output says

    MBR: hybrid
    ...
    GPT: damaged
    

    Running this by hand instead:

    dd if=$tmpMBR of=$disk bs=512 count=1
    

    gdisk -l $disk outputs:

    MBR: hybrid
    ...
    GPT: not present
    

    then I can run sgdisk -gl $tmpMBR $disk >/dev/null 2>&1 without errors.
    So, could a better patch be to set $count=1 before dd?

        [[ $count -eq 8 ]] && count=1
        # patch?
        [[ $count -eq 40 ]] && count=1
        dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
    

    This is beyond my knowledge of the FOG code.
    Let me know.
    Thanks!
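    Since the MBR is the first 512 bytes, one alternative to du is to size count from the file's byte length, which avoids filesystem rounding. This is a hedged sketch, not a reviewed patch: sectors_in is a hypothetical helper, and for an sgdisk -b file it would still write every sector of the backup, so on its own it does not settle whether restoreGRUB should touch that file at all.

```shell
#!/bin/sh
# Hypothetical helper: number of 512-byte sectors needed to hold file $1,
# based on its length in bytes rather than on `du` allocation.
sectors_in() {
    local bytes
    bytes=$(wc -c < "$1")
    echo $(( (bytes + 511) / 512 ))
}

# Usage: a one-sector stand-in for a dd'ed MBR gives count=1.
tmp=$(mktemp)
head -c 512 /dev/zero > "$tmp"
echo "count=$(sectors_in "$tmp")"
rm -f "$tmp"
```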


  • Developer

    @enrico We appreciate people helping us in any kind of way. So thank you very much for your posts.

        restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
        sgdisk -z $disk
        sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
        sgdiskexit="$?"
    

    Both script code snippets you posted are exactly the same and I can’t figure out which change you made to make it work. Please let us know.

    The behavior described is exactly the same as I just saw here: https://forums.fogproject.org/topic/10998/problem-of-deployment-of-an-old-image
    In this case the image to be deployed came from an older FOG version and caused the issue. Is this true in your case as well? Was this image captured using an older FOG version?


  • Moderator

    @enrico Are you asking for help or just reporting a solution? Are you familiar with pull requests?

