Error trying to restore GPT partition tables
-
I’m using FOG 1.4.4 to deploy OS X + Windows Boot Camp and caught some bugs (and solutions). I’m going to open one ticket for each.
First one.
Restoring the image, the behavior was like https://forums.fogproject.org/topic/9272/error-trying-to-restore-gpt-partition-tables. It is a GPT issue, not an OS X one. I went into deploy debug and found that it executes this in restorePartitionTablesAndBootLoaders() (edited):
restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
sgdiskexit="$?"
in restoreGRUB():
dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
and the execution of “sgdisk -gl $tmpMBR $disk >/dev/null 2>&1” fails with error code 2:
Caution! After loading partitions, the CRC doesn't checkout!
Warning! Main partition table CRC mismatch! Loaded backup partition table instead of main partition table!
Warning! One o more CRCs don't match! You should repair the disk!
Invalid partition data!
Information: Loading backup partiton table; will override earlier problems!
The operation has completed successfully.
And reboots.
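When reproducing this in a debug session, it can help to keep what sgdisk prints instead of sending it to /dev/null. A minimal sketch, assuming a helper named run_logged (hypothetical, not part of FOG) that stores the combined output and the exit status:

```shell
# Hypothetical helper (not part of FOG): run a command, keep its combined
# stdout/stderr in RUN_OUTPUT and its exit status in RUN_STATUS.
run_logged() {
    RUN_OUTPUT=$("$@" 2>&1)
    RUN_STATUS=$?
    if [ "$RUN_STATUS" -ne 0 ]; then
        # Report the failing command, its status and everything it printed.
        printf 'Command failed (%d): %s\n%s\n' "$RUN_STATUS" "$*" "$RUN_OUTPUT" >&2
    fi
    return "$RUN_STATUS"
}

# In a debug shell one might then call, for example:
#   run_logged sgdisk -gl "$tmpMBR" "$disk"
```

The status is still available afterwards in RUN_STATUS, so the existing sgdiskexit check could be fed from it.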
I modified restorePartitionTablesAndBootLoaders():
restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
sgdisk -z $disk
sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
sgdiskexit="$?"
and it works.
-
@enrico Are you asking for help or just reporting a solution? Are you familiar with pull requests?
-
@enrico We appreciate people helping us in any kind of way. So thank you very much for your posts.
restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
sgdisk -z $disk
sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
sgdiskexit="$?"
Both script code snippets you posted are exactly the same and I can’t figure out which change you made to make it work. Please let us know.
The behavior described is exactly the same as I just saw here: https://forums.fogproject.org/topic/10998/problem-of-deployment-of-an-old-image
In this case the image to be deployed came from an older FOG version and caused the issue. Is this true in your case as well? Was this image captured using an older FOG version?
-
Sorry for the misunderstanding! I’m not a developer. Thanks for pointing out my mistake. I’m going to edit the first snippet, removing
sgdisk -z $disk
(my single patch)
I reported one solution but I’m not sure that it has no side effects because I’m not familiar with the code.
Do you think I should submit a pull request on git? I’m not familiar with that, but I can learn.
About https://forums.fogproject.org/topic/10998/problem-of-deployment-of-an-old-image ticket:
I don’t think it is related. My image is captured using the same 1.4.4 as the one in the deploy phase.
In my case, the problem is like https://github.com/ceph/ceph-ansible/issues/759
My bootloader seems to be a hybrid MBR, and I don’t understand why the $count parameter of dd is 40 (the MBR is the first 512 bytes, I think).
I debugged deeper and I see that FOG runs:
dots "Erasing current MBR/GPT Tables"
sgdisk -Z $disk >/dev/null
then
dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
with $count=40.
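One possible explanation for the value 40, offered as an assumption rather than something confirmed in this thread: the restoreGRUB code quoted later derives count from `du -B 512 $tmpMBR`, and du reports allocated disk space, not file length. If the sgdisk backup file is 17920 bytes (35 sectors: protective MBR, two GPT headers, and 32 sectors of partition entries) and the filesystem allocates 4096-byte blocks, du rounds that up to 40. The same rounding would turn a 512-byte MBR file into 8, which would explain the existing `[[ $count -eq 8 ]] && count=1` line. The helper name du_sectors and the 4096-byte block size are illustrative.

```shell
# Hypothetical helper: number of 512-byte units du would report for a file of
# $1 bytes on a filesystem that allocates in blocks of $2 bytes (assumed 4096).
du_sectors() {
    local size=$1
    local block=${2:-4096}
    # Round the file size up to whole filesystem blocks.
    local blocks=$(( (size + block - 1) / block ))
    echo $(( blocks * block / 512 ))
}

du_sectors 512      # single-sector MBR image, prints 8
du_sectors 17920    # assumed size of an sgdisk -b backup, prints 40
```

If that reading is right, count=40 means dd writes the whole sgdisk backup file, not only an MBR, onto the first sectors of the disk.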
If I run gdisk -l $disk
the output says:
MBR: hybrid ... GPT: damaged
Firing by hand instead
dd if=$tmpMBR of=$disk bs=512 count=1
gdisk -l $disk outputs:
MBR: hybrid ... GPT: not present
then I can send sgdisk -gl $tmpMBR $disk >/dev/null 2>&1 without errors.
So, could a better patch be to set count=1 before dd?
[[ $count -eq 8 ]] && count=1
# patch?
[[ $count -eq 40 ]] && count=1
dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
It is beyond my knowledge of the FOG code.
Let me know.
Thanks!
-
I’m not sure I follow.
Can you provide a patch of the exact changes you made please?
This can be done by copying your new file somewhere relatively local, let’s just say in opt.
Run:
diff -u /opt/fogproject/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh /opt/funcs.sh
Post those changes here.
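For anyone unfamiliar with diff -u, here is a self-contained illustration on two throwaway files. The file names and contents are placeholders, not the real funcs.sh.

```shell
# Throwaway demo: the two files stand in for the original and the modified funcs.sh.
workdir=$(mktemp -d)
printf 'restoreGRUB\nsgdisk -gl\n' > "$workdir/funcs.orig.sh"
printf 'restoreGRUB\nsgdisk -z\nsgdisk -gl\n' > "$workdir/funcs.new.sh"

# diff exits with status 1 when the files differ, so that is not treated as an error.
patch_text=$(diff -u "$workdir/funcs.orig.sh" "$workdir/funcs.new.sh") || true
printf '%s\n' "$patch_text"
rm -rf "$workdir"
```

Lines starting with "+" are additions and lines starting with "-" are removals, which makes the exact change easy to review.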
I’m only confused because later you say that modifying the restoreGRUB function to change count=40 to count=1 also works?
-
I’ve added the sgdisk -z statement before the dd occurs. I have not done any math in regards to the “if count = 40 set count = 1” as normally the sizes are 63, or 8. Because the device truly believes it IS a GPT disk, chances are likely the “gpt table corrupt” is 100% true. This is unsurprising and a common side effect of Windows installations, particularly AFTER the “first OS” is installed.
-
@Tom-Elliott The issue described here is exactly the same as in the thread on not being able to restore images from an old FOG server. Back then I thought this was actually an issue with the old image but seeing @Enrico’s issue now I am wondering if this is a more general thing and I think we need to dig a little deeper here. Adding the
sgdisk -z
might mask the issue but might cause others in the future. I will get into this soon. Keeping you guys posted on this.
-
@sebastian-roth I totally agree! This is the reason why I don’t like to submit a patch…
I’m going to bore you with a few other details to help clarify the issue. The problem begins in the capture phase in savePartitionTablesAndBootLoaders():
1)
    dots "Saving Partition Tables (GPT)"
    saveGRUB "$disk" "$disk_number" "$imagePath" "true"
    sgdisk -b "$imagePath/d${disk_number}.mbr" $disk >/dev/null 2>&1
The sgdisk command overwrites the file just created by saveGRUB ($imagePath/d${disk_number}.mbr).
In the deploy phase, restorePartitionTablesAndBootLoaders():
restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
restoreGRUB is restoring the sgdisk’s file.
One more thing: restoreGRUB uses dd on the first sectors and does not restore the backup GPT that is at the end of the disk.
So, when sgdisk is called to restore data, it says CRC mismatch between the main partition table (restored by restoreGRUB) and the backup one (not restored): error 2.
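The mismatch can be pictured with the standard GPT layout. The sketch below assumes 512-byte sectors and the usual 128 partition entries of 128 bytes each. The function name gpt_layout is made up for illustration. It prints which LBAs hold the primary and the backup structures for a disk of a given size.

```shell
# Hypothetical helper: print where the GPT structures live on a disk with
# $1 sectors of 512 bytes (assumes 128 partition entries of 128 bytes each).
gpt_layout() {
    local total=$1
    local entry_sectors=$(( 128 * 128 / 512 ))   # 32 sectors of partition entries
    local last=$(( total - 1 ))                  # LBA of the last sector
    echo "primary_header=1"
    echo "primary_entries=2-$(( 1 + entry_sectors ))"
    echo "backup_entries=$(( last - entry_sectors ))-$(( last - 1 ))"
    echo "backup_header=$last"
}

gpt_layout 1000000
```

A dd that writes only the first few dozen sectors touches the primary header and entries but never the backup copies at the end of the disk, so the two can end up disagreeing.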
I think saveGRUB() and restoreGRUB() are in effect saving and restoring the MBR: I can’t see the usefulness of hasGRUB() with GPT.
So, my last proposal (for today) is to forget the previous patches and comment out:
restorePartitionTablesAndBootLoaders() {
    ...
    if [[ $table_type == GPT ]]; then
        dots "Restoring Partition Tables (GPT)"
        # restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
        sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
and
savePartitionTablesAndBootLoaders() {
    ...
    case $hasgpt in
        0)
            strdots="Saving Partition Tables (MBR)"
            ...
            dots "$strdots"
            # saveGRUB "$disk" "$disk_number" "$imagePath"
            sfdisk -d $disk 2>/dev/null > $sfdiskfilename
I tested capture/deploy of my windows/linux (MBR) and osx/windows (GPT) with no issues.
-
@Enrico This part of the code is really tricky. If the disk is detected to have GPT, saveGRUB uses a different filename (d1.grub.mbr) to store the GRUB MBR if it finds GRUB on the disk. So it does not actually overwrite the d1.mbr file. And the same should be true for restoreGRUB. Here is where the culprit lies, I think. Function restoreGRUB does not check if d1.mbr is actually a dd'ed binary blob or an output generated by sgdisk -b. So either we need to add a check for this, or we could be lucky and go with doing restoreGRUB after sgdisk -gl.
Can you please try this code in your scenario:
restorePartitionTablesAndBootLoaders() {
    ...
    dots "Restoring Partition Tables (GPT)"
    sgdisk -gl $tmpMBR $disk >/dev/null 2>&1
    sgdiskexit="$?"
    if [[ ! $sgdiskexit -eq 0 ]]; then
        echo "Failed"
        debugPause
        handleError "Error trying to restore GPT partition tables (${FUNCNAME[0]})\n Args Passed: $*\n CMD Tried: sgdisk -gl $tmpMBR $disk\n Exit returned code: $sgdiskexit"
    fi
    restoreGRUB "$disk" "$disk_number" "$imagePath" "true"
    global_gptcheck="yes"
    echo "Done"
    ...
}
-
@Tom-Elliott To properly fix this I think we should add the same kind of GRUB check (something like has_grub=$(dd if=$disk bs=512 count=1 2>&1 | grep -i 'grub')) to the restoreGRUB function as well. So we never restore that MBR twice if it’s not GRUB at all.
-
Here is a quick patch proposal:
--- a/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh
+++ b/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh
@@ -1708,10 +1708,13 @@ restoreGRUB() {
     [[ -z $imagePath ]] && handleError "No image path passed (${FUNCNAME[0]})\n Args Passed: $*"
     local tmpMBR=""
     MBRFileName "$imagePath" "$disk_number" "tmpMBR" "$sgdisk"
-    local count=$(du -B 512 $tmpMBR | awk '{print $1}')
-    [[ $count -eq 8 ]] && count=1
-    dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
-    runPartprobe "$disk"
+    local has_grub=$(dd if=$tmpMBR 2>&1 | grep -i 'grub')
+    if [[ -n $has_grub ]]; then
+        local count=$(du -B 512 $tmpMBR | awk '{print $1}')
+        [[ $count -eq 8 ]] && count=1
+        dd if=$tmpMBR of=$disk bs=512 count=$count >/dev/null 2>&1
+        runPartprobe "$disk"
+    fi
 }
 # Waits for enter if system is debug type
 debugPause() {
@Enrico This should also solve your issue without moving the function calls I mentioned earlier.
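The GRUB check can be tried on its own against any saved file. A minimal sketch, assuming a helper called file_has_grub (a hypothetical name) that looks only at the first 512 bytes, as the dd bs=512 count=1 variant mentioned earlier in the thread does, and reports the result through its exit status:

```shell
# Hypothetical helper: succeed if the first 512 bytes of file $1 contain the
# string "grub" (case-insensitive). dd's statistics on stderr are discarded
# so that only the file data reaches grep.
file_has_grub() {
    dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C grep -qi 'grub'
}

# Possible use inside a restore routine:
#   if file_has_grub "$tmpMBR"; then
#       echo "GRUB marker found, restoring boot sector"
#   fi
```

Using the exit status avoids depending on whatever text grep prints for binary input.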
-
@sebastian-roth
My osx/windows environment does not use grub! I paused the original script and I can assure you that saveGRUB creates d1.mbr and sgdisk then overwrites it.
d1.mbr d1.original.uuids d1p1.img d1p2.img d1p3.img d1p4.img d1.partitions
However, my grub environment (linux/windows) does not create d1.grub.mbr, but touches d1.has_grub.
d1.has_grub d1.mbr d1p1.img d1p3.img d1.partitions
[[ -n $sgdisk && $hasGRUB -eq 1 ]] && mbr="$imagePath/d${disk_number}.grub.mbr" || mbr="$imagePath/d${disk_number}.mbr"
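The selection line above can be exercised in isolation. This is a stand-alone sketch with a hypothetical function mbr_name that takes the sgdisk flag, the hasGRUB value and the disk number as arguments and omits the $imagePath prefix:

```shell
# Hypothetical stand-alone version of the selection line, for experimenting.
# $1 = sgdisk flag (empty for MBR disks), $2 = hasGRUB (0 or 1), $3 = disk number
mbr_name() {
    if [ -n "$1" ] && [ "$2" -eq 1 ]; then
        echo "d$3.grub.mbr"
    else
        echo "d$3.mbr"
    fi
}

mbr_name ""     1 1   # GRUB on an MBR disk: prints d1.mbr
mbr_name "true" 0 1   # GPT disk without GRUB: prints d1.mbr
mbr_name "true" 1 1   # GPT disk with GRUB: prints d1.grub.mbr
```

Only the combination of a GPT disk and a detected GRUB yields the separate d1.grub.mbr name.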
In grub/MBR environment (linux/windows) $sgdisk is null -> d1.mbr
In no grub/GPT environment (osx/windows) $hasGRUB is 0 -> d1.mbr
-
@sebastian-roth All mbr’s saved are 1 mb when performed with dd and has_grub is true.
If the layout is GPT, however, it will overwrite d1.mbr with the sgdisk backup.
has_grub, from what I can tell, is not the problem at all here.
@Enrico I added the sgdisk -z $disk again; it now runs after the restoreGRUB occurs. If you would be so kind as to test.
The reason the file was doing as it did: we do have a backup of the secondary header of GPT. I didn’t know that sgdisk would fail if there was already a table laid on the disk, though. That’s why the -z wasn’t there. I don’t have any good GPT layouts to test with currently, so that’s my oversight.
I’ve updated the working branch to move the -z appropriately. Rebuilding inits to have the most current changes and uploaded them to FOGProject.
If you installed RC-10, please cd to the project root and delete the binaries1.5.0-RC-10.zip file. Then rerun the installer; you should see the latest changes automatically, and hopefully this will be fully fixed for you now.
-
@tom-elliott Ok, I confirm it is working. Thanks!