"Saving Partitions" - GPT to MBR issue

Wayne Workman

This has been an issue for a while.

When a computer ships with Windows on a GPT-formatted disk and the technician or owner switches the firmware mode from UEFI to Legacy, reinstalling Windows leaves the disk with an MBR layout. To capture such a disk with FOG, they must boot a FOG debug task and run a fixparts correction first. Usually they only find this out after the “Saving Partitions” portion of the FOG capture process has hung overnight, they realize that isn’t right or acceptable, and they seek further help.

I want to completely eliminate this issue.

I think that if FOS takes longer than 60 seconds to save the original partitions AND the disk is in MBR layout, it’s appropriate to run fixparts against that disk automatically to correct the issue, and then restart the “Saving Partitions” phase of capture.

I will be testing code tweaks for this issue, and I’ll post progress here.

Wayne Workman

I’ve emailed the creator of fixparts for advice about scripting this process. I’m waiting for a reply.

If I don’t get a reply, I’ll make an attempt on my own.

Tom Elliott

I’ve already added code that automatically runs fixparts; it just doesn’t work. You lite should be a little easier.

As for a “timeout” it won’t work because it’s “stuck”

Wayne Workman

@Tom-Elliott said in "Saving Partitions" - GPT to MBR issue:

As for a “timeout” it won’t work because it’s “stuck”

There are ways to do it. I don’t know the function names, but say the function to save the partitions is called save_partitions; it would look like this in its flow:

#Start the saving partitions process, when it is done, create a file indicating it's done, and background this process.
(save_partitions;touch /savingPartitionsDone) &

#Start a loop that looks for this file. If it finds it, exit loop. 
#If it does not find it, increment the counter and sleep for 1 second. 
#If the counter reaches 60, kill the command that is hanging and start fixparts and then exit the loop.

counter=0
while true
do
    if [[ -f /savingPartitionsDone ]]; then
        break
    else
        sleep 1
        counter=$((counter + 1))
        if [[ $counter -ge 60 ]]; then
            #Kill hanging command here.
            #Start fixparts next here.
            #exit the loop
            break
        fi
    fi
done

And looking at this now, it’s probably not necessary to wait 60 seconds. Probably 15 would do fine.
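
A minimal sketch of the loop above with the placeholders filled in. The function name watch_with_marker and the marker path are made up for illustration, and the command to watch is passed as arguments because the real FOG function names are not assumed here. It also assumes pkill -P is available to reach the stuck child process. Running fixparts after a timeout is left to the caller.

```shell
#!/bin/bash
# watch_with_marker LIMIT COMMAND [ARGS...]   (hypothetical helper)
# Returns 0 if COMMAND finished within LIMIT seconds, 1 if it was killed.
watch_with_marker() {
    local limit="$1"; shift
    local marker="${TMPDIR:-/tmp}/savingPartitionsDone.$$"
    local counter=0 pid

    rm -f "$marker"
    # Background the work; the marker file appears only once it finishes.
    ( "$@"; touch "$marker" ) &
    pid=$!

    while [[ ! -f $marker ]]; do
        if (( counter >= limit )); then
            # Kill the stuck command (a child of the subshell), then the subshell.
            pkill -9 -P "$pid" 2>/dev/null
            kill -9 "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            rm -f "$marker"   # in case the subshell got as far as touch
            return 1
        fi
        sleep 1
        counter=$((counter + 1))
    done

    rm -f "$marker"
    return 0
}
```

With the hypothetical save_partitions from above, `watch_with_marker 15 save_partitions || echo "timed out"` would give the 15-second behaviour.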

george1421

Be aware I’m just throwing out random stuff here.

What is the impact of running fixparts for every deploy process? Is it safe to run on a healthy disk structure?
Can you detect if the disk is GPT before attempting to save anything to the disk? If so, what negative effect would there be in running fixparts (gratuitously) on a GPT-formatted disk, even if the final disk image will be GPT anyway? I assume the deploy scripts know if the target is supposed to be MBR or GPT before writing to the disk?
Is it safe to assume bash doesn’t have a try/catch-style construct, along the lines of “try this command for 20 seconds and abort if the timeout is reached”?

Wayne Workman

@george1421 said in "Saving Partitions" - GPT to MBR issue:

Is it safe to assume bash doesn’t have a try/catch-style construct, along the lines of “try this command for 20 seconds and abort if the timeout is reached”?

It doesn’t; the closest thing is what I wrote above. I can make it more advanced with exit code checking, but I’ve not dug that deep into it yet.
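
For reference, bash itself has no such construct, but GNU coreutils and BusyBox both ship a separate timeout utility that covers the “try for N seconds” case for external programs. The sketch below assumes a coreutils-style timeout is present in the environment, which has not been checked for FOS. The wrapper name try_for is made up. Note that timeout can only run programs, not shell functions.

```shell
#!/bin/bash
# try_for SECONDS COMMAND [ARGS...]   (hypothetical wrapper name)
# Runs COMMAND under the external `timeout` utility and passes its status on.
try_for() {
    local limit="$1"; shift
    timeout "$limit" "$@"
    local rc=$?
    # coreutils timeout exits with 124 when the time limit was hit.
    if [[ $rc -eq 124 ]]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}
```

A call such as `try_for 20 sfdisk -d /dev/sda` would then return 124 on a hang instead of blocking forever (the device path is only an example).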

Wayne Workman

@george1421 said in "Saving Partitions" - GPT to MBR issue:

What is the impact of running fixparts for every deploy process? Is it safe to run on a healthy disk structure?

That could be the answer. We only ever need fixparts if the drive is currently in MBR/Legacy mode, so if it is of no consequence to run it on every Legacy/MBR drive, then yes, do that.

Wayne Workman

@george1421 said in "Saving Partitions" - GPT to MBR issue:

I assume the deploy scripts know if the target is supposed to be MBR or GPT before writing to the disk?

This is only for the capture process, not deploy.

Tom Elliott

https://github.com/FOGProject/fogproject/blob/dev-branch/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh#L2094

This is the fixparts function I already created.

Tom Elliott

https://github.com/FOGProject/fogproject/blob/bad33be289a64419607b666b066f2e9887344c43/src/buildroot/package/fog/scripts/usr/share/fog/lib/partition-funcs.sh#L648

This is the function that calls the runFixparts function.

Tom Elliott

https://github.com/FOGProject/fogproject/blob/bad33be289a64419607b666b066f2e9887344c43/src/buildroot/package/fog/scripts/bin/fog.upload#L130

Here’s where saveOriginalPartitions is called.

Tom Elliott

That should cover the “testable” elements. I’m telling you though, this isn’t an easy thing to figure out.

The GPT or MBR parts work perfectly, but it gets stuck if it’s not that.

Tom Elliott

@Wayne-Workman Stuck commands are literally that. The only way to “test” and kill would be to put the task into the background, get its PID, and then do the loop.

The problem with that approach is that you need to know which elements are involved and why the hang is being caused.

I think it’s simply waiting for some input on “what to do” and we just need to let it through by saying no when it’s stuck.
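
One way to test the “waiting for input” theory, sketched under the assumption that the stuck program reads its prompt from standard input (a program that opens /dev/tty directly would not be affected). Both helper names are made up for illustration.

```shell
#!/bin/bash
# Run a command with stdin attached to /dev/null, so any prompt
# sees end-of-file immediately instead of waiting for a keypress.
run_noninteractive() {
    "$@" </dev/null
}

# Run a command and answer a single prompt with "n".
answer_no() {
    printf 'n\n' | "$@"
}
```

If the hang disappears under either wrapper, that would support the idea that the tool is blocked on a confirmation prompt.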

I haven’t got a good mechanism, though, to reliably detect the “bad gpt-mbr” state.
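
A rough sketch of one possible check, not a tested detection method. It assumes 512-byte logical sectors and that the “bad” state is an ordinary MBR (no 0xEE protective entry) sitting in front of a leftover primary GPT header, which starts with the ASCII signature “EFI PART” in sector 1. A leftover backup header at the end of the disk would need a similar check against the last sector and is not covered here. The function names are made up.

```shell
#!/bin/bash
# Print COUNT bytes at byte OFFSET of FILE as lowercase hex, e.g. "55aa".
hex_at() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Print one of: unknown, mbr, gpt, stale-gpt
classify_disk() {
    local disk="$1" off protective=no

    # Sector 0 must end with the 55 aa boot signature to carry an MBR at all.
    [[ $(hex_at "$disk" 510 2) == 55aa ]] || { echo unknown; return; }

    # Type byte of each of the four MBR entries; 0xEE marks a protective MBR.
    for off in 450 466 482 498; do
        [[ $(hex_at "$disk" "$off" 1) == ee ]] && protective=yes
    done

    # 4546492050415254 is "EFI PART" in hex, expected at the start of sector 1.
    if [[ $(hex_at "$disk" 512 8) == 4546492050415254 ]]; then
        if [[ $protective == yes ]]; then echo gpt; else echo stale-gpt; fi
    else
        echo mbr
    fi
}
```

Under these assumptions, `classify_disk /dev/sda` printing stale-gpt would be the case where fixparts is worth running before the save (the device path is only an example).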

Tom Elliott

@Wayne-Workman said in "Saving Partitions" - GPT to MBR issue:

@Tom-Elliott said in "Saving Partitions" - GPT to MBR issue:

As for a “timeout” it won’t work because it’s “stuck”

There are ways to do it. I don’t know the function names, but say the function to save the partitions is called save_partitions; it would look like this in its flow:
#Start the saving partitions process, when it is done, create a file indicating it's done, and background this process.
(save_partitions;touch /savingPartitionsDone) &

#Start a loop that looks for this file. If it finds it, exit loop. 
#If it does not find it, increment the counter and sleep for 1 second. 
#If the counter reaches 60, kill the command that is hanging and start fixparts and then exit the loop.

counter=0
while true
do
    if [[ -f /savingPartitionsDone ]]; then
        break
    else
        sleep 1
        counter=$((counter + 1))
        if [[ $counter -ge 60 ]]; then
            #Kill hanging command here.
            #Start fixparts next here.
            #exit the loop
            break
        fi
    fi
done
And looking at this now, it’s probably not necessary to wait 60 seconds. Probably 15 would do fine.

Your code should look more like this, within the “checking part” (inside the function saveSfdiskPartitions from partition-funcs.sh):

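local sfdiskcount=0
# Dump the partition table in the background and remember the PID.
(sfdisk -d "$disk" 1>"$file") &
local sfpid=$!
# Give sfdisk 15 seconds to finish.
sleep 15
if ps -p "$sfpid" >/dev/null; then
    # Still running after 15 seconds: treat it as hung and kill it.
    kill -9 "$sfpid" >/dev/null 2>&1
    false
else
    true
fi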

This would replace the sfdisk call itself and happen before the status check.

Of course, this doesn’t make the result truly dependent on whether sfdisk errors out or not. We kind of lose the ability to detect what the exit status was because we background it.
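
For what it’s worth, bash’s wait builtin returns the exit status of a backgrounded job, so the status can be recovered when the job finishes in time. A small sketch of the same sleep-then-check pattern with that added. The function name run_then_check is made up, and returning 124 for the killed case is only a convention borrowed from the timeout utility.

```shell
#!/bin/bash
# run_then_check GRACE COMMAND [ARGS...]   (hypothetical helper)
# Starts COMMAND in the background, sleeps GRACE seconds, then either
# kills it (return 124) or returns COMMAND's own exit status.
run_then_check() {
    local grace="$1"; shift
    "$@" &
    local pid=$!
    sleep "$grace"
    # kill -0 sends no signal; it only tests whether the PID still exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill -9 "$pid" 2>/dev/null
        wait "$pid" 2>/dev/null
        return 124
    fi
    # The job already ended; wait hands back the status bash recorded for it.
    wait "$pid"
}
```

The trade-off is the same as with the plain sleep: a command that finishes in one second still costs the full grace period.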

Tom Elliott

Just for fun and good to know information:

DD - Destroyer of disks

I frequently reference this site when dealing with the dd command; the relevant part is near the bottom, or more directly:
DD - Destroyer of disks | Show progress status statistics of dd

This is one way of “looping” to get the pid and information if you really want to loop and test.

I’d say testing it after a sleep is the most elegant way. Looping makes more sense if you want to return output for status/progress information.

Wayne Workman

@Tom-Elliott said in "Saving Partitions" - GPT to MBR issue:

We kind of lose the ability to detect what the exit status was because we background it.

You can have the backgrounded process echo its exit code, like this:

(save_partitions;echo $? > /exitCode) &

Again I’ve not dug in yet. But I will.

Tom Elliott

@Wayne-Workman The problem with it is that the program gets stuck. So you’d never get a return code.

I still think finding out the “why” is the best approach. If we can get the “why” it’s stuck, then we can figure out a way to get out of it.

Wayne Workman

@Tom-Elliott said in "Saving Partitions" - GPT to MBR issue:

The problem with it is that the program gets stuck. So you’d never get a return code.

That’s where the timer comes into play. No exit code after a specified amount of time means you need to kill the hanging process and do fixparts, then retry.

Tom Elliott

@Wayne-Workman I know where you’re headed, and I do understand. But I really think we’re “stuck” because it’s waiting for confirmation or input.

Programs typically don’t get stuck in that sense.