Dell 7730 precision laptop deploy GPT error message

jmason

First-time FOG user, coming from a dead Clonezilla Ubuntu server. I am NOT a Linux expert; I inherited that server. FOG 1.5.5 is running on CentOS 7.6.1810, a new installation on an older Dell 3620 desktop that serves as the FOG server. I've set up an isolated FOG configuration, as I will only ever image our training systems.

The PCs are identical Dell Precision 7730 laptops that boot UEFI (I believe it's true UEFI). PXE boot appears to be working great. Secure Boot is disabled, and SATA Operation in the BIOS is set to AHCI.

There are 2 GPT drives, which FOG sees (checked with lsblk in debug mode) as /dev/nvme0n1 and /dev/nvme1n1. Both are apparently M.2 PCIe drives, and each has 4 partitions.

Under Windows the drives appear as (disk0 CentOS 7) (disk1 Windows 10).
Under Linux I expected (nvme0n1 CentOS 7) (nvme1n1 Windows 10), but the order is random, depending on how the drives initialize.

It does have internet access via a second, wireless NIC and a static subnet. I've read lots of posts on the forum here to get to this point, and I have spent a few days trying to resolve my issue by reading even more. I'm hoping I've just misunderstood something or haven't been able to track down the correct settings.

I want to capture both drives at once and then deploy to all of the other laptops, but there is no OS dropdown entry for a multi-OS setup, and I have been unable to find directions on which selection to make. I've tried the options below but get an error restoring the GPT partition tables:

Linux - (50), and also Windows 10 - (9)
Multiple Partition Image - All Disks (Not Resizable) - (3)
Everything - (1)

PartImage - switches on its own to Partclone Gzip after I attempt to deploy a captured image.

All files in my /images directory appear to be the proper size, and the UUIDs appear to be correct.

So during the deploy PXE boot with Linux(50) selected as the OS, I receive:

…
Erasing current MBR/GPT Tables …Done
Restoring Partition Tables (GPT)…Done
Erasing current MBR/GPT Tables …Done
Restoring Partition Tables (GPT)…Failed

Error trying to restore GPT partition tables (restorePartitionTablesAndBootLoaders)
Args Passed: /dev/nvme1n1 2 /images/DELL7730_Win10_Centos7 50 all
CMD Tried: sgdisk -gl /images/DELL7730_Win10_Centos7/d2.mbr /dev/nvme1n1
Exit returned code 4

Kernel Variables and settings:
bzImage loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=http://192.168.0.1/fog/ consoleblank=0 rootfstype=ext4 mac=macaddressoflaptop ftp=192.168.0.1 storage=192.168.0.1:images/ storageip=192.168.0.1 osid=50 irqpoll hostname=mylaptop chkdsk=0 img=DELL7730_Win10_Centos7 imgType=mpa imgPartitionType=all imgid=6 imgFormat=0 PIGZ_COMP=-6 hostearly=1 type=down
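The line above is the kernel command line FOG hands to the client; on a running Linux system the same string can be read from /proc/cmdline. A minimal Python sketch for splitting it into key/value pairs, handy for confirming what FOG actually sent (here osid=50, and imgType=mpa, which appears to correspond to the "Multiple Partition Image - All Disks" selection). The function name is hypothetical and not part of FOG:

```python
def parse_kernel_args(cmdline: str) -> dict:
    """Split a kernel command line into a dict.

    "key=value" tokens map key -> value (split on the first "=" only);
    bare flags such as "rw" or "irqpoll" map to True.
    """
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args


if __name__ == "__main__":
    # Shortened excerpt of the parameters shown above.
    sample = "root=/dev/ram0 rw osid=50 irqpoll imgType=mpa imgPartitionType=all"
    args = parse_kernel_args(sample)
    print(args["osid"], args["imgType"], args["imgPartitionType"])  # 50 mpa all
```

Splitting only on the first "=" keeps values such as URLs or PIGZ_COMP=-6 intact.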

I’m not sure what to do now to resolve the issue.

jmason

When selecting Windows as the OS for the image it appears to get further: it clears the screen, shows a red-on-green "Please wait…" box, and then displays:

cat: /tmp/partclone.log: No such file or directory
#
# A warning has been detected
#
Image failed to restore and exited with code 1 (writeImage)
   Info:
   Args Passed: /images/DELL7730_Win10_Centos7/d2p1.img /dev/nvme1n1p1
#
# Will continue in 1 minute
#

After a minute it updated the d#p#.img and nvme#n#p# values on the Args Passed line.
After a few rounds of this it moved on to another process that scrolled by too fast to read.
This happened a few times, each time updating the Args Passed line.
It then cleared the screen again, displayed "Please wait", and updated the line:

Args Passed:  /images/DELL7730_Win10_Centos7/d2p4.img* /dev/nvme1n1p4
#
# Will continue in 1 minute
#
* Clearing ntfs flag....Done
* Resetting UUIDs for /dev/nvme1n1
* Disk UUID being set to...
* Partition type being set to....1:alphanumeric number
* Partition uuid being set to....1:alphanumeric number

…etc…
Then it restarted.

Afterwards Windows appeared to boot, but CentOS would not, reporting:

error: no such device
error: file '/vmlinuz-3.10...etc' not found
error: you need to load the kernel first.

Any ideas?

Sebastian Roth

@jmason You seem to get into the details of FOG fairly quickly! Well done. It's great to see people reading the docs and forums and making their way. Let's see, where shall I start?

PartImage - switches on its own to Partclone Gzip after I attempt to deploy a captured image.

Partclone Gzip is definitely the better choice. I know we still had an issue in FOG 1.5.5 that made it default to Partimage on a fresh install. You don't really want that legacy stuff. Use Partclone Gzip or Zstd.

but there is no OS Dropdown selection item for a multisetup

Absolutely right, and I'd really like to add that at some point in the future. But it makes things a lot more complex, so we have not added it yet. In many cases one OS choice or the other works for a multi-OS setup, though it sometimes needs a bit of adjustment. That said, it's still possible that FOG cannot handle your setup (the combination of boot loaders, partition layouts and so on) yet, but if we find out what's wrong I am happy to add a fix!

So during the deploy PXE boot with Linux(50) selected as the OS, I receive: …

Let's try to tackle this. Please schedule another debug deploy task. Start up the client and run fog when you get to the shell. Step through, and when you are back at the console after the error, please type sgdisk -gl /images/DELL7730_Win10_Centos7/d2.mbr /dev/nvme1n1 (it will most probably return an error as well; please take a picture, or copy & paste the error message if you are connected to the client via SSH).
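For reading the result of that manual run: sgdisk -gl FILE DEVICE loads a saved partition table backup from FILE (-l, --load-backup) and converts an MBR disk to GPT if needed (-g, --mbrtogpt), and its exit status says why it stopped. Typing echo $? right after the command prints that status. A small Python lookup, with the table paraphrased from the RETURN VALUES section of the sgdisk(8) man page; the dictionary and function names are illustrative only:

```python
# Exit statuses paraphrased from the RETURN VALUES section of sgdisk(8).
SGDISK_EXIT_CODES = {
    0: "normal program execution",
    1: "too few arguments",
    2: "an error occurred while reading the partition table",
    3: "non-GPT disk detected and no -g option, but the operation requires a write",
    4: "an error prevented saving changes",
    5: "an error occurred while reading standard input",
    8: "disk replication operation (-R) failed",
}


def describe_sgdisk_exit(code: int) -> str:
    """Translate an sgdisk exit status into a short explanation."""
    return SGDISK_EXIT_CODES.get(code, f"undocumented exit code {code}")


if __name__ == "__main__":
    # FOG reported "Exit returned code 4" for the d2.mbr restore.
    print(describe_sgdisk_exit(4))  # an error prevented saving changes
```

Code 4 only says the write was refused; the warnings sgdisk prints before exiting explain why.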

When selecting Windows as the OS for the image it appears to get further…

That's kind of interesting. I have not had the time to look through the scripts to figure out the difference, but I find it odd that it proceeds: it seems to have restored the partitions but then fails to actually push the contents of the image files to those partitions. This part might be harder to debug, so I'd stick to OS=Linux (50) for now. Let's see how far we get.

OS=Linux (50) should handle Linux legacy BIOS (MBR) boot loaders a bit better, by the way. You say you think it's true UEFI, so maybe this is not very relevant at all. Well, let's see what you get from the debug session mentioned above.

jmason

@Sebastian-Roth Thanks for the reply, I will remake the image with Partclone Gzip and run the debug session as instructed tomorrow when I’m back in the office.

Since you mention that FOG isn't ready to easily do a multi-OS, multi-drive setup: could I get the same result by running separate captures and deploys for the 2 drives, and would having the boot loader on only one drive cause a problem?

Sebastian Roth

@jmason What I meant was that you cannot actually tell FOG you have a multi-OS setup on your machine. That still doesn't mean FOG cannot handle it. In many cases it works pretty well, even more so nowadays when you have a true UEFI install with GPT, an EFI boot partition and no need for MBR loaders.

I had not thought of it this way, but possibly you could make it two different images. I would not want to go that route, though, because you would then have to task your machines twice.

jmason

@Sebastian-Roth said in Dell 7730 precision laptop deploy GPT error message:

Let's try to tackle this. Please schedule another debug deploy task. Start up the client and run fog when you get to the shell. Step through, and when you are back at the console after the error, please type sgdisk -gl /images/DELL7730_Win10_Centos7/d2.mbr /dev/nvme1n1 (it will most probably return an error as well; please take a picture, or copy & paste the error message if you are connected to the client via SSH).

Now, as I mentioned, I am not a Linux person, so I typed fog at the prompt after booting into debug.

I get:

####
#   An Error has been detected !
####

Fatal Error: Unknown request type :: Null

Kernel variables and settings:
bzImage loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=http://192.168.0.1/fog/ consoleblank=0 rootfstype=ext4 shutdown=1 mac=macaddressoflaptop ftp=192.168.0.1 storage=192.168.0.1:images/dev storageip=192.168.0.1 osid=50 irqpoll hostname=mylaptop isdebug=yes shutdown=1
 * Press [Enter] key to continue

Then I'm back at the command prompt (#).

Is there something I’m missing to run deploy in debug mode?

Looks like I have to type in more than just fog to get the deploy to run in debug.

I found these directions but they are about 3 years old:

https://wiki.fogproject.org/wiki/index.php/Debug_Mode#Deploy_Debug

Are these still correct for the current version?

Sebastian Roth

@jmason How did you schedule the task? Go to the host's settings in the web UI, click Basic Tasks, then Deploy, and just before you create the task, tick the checkbox for debug. This way it is very similar to a non-debug deploy; the difference is that you have to start it manually (by running the very simple command fog, which should work if you schedule the task as described!) and you are asked to step through the whole process instead of it running without interaction. Give it a try.

If you have already scheduled the task as described, then I wonder whether you PXE boot the laptop or boot it from USB?!

jmason

I had just overlooked the debug checkbox… running now.

jmason

After stepping through the debug deploy as requested, I entered sgdisk -gl /images/DELL7730_Win10_Centos7/d2.mbr /dev/nvme1n1 and got some interesting output:

Creating new GPT entries.
Warning! Current disk size doesn't match that of the backup!
Adjusting sizes to match, but subsequent problems are possible!

Warning! Secondary partition table overlaps the last partition by 1000160625 blocks!
You will need to delete this partition or resize it in another utility

Problem: partition 3 is too big for the disk.

Problem: partition 4 is too big for the disk.
Aborting write operation!
Aborting write of new partition table.
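These warnings boil down to simple arithmetic. A GPT disk keeps its backup header and partition entries in its last 33 sectors (assuming the usual 128 entries of 128 bytes and 512-byte sectors), so any partition ending beyond that point cannot be written. A hypothetical Python check: the partition sizes mirror the 953.9G disk in the lsblk listing further down this thread, but the LBAs and the disk sector counts (typical values for 1 TB and 512 GB NVMe drives) are assumptions, not values read from these laptops:

```python
# Assumptions: 512-byte logical sectors and a standard GPT with 128 entries
# of 128 bytes, i.e. 32 sectors of entries plus 1 backup header sector.
GPT_BACKUP_SECTORS = 33


def last_usable_lba(disk_sectors: int) -> int:
    """Highest LBA a partition may occupy (LBAs run 0 .. disk_sectors - 1)."""
    return disk_sectors - 1 - GPT_BACKUP_SECTORS


def oversized_partitions(partitions: dict, disk_sectors: int) -> list:
    """Return the numbers of partitions that end past the usable area.

    `partitions` maps partition number -> (first_lba, last_lba), inclusive.
    """
    limit = last_usable_lba(disk_sectors)
    return sorted(num for num, (_, last) in partitions.items() if last > limit)


if __name__ == "__main__":
    # Hypothetical layout with the partition sizes of the 953.9G disk
    # (650M, 128M, 952.1G, 990M); the LBAs themselves are made up.
    layout_1tb = {
        1: (2048, 1333247),
        2: (1333248, 1595391),
        3: (1595392, 1998381055),
        4: (1998381056, 2000408575),
    }
    disk_1tb = 2_000_409_264  # assumed sector count of a 1 TB NVMe drive
    disk_512gb = 1_000_215_216  # assumed sector count of a 512 GB NVMe drive
    print(oversized_partitions(layout_1tb, disk_1tb))  # []
    print(oversized_partitions(layout_1tb, disk_512gb))  # [3, 4]
```

Loading the larger disk's table onto the smaller disk flags partitions 3 and 4, the same two that sgdisk reports as too big.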

Sebastian Roth

@jmason Well seems clear enough to me, is the source disk larger than the destination disk?

jmason

These are all identical systems, so could this mean some part of the capture process is possibly incorrect or something else?

One thing I've noticed: under Windows the devices are disk 0 (500GB Linux) and disk 1 (1TB Windows), but under PXE boot they are nvme0n1 (1TB Windows) and nvme1n1 (500GB Linux). Not sure if that makes any difference; I'm thinking not, since it's UEFI.

Would specifying one of these nvme drives as the Host Primary Disk make a difference?

george1421

@jmason Will you schedule a capture on your master image computer and a deploy on your target computer, but tick the debug option for both?

PXE boot both the source and the destination computers. You will enter debug mode and be dropped to a Linux command prompt. At the Linux command prompt, key in lsblk and post the output from both systems. This will print the geometry of the source and destination disk(s).
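When comparing machines it can help to have exact byte counts rather than rounded units. lsblk -b -d -n -o NAME,SIZE prints one line per top-level block device (-d skips partitions, -b gives sizes in bytes, -n drops the header); RAM or loop devices may show up too. A minimal Python sketch for parsing that output; the sample sizes are typical values for 1 TB and 512 GB NVMe drives, assumed rather than taken from this thread:

```python
def parse_lsblk_sizes(text: str) -> list:
    """Parse `lsblk -b -d -n -o NAME,SIZE` output into (name, size_bytes)
    pairs, preserving the order in which lsblk listed the disks."""
    disks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or unexpected lines
            name, size = fields
            disks.append((name, int(size)))
    return disks


if __name__ == "__main__":
    sample = """nvme0n1 1024209543168
nvme1n1  512110190592
"""
    for name, size in parse_lsblk_sizes(sample):
        print(f"{name}: {size / 2**30:.1f} GiB")
```

Keeping the list in lsblk's order makes it easy to see which size the kernel put first on a given boot.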

jmason

@george1421

They are identical

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0 953.9G  0 disk           
|-nvme0n1p1 259:6    0   650M  0 part
|-nvme0n1p2 259:7    0   128M  0 part
|-nvme0n1p3 259:8    0 952.1G  0 part
`-nvme0n1p4 259:9    0   990M  0 part
nvme1n1     259:1    0   477G  0 disk           
|-nvme1n1p1 259:2    0   128M  0 part
|-nvme1n1p2 259:3    0   200M  0 part
|-nvme1n1p3 259:4    0     1G  0 part
`-nvme1n1p4 259:5    0 475.6G  0 part

Interesting thing: right after my post, I tried setting /dev/nvme1n1 as the Host Primary Disk, booted the task in debug mode, and got an error that it couldn't find the hard drives. I cancelled the task, set it to /dev/nvme0n1, and got the same message. Then I cleared the Host Primary Disk field and restarted again. This time there was no Failed message. Instead of

Erasing current MBR/GPT Tables …Done
Restoring Partition Tables (GPT)…Done
Erasing current MBR/GPT Tables …Done
Restoring Partition Tables (GPT)…Failed

I got

Erasing current MBR/GPT Tables …Done
Restoring Partition Tables (GPT)…Done
Erasing current MBR/GPT Tables …Done
Restoring Partition Tables (GPT)…Done

and the deploy started. It has finished deploying nvme0n1 (the Windows drive) and the partitions to nvme1n1 (the Linux drive).

Though it seems to be working, I am puzzled about what changed, since all I did was change the Host Primary Disk parameter a few times, and when I set it back to empty and restarted, it worked.

I guess it is possible that the target machine had something misaligned somewhere, as I have been deploying to it over and over. But I have also been restoring the original image with Macrium and testing it before each FOG deploy attempt.

I will also now attempt to deploy again to the same laptop.

Sebastian Roth

@jmason I can't give you a reference on this, but it's actually a likely cause (one that I had not thought of before, grrrhhh): disk enumeration can put your two disks in reverse order. This is a known behavior in Linux and is usually worked around with persistent block device naming.
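Persistent naming means referring to a disk through the symlinks udev creates under /dev/disk/by-id (built from model and serial) instead of nvme0n1/nvme1n1. A minimal Python sketch that shows which kernel name each whole-disk by-id link resolves to on the current boot. It assumes udev has populated /dev/disk/by-id on the client, which may not hold in every boot environment; the function name is hypothetical:

```python
import os


def by_id_map(by_id_dir: str = "/dev/disk/by-id") -> dict:
    """Map each whole-disk symlink in by_id_dir to the kernel device name
    it currently resolves to (for example "nvme1n1")."""
    mapping = {}
    for link in sorted(os.listdir(by_id_dir)):
        if "-part" in link:  # partition links look like <id>-part1; skip them
            continue
        target = os.path.realpath(os.path.join(by_id_dir, link))
        mapping[link] = os.path.basename(target)
    return mapping


if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-id"):
        for link, name in by_id_map().items():
            print(f"{name:10} <- {link}")
    else:
        print("/dev/disk/by-id not present on this system")
```

Because serial numbers differ from laptop to laptop, this pins down each disk on one machine but does not by itself say which disk is which on another machine.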

Try deploying a couple of times in a row always using the debug mode and run lsblk before starting the task. See if it’s exactly how we imagine it to be (changing disk order).
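To make that comparison less error-prone across several boots, each snapshot can be reduced to kernel name and size, and compared by name. A small hypothetical Python helper: given two such snapshots, it lists the names that now point at a differently sized disk. The sizes are assumed typical values, arranged like the two lsblk listings in this thread:

```python
def renamed_disks(before: dict, after: dict) -> list:
    """Return kernel names that refer to a disk of a different size in
    `after` than in `before`, i.e. signs that enumeration order changed.
    Both arguments map kernel name -> size in bytes."""
    return sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )


if __name__ == "__main__":
    boot_1 = {"nvme0n1": 1024209543168, "nvme1n1": 512110190592}
    boot_2 = {"nvme0n1": 512110190592, "nvme1n1": 1024209543168}
    print(renamed_disks(boot_1, boot_1))  # []
    print(renamed_disks(boot_1, boot_2))  # ['nvme0n1', 'nvme1n1']
```

Size works as the comparison key here only because the two drives in these laptops have different capacities.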

Sebastian Roth

@george1421 On the other hand, I am wondering why we have not had other people reporting this in the past. What if you have a PC with two drives, one for the OS and one for data? You only ever want to image the OS disk, but it could happen that you deploy to the data disk?! Just thinking out loud here.

jmason

Yes, I'll do that; when I just tried to redeploy, the original error returned. These are also pretty new systems, so that could be a reason for not seeing it much before.

Sebastian Roth

@jmason Yes, possibly (hopefully) this is more or less an issue specific to NVMe drives. Haha. Well, I'll keep thinking about how we could solve this, as we have no influence on the order in which the Linux kernel enumerates your disks. We'd need to save a disk identifier and store it with the image… I suppose.
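One way to picture that idea: record each captured disk's size alongside the image, then at deploy time pair image disks with destination disks by size instead of by kernel name. The sketch below is hypothetical (the manifest format and function are invented for illustration, not FOG code) and only works when the disks in a machine have distinct sizes, as on these laptops; serial numbers cannot serve as the key across machines because they differ per drive:

```python
def match_disks(image_disks: dict, target_disks: dict) -> dict:
    """Pair captured disks with destination devices by exact size.

    image_disks:  hypothetical manifest, image label (d1, d2, ...) -> size
                  in bytes recorded at capture time.
    target_disks: kernel name -> size in bytes on the machine being imaged.
    Raises LookupError if a captured disk has no same-sized destination.
    """
    remaining = dict(target_disks)
    pairing = {}
    for label, size in sorted(image_disks.items()):
        candidates = sorted(name for name, s in remaining.items() if s == size)
        if not candidates:
            raise LookupError(f"no destination disk of {size} bytes for {label}")
        pairing[label] = candidates[0]
        del remaining[candidates[0]]  # each destination disk is used once
    return pairing


if __name__ == "__main__":
    # Assumed sizes; which image disk is d1 or d2 is illustrative only.
    manifest = {"d1": 1024209543168, "d2": 512110190592}
    this_boot = {"nvme0n1": 512110190592, "nvme1n1": 1024209543168}
    print(match_disks(manifest, this_boot))  # {'d1': 'nvme1n1', 'd2': 'nvme0n1'}
```

Failing loudly when no same-sized disk exists is deliberate: writing a table meant for a larger disk is exactly what produced the sgdisk "partition too big" errors.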

jmason

@Sebastian-Roth

Looks like that is what it is doing. After the failed redeploy (which of course didn't run in debug that time), I ran it in debug and lsblk gives:

nvme0n1     259:0    0   477G  0 disk    
nvme1n1     259:1    0 953.9G  0 disk           
|-nvme1n1p1 259:2    0   128M  0 part
|-nvme1n1p2 259:3    0   200M  0 part
|-nvme1n1p3 259:4    0     1G  0 part
`-nvme1n1p4 259:5    0 475.6G  0 part

However, it did not hit the error this time and appears to be deploying again now, but I can't see that working for both partitions with the mismatch… weird.

jmason

@Sebastian-Roth

Well, if you need any testing of anything, just let me know; I'll be more than happy to run things on these systems.

Sebastian Roth

@jmason Thanks for testing. I'll see what I can do for you. I guess it will take a bit of time to figure something out.
