Dell 7730 precision laptop deploy GPT error message
-
@Tom-Elliott Yes I think so, but it’s the only thing I’ve been able to come up with. I’m sure all the other “imaging” devs are dealing with the same issue or soon will be. I’m thinking if the system had come with one NVMe and one SATA drive it probably wouldn’t have been an issue, but I’m not sure. These just have 2 NVMe drives, and I guess that might become more prevalent as time progresses…
Ah well… maybe you or @Sebastian-Roth will have a revelation of some kind. I haven’t thought of anything else here.
-
I wonder if by-path would be a better option?
https://wiki.archlinux.org/index.php/persistent_block_device_naming#by-id_and_by-path
As far as I can tell in the pictures, the by-path portion relies on the PCI ID.
In the pictures I see 0000:02:00.0 and 0000:03:00.0 consistent. Or is this where the problem is residing?
I suppose by-id could also work, though we’d need to look at 2 - 3 machines and see how different the IDs are between them.
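To compare by-path (or by-id) names across a few machines, something like the following could be run on each one. It is only a sketch: the function name is made up for illustration, and it assumes the udev-managed directories /dev/disk/by-path and /dev/disk/by-id exist on the system being inspected.

```python
import os


def list_persistent_names(by_dir="/dev/disk/by-path"):
    """Map each persistent name in by_dir to the device node it resolves to.

    Hypothetical helper for illustration; returns an empty dict when the
    directory does not exist (for example, when udev is not running).
    """
    mapping = {}
    if not os.path.isdir(by_dir):
        return mapping
    for name in sorted(os.listdir(by_dir)):
        link = os.path.join(by_dir, name)
        # Entries in these directories are symlinks to nodes such as /dev/nvme0n1.
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping
```

Calling `list_persistent_names("/dev/disk/by-id")` on two or three of the laptops and comparing the results would show how much the IDs differ between machines.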
-
@Tom-Elliott said in Dell 7730 precision laptop deploy GPT error message:
I wonder if by-path would be a better option?
https://wiki.archlinux.org/index.php/persistent_block_device_naming#by-id_and_by-path
As far as I can tell in the pictures, the by-path portion relies on the PCI ID.
In the pictures I see 0000:02:00.0 and 0000:03:00.0 consistent. Or is this where the problem is residing?
I suppose by-id could also work, though we’d need to see 2 - 3 machines and see how different the ID’s are between them.
the PCI IDs were consistent
-
@jmason So the same disk 0000:02:00.0 was always the same-sized NVMe drive?
-
@Tom-Elliott Just reviewed: no, the related sizes were not the same. It appears only the PCI IDs assigned to nvme0 and nvme1 were consistent.
-
@Tom-Elliott @Sebastian-Roth
So this is mostly way over my head, but there is some brief mention of dealing with nvme on dell systems: https://www.dell.com/support/article/us/en/04/sln312382/nvme-on-rhel7?lang=en
About mid way down the page it talks about how to pull information on each device, which might be helpful.
After looking further, this works with the PCI ID (or slot ID, as they refer to it). If that is tied to a specific piece of hardware, it seems odd that the size behind it could change on a given reboot, unless the system just gets it completely mixed up. I guess it is only tied to the memory controller: the result of lspci -s SLOTID -v is the same for both, except for the Memory at b5400000 for ID 02:00.0 and b5300000 for ID 03:00.0.
And I’m way over my head so I think I’ll stick to only general things here on out lol.
From what I’m seeing on other forums regarding this issue most apparently are using some kind of method to deal with it as you and I arrived at earlier.
Looking at nvme commands in linux now…for fun I guess heh.
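One way to see which PCI slot and size each nvme name ends up with on a given boot is to read sysfs. The sketch below is an illustration, not something FOG does: the function names are hypothetical, and it assumes the resolved /sys/block/<disk> path contains the controller's PCI address, which may not hold on kernels with native NVMe multipath enabled. The size file in sysfs counts 512-byte sectors.

```python
import os
import re

# PCI address in domain:bus:device.function form, e.g. 0000:02:00.0
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def pci_address_from_sysfs_path(path):
    """Return the last PCI address found in a resolved sysfs path, or None."""
    matches = PCI_ADDR.findall(path)
    return matches[-1] if matches else None


def describe_nvme_disks(sys_block="/sys/block"):
    """Return {disk_name: (pci_address_or_None, size_in_512_byte_sectors)}."""
    result = {}
    if not os.path.isdir(sys_block):
        return result
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("nvme"):
            continue
        base = os.path.join(sys_block, name)
        with open(os.path.join(base, "size")) as fh:
            sectors = int(fh.read().strip())
        # /sys/block/<disk> is a symlink into /sys/devices/...; resolve it
        # so the PCI address of the controller shows up in the path.
        pci = pci_address_from_sysfs_path(os.path.realpath(base))
        result[name] = (pci, sectors)
    return result
```

Running `describe_nvme_disks()` after several reboots and comparing the output would show whether the name, the slot, or the size is the part that moves.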
-
@jmason @Tom-Elliott Although I kind of liked the idea you both came up with at first, the more I think about it the less I can see this being a user-friendly and reliable solution. On top of that it would mean a huge change in FOG. Not that I wanna block those kinds of changes, not at all. But I would only wanna go that way if it’s an appropriate solution.
Adding a simple sector count check is not much of a thing to implement and it would work in most situations (at least those I can think of so far). Even if the two disks are same size it wouldn’t hurt because deploying to the “wrong” one is not a problem.
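A rough sketch of what such a sector count check could look like is below. This is not the code that went into the FOG init: the function name and the dictionary inputs are hypothetical, the sector counts in the usage note are only illustrative, and it assumes an exact sector count match is wanted because the image is non-resizable.

```python
def match_by_sector_count(image_disks, target_disks):
    """Assign each captured disk to a physical disk with the same sector count.

    image_disks:  {disk_number_in_image: sector_count}   (hypothetical input)
    target_disks: {device_node: sector_count}            (hypothetical input)
    Returns {disk_number_in_image: device_node}.
    Raises ValueError when no unused disk of the right size is left.
    """
    remaining = dict(target_disks)
    assignment = {}
    for number, sectors in sorted(image_disks.items()):
        # Sort candidates so the result is deterministic when two disks
        # have the same size; either one is an acceptable target then.
        candidates = sorted(dev for dev, size in remaining.items() if size == sectors)
        if not candidates:
            raise ValueError(f"no unused disk with {sectors} sectors for image disk {number}")
        assignment[number] = candidates[0]
        del remaining[candidates[0]]
    return assignment
```

For example, `match_by_sector_count({1: 1000215216, 2: 2000409264}, {"/dev/nvme0n1": 2000409264, "/dev/nvme1n1": 1000215216})` pairs image disk 1 with /dev/nvme1n1 and image disk 2 with /dev/nvme0n1, regardless of which name the kernel handed out on that boot.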
-
@Sebastian-Roth said in Dell 7730 precision laptop deploy GPT error message:
@jmason @Tom-Elliott
Adding a simple sector count check is not much of a thing to implement and it would work in most situations (at least those I can think of so far). Even if the two disks are same size it wouldn’t hurt because deploying to the “wrong” one is not a problem.
This would definitely be true for me if my systems had 2 identical size hard drives as we would be imaging them both. I wouldn’t really care which one it picked as long as both were available at boot.
Could you make the functionality optional via some kind of check mark if multi-disk non-resizeable is selected? Then it wouldn’t affect everyone using that selection unless they so chose to do so.
-
@jmason said in Dell 7730 precision laptop deploy GPT error message:
Could you make the functionality optional via some kind of check mark if multi-disk non-resizeable is selected?
Probably can, but I don’t see why this would affect other users at all. The “All Disk” option is non-resizable, and therefore trying to allocate the image to the right disk by using a sector count shouldn’t hurt anyone really.
-
@Sebastian-Roth Well if you move forward with this just let me know when you want some testing.
-
@Sebastian-Roth @Tom-Elliott One thing I realized today is that when the deploy fails it reboots and that gives the system a chance to initialize the way the master image expects.
Initially I assumed the key to this working for my setup was in making sure that the smaller drive was the first drive in the master image captured, so that it didn’t attempt to deploy the smaller image onto the larger drive and then fail when attempting to image the larger image onto the smaller drive. I’m not sure that is actually necessary.
So I hooked up 10 of my laptops to the switch today and deployed the group, about half failed the first startup, but on the next reboot all of them initialized the drives as the master image expected.
This might not work well for a system with more than 2 nvme drives being imaged, so I’ll still help test anything you guys come up with and need testing. But I’m fairly satisfied with even the failure and reboot and hoping it will init correctly on the next boot.
-
@jmason Well that is definitely not too bad of an idea. Just let it try often enough till it doesn’t fail anymore. While this will help you not getting under pressure time-wise it’s not an ideal solution. I will let you know when I get something to test ready.
-
@jmason said in Dell 7730 precision laptop deploy GPT error message:
Ubuntu showed the behavior on the 3rd with lsblk and 5th reboot with dmesg, while reboot 7 was different than all previous, I’ll move on to the other 2 ISOs next.
Since this issue is happening with a commercial version of Linux… I wonder if there is any value in calling Dell tech support? This could be the Linux kernel doing this, or it could be the UEFI firmware. I think you have enough evidence to say it’s either the hardware, UEFI, or the Linux kernel doing it. Your Ubuntu test doesn’t use the latest kernel, but FOG does, so you have a range of kernels where this problem exists.
-
@jmason Ok, got a bit of time to code and test over the weekend. Here is a first try.
Download the init file from our website manually and put in /var/www/html/fog/service/ipxe/ on your FOG server. Make sure the file is owned by the apache webserver user (see user name of the other files in that directory)! Now edit the settings of one of the hosts you are trying to deploy to in the FOG web UI and set Host Init to init_nvme.xz. Schedule a deploy task and now keep an eye on the blue screen output of partclone. It should tell you NTFS for windows partitions and probably XFS or EXT4 for the CentOS Linux partitions. See which one it does first. Note that down and do another two or three rounds till you see it deploying the other OS/disk first.
-
@Sebastian-Roth set it up as described and did a debug deploy. There was a new message I hadn’t seen before:
*Preparing Partition layout cat: '/images/myImageName/*.size' : No such file or directory
Same message appears after the Attempting to deploy image notice box.
In the Partclone window, the File System partitions for my linux ~500GB drive showed:
/dev/nvme0n1p1 as raw (134.2 MB)
/dev/nvme0n1p2 as FAT16 (209.7 MB)
/dev/nvme0n1p3 as XFS (1.1 GB)
/dev/nvme0n1p4 as raw (510.7 GB)
Just checking to see if this is running as expected…will post nvme1n1 partition info from this first run once the current one completes.
-
@jmason Hmmm, forgot to tell you that you need to re-upload the image before deployment. Sorry! On upload the *.size files will be generated.
-
@Sebastian-Roth So with the newly captured image, also created with init_nvme.xz as Host Init, I brought the host for deploy up in debug mode.
I ensured that the init disk order (nvme0n1 and nvme1n1) matched the order from when I made the image (just like I did last week when deploying to my 19 laptops). …Well, running again now with the proper init settings and host image.
-
@Sebastian-Roth Okay after attempting it the third time I’m sure that I have everything assigned appropriately. Still running in deploy-debug the message is confirmed.
After the Preparing Partition layout message I get the An error has been detected! box:
No drive number passed (restore PartitionTablesAndBootLoaders)
Args Passed: /dev/nvme0n1 /images/mydiskimage 50 all
Kernel variables and settings:
bzImage loglevel=4 initrd=init_nvme.xz root=dev/ram0 rw amdisk_size=127000 web=http://192.168.0.1/fog/ consoleblank=0 rootfstype=ext4 shutdown=1 mac=macaddressoflaptop ftp=192.168.0.1 storage=192.168.0.1:/images/ storageip=192.168.0.1 osid=50 irqpoll hostname=mylaptop chkdsk=0 img=mydiskimage imgType=mpa imgPartitionType=all imgid=11 imgFormat=0 PIGZ_COMP=-6 hostearly=1 isdebug=yes type=down shutdown=1
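When comparing a dump like this between runs, it can help to split the kernel arguments into key/value pairs first. The snippet below is only a convenience sketch with a made-up function name. It assumes plain whitespace-separated tokens with no quoting, and it keeps the last value when a key is repeated (shutdown=1 appears twice in the dump).

```python
def parse_kernel_args(cmdline):
    """Split a kernel command line into a dict.

    Tokens of the form key=value become string entries; bare tokens such
    as "rw" or "irqpoll" are stored with the value True. A repeated key
    keeps the value of its last occurrence.
    """
    args = {}
    for token in cmdline.split():
        # partition() splits on the first "=" only, so URLs stay intact.
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args
```

With the values parsed this way, two runs can be compared by diffing the resulting dicts, for example to confirm that initrd, img and imgType are what was intended.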