Error Restoring GPT Partition Tables
-
@tlehrian OK, so this IS something that the Linux kernel developers are going to have to address. It's not something that only impacts FOG, but all distros of Linux.
-
@george1421 said in Error Restoring GPT Partition Tables:
It's not something that only impacts FOG, but all distros of Linux.
As @Quazz said, distros don’t have that issue because they mostly use UUIDs to identify partitions. Once the identifier is set and configured in your grub.conf/fstab there is no issue finding the right one again. But we can’t do that as we need to identify the whole disk, one that we possibly have never seen before (fresh machine).
So I kind of understand why this topic is not discussed much in the Linux world. But I am still wondering why this is the case for NVMe drives and if there is a way to query the controller itself. Within FOG we can do a lot of things. We can even implement our very own low-level tool in C to query that information for us.
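One possible way to "query the controller itself" without writing a C tool is to read the identity attributes the kernel's NVMe driver exposes in sysfs. The following is only a sketch: the function name `list_nvme_serials` is made up for illustration, the optional directory argument exists purely so the function can be tried against a fake tree, and whether the FOS init environment exposes `/sys/class/nvme` in this form is an assumption that would need checking in a debug session.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print "<controller>|<serial>|<model>" for
# every NVMe controller found under /sys/class/nvme.
# The optional first argument overrides the sysfs directory (for testing only).
list_nvme_serials() {
    sysfs_nvme="${1:-/sys/class/nvme}"
    for ctrl in "$sysfs_nvme"/nvme*; do
        # Skip entries without a serial attribute (also covers an unmatched glob)
        [ -e "$ctrl/serial" ] || continue
        # The values may be padded with trailing spaces, so strip them
        serial=$(sed 's/[[:space:]]*$//' "$ctrl/serial")
        model=$(sed 's/[[:space:]]*$//' "$ctrl/model")
        printf '%s|%s|%s\n' "$(basename "$ctrl")" "$serial" "$model"
    done
}
```

A serial number travels with the physical drive, so it would not help to pick "the first disk" on a machine we have never seen, but it could be used to recognize the same disk again after the kernel has swapped the names.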
-
@tlehrian Can you please run the following commands in a debug session on one of your machines:
basename $(readlink /sys/block/nvme0n1/device)
basename $(readlink /sys/block/nvme0n1/device/device)
basename $(readlink /sys/block/nvme1n1/device)
basename $(readlink /sys/block/nvme1n1/device/device)
I borrowed that from this script, which is also referenced in this topic.
I am wondering if the so called BDF (bus/device/function) notation is consistent across reboots.
Edit: Quite possibly the BDF will also change. Reading the section on “PCI Bus enumeration” in this Wikipedia article, I can imagine that it’s just the nature of this kind of device.
Edit 2: Hmmmmmmm: https://superuser.com/questions/488833/do-pci-pcie-buses-and-devices-always-enumerate-in-the-same-order (not sure if this applies to PCIe NVMe devices at all)
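To avoid typing the four commands by hand, the same lookups can be wrapped in a loop over all NVMe block devices. This is a minimal sketch, assuming the sysfs layout implied by the commands above (`device` pointing at the controller, `device/device` pointing at the PCI device). The function name `list_nvme_bdf` and its optional directory argument are made up for illustration; the argument only exists so the function can be tried against a fake tree.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print "<block device> <controller> <PCI address>"
# for every NVMe disk listed in /sys/block.
# The optional first argument overrides the sysfs directory (for testing only).
list_nvme_bdf() {
    sysfs_block="${1:-/sys/block}"
    for dev in "$sysfs_block"/nvme*n*; do
        # Skip if the glob did not match anything
        [ -e "$dev" ] || continue
        # Same lookups as the manual commands: controller name, then PCI address
        ctrl=$(basename "$(readlink "$dev/device")")
        bdf=$(basename "$(readlink "$dev/device/device")")
        printf '%s %s %s\n' "$(basename "$dev")" "$ctrl" "$bdf"
    done
}
```

Running `list_nvme_bdf` on two consecutive boots and comparing the output would show whether the BDF column stays attached to the same physical disk.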
-
@Sebastian-Roth I’m sorry it took me so long to get back to this, but I was able to run the testing this morning. It does appear that the device names stay consistent, even though the drives may change order in lsblk. Here are my results:
State 1
> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk
|-nvme1n1p1 259:2    0   499M  0 part
|-nvme1n1p2 259:3    0   100M  0 part
|-nvme1n1p3 259:4    0    16M  0 part
|-nvme1n1p4 259:5    0 341.2G  0 part
`-nvme1n1p5 259:6    0 135.1G  0 part
nvme0n1     259:1    0 238.5G  0 disk
> basename $(readlink /sys/block/nvme0n1/device)
nvme0
> basename $(readlink /sys/block/nvme0n1/device/device)
0000:02:00.0
> basename $(readlink /sys/block/nvme1n1/device)
nvme1
> basename $(readlink /sys/block/nvme1n1/device/device)
0000:03:00.0
State 2
> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0   477G  0 disk
|-nvme0n1p1 259:2    0   499M  0 part
|-nvme0n1p2 259:3    0   100M  0 part
|-nvme0n1p3 259:4    0    16M  0 part
|-nvme0n1p4 259:5    0 341.2G  0 part
`-nvme0n1p5 259:6    0 135.1G  0 part
nvme1n1     259:1    0 238.5G  0 disk
> basename $(readlink /sys/block/nvme0n1/device)
nvme1
> basename $(readlink /sys/block/nvme0n1/device/device)
0000:03:00.0
> basename $(readlink /sys/block/nvme1n1/device)
nvme0
> basename $(readlink /sys/block/nvme1n1/device/device)
0000:02:00.0
Note how the device and device/device names remain unchanged per disk.
-
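If the PCI address really does stay with the physical disk, as the two states above suggest, it could be used as the stable key and the current block device name looked up from it. The following is only a sketch under that assumption; `find_nvme_by_bdf` is a hypothetical helper and not something that exists in the FOS init scripts, and its optional second argument only exists so it can be tried against a fake tree.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the NVMe block device that currently
# sits behind a given PCI address (BDF), e.g. 0000:03:00.0.
# Returns non-zero if no NVMe disk is found at that address.
# The optional second argument overrides /sys/block (for testing only).
find_nvme_by_bdf() {
    wanted="$1"
    sysfs_block="${2:-/sys/block}"
    for dev in "$sysfs_block"/nvme*n*; do
        # Skip if the glob did not match anything
        [ -e "$dev" ] || continue
        bdf=$(basename "$(readlink "$dev/device/device")")
        if [ "$bdf" = "$wanted" ]; then
            basename "$dev"
            return 0
        fi
    done
    return 1
}
```

With the results above, `find_nvme_by_bdf 0000:03:00.0` would name the 477G disk in both states: nvme1n1 in State 1 and nvme0n1 in State 2.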
I am busy getting ready for our Fall semester to start next week, BUT…
I will keep a debug task set up for one of our machines in order to do more testing for you all if need be. I’m glad to be part of the solution here, and I promise not to take as long to respond back to a testing request next time…
-
@tlehrian Just wanted to let you know that I still have this on my list but just haven’t gotten to it yet.
-
I’m just checking back on this (rather old) topic to see if any headway had been made. It seems the last tests I ran indicated that the basename commands did indicate the device names remained consistent even when the OS switches them. I won’t need to image these again for a while, but would be nice to see if a fix is available.
Thanks!
Tim
-
@tlehrian Great you are bringing this topic up again. Even though I try to keep track of all open topics I still miss one or the other sometimes.
Which version of the FOS inits do you currently use?
-
@Sebastian-Roth Looks like we’re still on 4.19.48. I have not needed to image labs since August, and don’t expect to need to do any large-scale imaging before Spring, but could certainly update and test with a newer kernel if need be.
Tim
-
@tlehrian It’s not the kernel version I am after but the version of the init file. Unfortunately there is no easy way to find out which version you have. We added version information that is printed on the console when an error happens, but that was added only recently.
May I ask you to download the latest inits (from our Jenkins repo) and see if you still run into the same issue? I worked on that part a bit a couple of days ago.
-
Oh, OK. I guess I misread that. I’ll check out the new inits and see what happens, but probably won’t hit this again hard until January. Thanks for the response!
-
@tlehrian It would be really great if you could test this fairly soon so we have a little bit of time in case it needs further work. The earlier we know the better! We can’t guarantee a quick fix some time in January when you really need it.