Dell 7730 precision laptop deploy GPT error message
-
In Arch Linux, the first 4 reboots each appear to have shown a different init order.
Reboot 5 was the same as reboot 1.
-
@Sebastian-Roth said in Dell 7730 precision laptop deploy GPT error message:
SystemRescueCD
5 reboots: the first 4 each had a different init order, and the 5th matched the 1st.
-
@jmason To me this seems to be enough evidence that it’s a general “issue”, or that it works as intended. I suspect this is “normal”, as PCIe initialization probably returns the disks in a different order each time. The weird thing is that I can’t find much about this being a particular issue with NVMe disks.
-
@george1421 said in Dell 7730 precision laptop deploy GPT error message:
@jmason said in Dell 7730 precision laptop deploy GPT error message:
There is a BIOS update with the following fixes, but don’t see anything related to our issue. I can go ahead and update the system with all the latest fixes available if requested.
I would do this no matter what, even though the change log shows the fix primarily deals with the USB-C dock.
I loaded all available BIOS/firmware updates and retested, and the behavior is still the same.
-
@Sebastian-Roth Would it be feasible to have an option for multiple-disk non-resizable images, with some kind of checkbox/option to tell FOG that the machines are identical drive-wise/hardware-wise, and would it make a difference? It’s been a long time since I did any coding, and it wasn’t related to this at all; just throwing a thought out.
-
@jmason Sorry if it sounded like I’d leave you alone now that we are fairly sure it’s just “normal” behaviour. I’m still thinking about how we can solve this for you and others. I haven’t come up with a great solution yet, so I’m sort of postponing an implementation in the hope of a flash of genius.
What is your deadline to get those devices imaged?
-
@Sebastian-Roth Everything I’ve found on this issue refers to using the disk’s UUID to identify which one to apply the image to. That doesn’t help us much, as every drive on a system has its own UUID. So how do we identify which is which? I know that doesn’t help anything. From Serial ATA to PATA and NVMe, none of them is guaranteed a persistent naming scheme in Linux. Luckily, SATA and PATA seem to be named after the channel they’re connected to. With NVMe sitting on a PCIe channel, enumeration depends on how fast a disk feels like revealing itself to the system.
-
@Tom-Elliott You are spot on! The only thing I’ve come up with so far is saving the disk sizes in sectors (in multiple-disk mode only) and trying to match those again on deployment. Kind of ugly and possibly error-prone, but we could give it a try.
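A minimal Python sketch of the size-matching idea, assuming the image records each disk’s size in sectors in capture order (the stored data, the function name, and the sector counts below are hypothetical and only illustrative). It also shows where the approach is fragile: two disks of the same size, or a replacement drive that is a few sectors smaller, cannot be matched, so the sketch raises an error instead of guessing.

```python
def match_disks_by_size(saved_sizes, current_disks):
    """Map each captured disk (disk0, disk1, ...) to a device found at deploy time.

    saved_sizes:   disk sizes in sectors, in capture order (hypothetical stored data).
    current_disks: {kernel device name: size in sectors} on the target machine.
    Returns the kernel device names in capture order.
    """
    mapping = []
    for index, size in enumerate(saved_sizes):
        candidates = [name for name, sectors in current_disks.items() if sectors == size]
        if len(candidates) != 1:
            # No disk, or several identical disks, of this size: sizes alone cannot decide.
            raise ValueError(f"disk{index}: {len(candidates)} candidate disks of {size} sectors")
        if candidates[0] in mapping:
            raise ValueError(f"disk{index}: {candidates[0]} is already assigned")
        mapping.append(candidates[0])
    return mapping


# Example: at capture, disk0 was the 500 GB drive and disk1 the 1 TB drive,
# but on this boot the kernel named them the other way round.
saved = [976773168, 1953525168]
current = {"nvme0n1": 1953525168, "nvme1n1": 976773168}
print(match_disks_by_size(saved, current))  # ['nvme1n1', 'nvme0n1']
```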
-
@Sebastian-Roth said in Dell 7730 precision laptop deploy GPT error message:
What is your deadline to get those devices imaged?
I have until mid-March before my first full implementation with these new training laptops. I can always image them individually via USB until a working solution is found (i.e. until someone learns how to control the NVMe and its feelings about revealing itself).
-
@Tom-Elliott said in Dell 7730 precision laptop deploy GPT error message:
@Sebastian-Roth everything I’ve found on this issue refers to using the disks uuid to identify which one to apply it to. That doesn’t help us much as every drive on a system would have its own uuid.
When registering a host into FOG, you’d have to store the UUIDs of the drives and then specify which one is your disk0/sda, which is disk1/sdb, etc. Just thinking out loud.
Then on deploy, if the UUID fields and their mappings are set, you use those; otherwise operate as usual.
-
@jmason The problem isn’t finding the UUID, it’s that the UUID will be different for each disk.
What do I mean?
One 7730 with 2 NVMe drives will have different UUIDs.
Another 7730 with 2 NVMe drives (identically sized, of course) will also have different UUIDs.
Does this make sense?
-
@Tom-Elliott said in Dell 7730 precision laptop deploy GPT error message:
@jmason The problem isn’t finding the UUID, it’s that the UUID will be different for each disk.
What do I mean?
One 7730 with 2 NVMe drives will have different UUIDs.
Another 7730 with 2 NVMe drives (identically sized, of course) will also have different UUIDs.
Does this make sense?
Yes, it makes sense, but I failed to convey my thought.
My thought was that there might be some way, when you do a full registration on each host machine, to have an option (requiring user input) to assign each NVMe drive and its UUID to a FOG-specific parameter/field (disk0/sda, disk1/sdb, etc.), with the mapping stored in the database.
Then during deploy, if the parameter(s) for the drives are present for the host machine, you would have the info needed to match the images up based on the actual UUIDs, and it wouldn’t matter what the init order of the NVMe drives is.
It would require user input to perform the mapping, would be optional, and would only be checked/used for multiple-disk non-resizable images.
On registration: “Do you wish to register your drives for use in multi-disk capture/deploy operations?” There could maybe even be an option to enter the UUIDs manually from the web GUI, but it would be best to capture the UUIDs during host registration.
So the needed info would not be saved with the image, but with the host information in the database.
Not sure if that’s feasible, but just a thought.
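A minimal Python sketch of the per-host mapping idea: a mapping saved with the host at registration decides which drive becomes disk0, disk1, and so on at deploy time, and a host without a mapping falls back to the normal kernel order. The function name and data shapes are hypothetical, and the drive identifier is left generic: the thread talks about UUIDs, while the example uses made-up serial-number-style strings.

```python
from typing import Dict, List, Optional


def resolve_disk_order(host_mapping: Optional[Dict[str, int]],
                       detected: Dict[str, str],
                       kernel_order: List[str]) -> List[str]:
    """Return device names ordered as disk0, disk1, ... for a deploy.

    host_mapping: {drive identifier: disk slot} saved with the host at
                  registration, or None if the host was never mapped.
    detected:     {drive identifier: kernel device name} found on this boot.
    kernel_order: device names in the order the kernel listed them.
    """
    if not host_mapping:
        # No mapping registered for this host: operate as usual.
        return list(kernel_order)
    slots = sorted(host_mapping.values())
    if slots != list(range(len(slots))):
        raise ValueError("disk slots must be 0..n-1 with no gaps or duplicates")
    order = [""] * len(slots)
    for ident, slot in host_mapping.items():
        if ident not in detected:
            raise KeyError(f"registered drive {ident} not present on this boot")
        order[slot] = detected[ident]
    return order


# Hypothetical identifiers; on this boot the kernel swapped the two drives.
mapping = {"SERIAL-A": 0, "SERIAL-B": 1}
detected = {"SERIAL-B": "nvme0n1", "SERIAL-A": "nvme1n1"}
print(resolve_disk_order(mapping, detected, ["nvme0n1", "nvme1n1"]))  # ['nvme1n1', 'nvme0n1']
```

The fallback branch is what keeps the feature optional: only hosts someone has explicitly mapped change behavior.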
-
The problem is that the NVMe drives are loading in random order. Essentially, one time a drive comes up as nvme0n1 and the next time it’s nvme1n1.
Using the UUID would work, but only for the machine on which you capture the image. If you go down this route, you would essentially require an image for each machine.
Unless you manage to gather every machine’s UUID information, this just isn’t feasible.
Basically, what I’m saying is:
First: 7730 with a 500GB NVMe SSD and a 1TB NVMe SSD. 500GB UUID 0000-xxxx-0000-xxxx, 1TB UUID 0001-xxxx-0000-xxxx
Second: 7730 with a 500GB NVMe SSD and a 1TB NVMe SSD. 500GB UUID 0001-xxxa-0001-xxxa, 1TB UUID 0002-xxxz-0000-xxxz
You see what I mean?
Each machine’s drives will have their own UUIDs. Simply put, you would need to know every machine’s UUID information and insert it into the DB to identify which drive is which.
Of course, our code doesn’t support this yet either. I imagine it wouldn’t be too difficult to enable, but it basically removes the autonomous element, at least for these machines.
The NVMe enumeration is what’s changing, and that is what determines the drive labeling. With SATA and PATA, labeling also came from enumeration, but the channels (SATA0 to SATA4, or however many your machine had) would enumerate to Linux in order of their channel number. This kept /dev/sda on SATA0, /dev/sdb on the next populated channel, and so on.
With PATA, the naming also followed enumeration, but by fixed position: the master on channel 0 was /dev/hda, while the slave on channel 1 was /dev/hdd.
Hopefully this helps clarify more what I was trying to get at.
-
@jmason I think we’re saying the same thing now, but it would entail a ton more work. It would also put a lot on the person registering to make sure the information is accurate.
-
@Tom-Elliott Yes, I think so, but it’s the only thing I’ve been able to come up with. I’m sure all the other “imaging” devs are dealing with the same issue or soon will be. I suspect that if the system had come with one NVMe and one SATA drive it wouldn’t have been an issue, but I’m not sure. These just have 2 NVMe drives, and I guess that might become more prevalent as time goes on…
Ah well… maybe you or @Sebastian-Roth will have a revelation of some kind. I haven’t thought of anything else here.
-
I wonder if by-path would be a better option?
https://wiki.archlinux.org/index.php/persistent_block_device_naming#by-id_and_by-path
As far as I can tell from the pictures, the by-path names rely on the PCI ID.
In the pictures, 0000:02:00.0 and 0000:03:00.0 look consistent. Or is this where the problem lies?
I suppose by-id could also work, though we’d need to look at 2-3 machines to see how different the IDs are between them.
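A minimal Python sketch of the by-path idea: list the whole-disk NVMe links under /dev/disk/by-path and order the disks by PCI address instead of by kernel name. The link-name pattern (pci-0000:02:00.0-nvme-1, with partition links ending in -partN) is an assumption based on common udev naming and may differ between udev versions; the function name is hypothetical. This only helps if the same physical drive always sits behind the same PCI address.

```python
import os
import re

# Assumed udev link name for a whole NVMe disk, e.g. "pci-0000:02:00.0-nvme-1".
# Partition links ("...-nvme-1-part1") deliberately do not match.
NVME_BY_PATH = re.compile(
    r"^pci-([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])-nvme-\d+$"
)


def nvme_disks_by_path(by_path_dir="/dev/disk/by-path"):
    """Return (pci_address, kernel_name) pairs sorted by PCI address."""
    pairs = []
    for entry in os.listdir(by_path_dir):
        match = NVME_BY_PATH.match(entry)
        if match is None:
            continue
        # Each by-path entry is a symlink to the kernel device node, e.g. ../../nvme1n1
        target = os.path.realpath(os.path.join(by_path_dir, entry))
        pairs.append((match.group(1), os.path.basename(target)))
    # Fixed-width lowercase hex addresses sort correctly as plain strings.
    return sorted(pairs)


if __name__ == "__main__" and os.path.isdir("/dev/disk/by-path"):
    for pci_address, name in nvme_disks_by_path():
        print(pci_address, name)
```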
-
@Tom-Elliott said in Dell 7730 precision laptop deploy GPT error message:
I wonder if by-path would be a better option?
https://wiki.archlinux.org/index.php/persistent_block_device_naming#by-id_and_by-path
As far as I can tell in the pictures, the by-path portion relies on the PCI ID.
In the pictures I see 0000:02:00.0 and 0000:03:00.0 consistent. Or is this where the problem is residing?
I suppose by-id could also work, though we’d need to see 2 - 3 machines and see how different the ID’s are between them.
The PCI IDs were consistent.
-
@jmason So was 0000:02:00.0 always the same-sized NVMe drive?
-
@Tom-Elliott I just reviewed: no, the sizes were not consistent. It appears only the PCI IDs assigned to nvme0 and nvme1 were consistent.
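To repeat this check on every boot, a minimal Python sketch can read sysfs and report, for each NVMe controller, its PCI address and the size of its first namespace; comparing the output across reboots shows whether a given address always carries the same-sized drive. It assumes /sys/class/nvme/nvmeX/device links to the controller’s PCI function and that the disk is namespace 1, named nvmeXn1. With native NVMe multipath the block device number can differ from the controller number, so treat this as a hypothetical diagnostic, not FOG code.

```python
import os


def nvme_controller_report(sys_root="/sys"):
    """Map each NVMe controller to (PCI address, size in bytes of namespace 1)."""
    report = {}
    class_dir = os.path.join(sys_root, "class", "nvme")
    for ctrl in sorted(os.listdir(class_dir)):
        # "device" links to the controller's PCI function, e.g. .../0000:02:00.0
        pci_path = os.path.realpath(os.path.join(class_dir, ctrl, "device"))
        # Assumption: the disk is namespace 1 and named after the controller (nvme0 -> nvme0n1).
        size_file = os.path.join(sys_root, "block", ctrl + "n1", "size")
        if not os.path.exists(size_file):
            continue
        with open(size_file) as f:
            sectors = int(f.read())
        # sysfs reports block device size in 512-byte units, whatever the drive's sector size.
        report[ctrl] = (os.path.basename(pci_path), sectors * 512)
    return report


if __name__ == "__main__" and os.path.isdir("/sys/class/nvme"):
    for ctrl, (pci_address, size) in nvme_controller_report().items():
        print(ctrl, pci_address, f"{size / 1e9:.0f} GB")
```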
-
@Tom-Elliott @Sebastian-Roth
So this is mostly way over my head, but there is a brief mention of dealing with NVMe on Dell systems: https://www.dell.com/support/article/us/en/04/sln312382/nvme-on-rhel7?lang=en
About midway down the page it describes how to pull information on each device, which might be helpful.
After looking further, this works with the PCI ID (or slot ID, as they call it). That seems odd: if the ID is tied to a specific piece of hardware, how could its size change on a given reboot? Or does the system just get it completely mixed up? I guess it is only tied to the memory controller: the output of lspci -s SLOTID -v is the same for both, except for "Memory at b5400000" for ID 02:00.0 and "Memory at b5300000" for ID 03:00.0.
And I’m way over my head, so I think I’ll stick to general things from here on out lol.
From what I’m seeing on other forums regarding this issue, most people apparently use some kind of method like the one you and I arrived at earlier to deal with it.
Looking at NVMe commands in Linux now… for fun, I guess heh.