Identical NVMe drives
We have a lab of 20 or so identical PCs, all with 2 identical NVMe drives and 1 SATA drive. We’ve been using FOG for years without major issues, but this is our first setup with two identical NVMe drives. We are seeing FOG mix up the NVMe drives with both single-disk and multi-disk images. I’ve read the discussion in NVME madness, according to which there is currently no solution for this with identical NVMe drives. Are there any plans or hope for a solution?
In NVME madness it was mentioned that the disk identifiers could be used to identify the right disk, but they cannot be used for capturing since it would cause issues when deploying the image to another PC. Have you thought of implementing the usage of disk identifiers for the deployment? I’m thinking of the possibility to specify the order of drives in the Host Primary Disk field by listing all disk identifiers in the right order. I’m not worried about the drives being captured in the wrong order, I can sort that out after capturing.
@mrp Found some time to look into this again. The issue I have with testing this is that neither serial nor WWN can be looked up by lsblk in my VirtualBox setup. I have no idea why this is the case. Tools like udevadm, hdparm and so on show serial and WWN but lsblk does not. So in my tests I am using the disk size (blockdev --getsize64 /dev/sda) as parameter and it works.
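For anyone hitting the same lsblk gap, the serial and WWN can usually be read from the udev properties instead. The snippet below is only a sketch: udev_ids is a made-up helper name, and it assumes udev exposes the ID_SERIAL_SHORT and ID_WWN properties for the disk, which depends on the device and the udev rules in use.

```shell
# Sketch only, not part of the FOG init scripts.
# udev_ids (hypothetical helper) reads KEY=VALUE udev properties on stdin
# and prints the short serial and the WWN, if present.
udev_ids() {
    sed -n \
        -e 's/^ID_SERIAL_SHORT=/serial /p' \
        -e 's/^ID_WWN=/wwn /p'
}

# Typical use on a live system:
#   udevadm info --query=property --name=/dev/sda | udev_ids
```

If neither property is set for a disk, the helper prints nothing, and the disk size from blockdev remains the fallback.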
I added some debug statements to the scripts to further debug this. Please download the updated init, place it on your FOG server, schedule a debug deploy task and boot the host up. After the PXE boot you should see this message: “Trying to sort enumerated disks according to Host Primary Disk setting” - New init version is 20211025.
Please take a picture of the screen where you see this message and post that here in the forums.
With the replaced init file it showed Init Version 20211009 during captures/deployments.
Perfectly fine! It’s the latest as of now.
We have FOG 1.5.9, what is the default init version of that release?
Not sure exactly, but more like 20200906 - definitely a huge difference to the 20211009 you have now.
I will look into this over the weekend!
@sebastian-roth With the replaced init file it showed Init Version 20211009 during captures/deployments. Sadly, I did not think of checking whether the init version has changed. We have FOG 1.5.9, what is the default init version of that release?
Yesterday I downloaded the init file (init_adv_primary_disk.xz) and replaced the /var/www/html/fog/service/ipxe/init.xz file with the downloaded one.
Now that I think about it again I am wondering if you checked the Init Version number shown on boot up? Just to make sure it’s the correct file used.
@sebastian-roth I sent you the examples with WWNs and serials in chat.
@mrp Thanks heaps for testing and letting me know!
Can you post the actual information you set as Host Primary Disk in the FOG web UI for the specific hosts? Just a few examples.
@sebastian-roth Yesterday I downloaded the init file (init_adv_primary_disk.xz) and replaced the /var/www/html/fog/service/ipxe/init.xz file with the downloaded one.
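The replacement step can be scripted so that the stock init stays available for a rollback. This is a sketch under assumptions: install_init is a hypothetical helper, not a FOG tool, and the default target is simply the path mentioned above.

```shell
# Sketch only: back up the current init once, then install the new one.
# install_init is a hypothetical helper name.
install_init() {
    new_init="$1"
    target="${2:-/var/www/html/fog/service/ipxe/init.xz}"

    # refuse to continue if the downloaded file is missing
    [ -f "$new_init" ] || { echo "missing $new_init" >&2; return 1; }

    # keep the original init so it can be restored later;
    # never overwrite an existing backup
    if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi

    cp "$new_init" "$target"
}
```

Usage would be along the lines of `install_init init_adv_primary_disk.xz`, run with sufficient permissions on the FOG server. Checking the Init Version shown at boot is still the way to confirm the new file is actually being served.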
I did multiple single and multi-disk deployments (multiple partition not resizable) as well as multiple single and multi-disk captures (multiple partition not resizable). I specified the order of disks using the Host Primary Disk field (wwn(nvme),wwn(nvme),device-name(sata) ; serial(nvme),serial(nvme),device-name(sata) ; also single disk capture and deploy by specifying a single disk like serial(nvme) or wwn(nvme)).
With the new init file, the NVME drives still got picked up seemingly randomly in all tested scenarios.
When working on single NVME drives (specified with serial or WWN), the partclone progress screen always showed /dev/nvme0n1 being captured/deployed, regardless of the NVME drive being specified in FOG and the drive that was picked by FOG.
I tested both with the normal setup (2xNVME, 1xSATA) and without the SATA drive. Interestingly, when testing with (2xNVME, 1xSATA) and specifying the Host Primary Disk like WWN1,WWN2,sata_device_name (or the same with serial), the capture/deployment always started with the SATA drive and then continued with the two NVME drives.
Maybe the serial and WWN entries were found to be invalid and/or simply skipped by FOG? The WWNs and serials were correct, I double checked each of them.
@mrp I just added another fix to the init. Make sure you re-download the file from the link below (updated the file) before you start testing.
@sebastian-roth No problem, I’m glad we have a fix for this now. The Host Primary Disk setting with the disks separated with commas is exactly what we wanted, so I’m enthusiastic to test the fix.
I have some time next week, so I’ll do some captures and deployments on the 13th of October and test the new init file. I’ll come back to you then with our results.
@mrp Sorry for the delay. Found some time to work on this finally.
Here you can download a first test version of the updated init file: https://github.com/FOGProject/fos/releases/download/20210807/init_adv_primary_disk.xz (will be removed as soon as we have enough evidence this is working as intended and not breaking anything else and therefore included into the official init)
You can use any combination of disk serial (lsblk -pdno SERIAL /dev/sda), WWN (lsblk -pdno WWN /dev/sda), disk size (blockdev --getsize64 /dev/sda) or simple device name (/dev/sda) as Host Primary Disk setting, disk entries separated by comma, for example
@mrp @testers May I ask you to give it a try and see if it works as expected. Please test as many different setups as possible (capture/deploy, All Disk/non-resizeable/resizeable image type, unicast/multicast) to make sure this change doesn’t break other parts.
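For readers trying to picture how such a setting could be resolved, here is a rough sketch of the matching idea. It is not the code in the init: order_disks and disk_ids are hypothetical names, and the rule that each disk is used at most once (so that two equal sizes map to two different disks) is an assumption of this sketch.

```shell
# Sketch only, NOT the actual FOS init code.

# Print every identifier an entry could match for one disk, one per line:
# device name, serial, WWN and size in bytes.
disk_ids() {
    echo "$1"
    lsblk -pdno SERIAL "$1" 2>/dev/null
    lsblk -pdno WWN "$1" 2>/dev/null
    blockdev --getsize64 "$1" 2>/dev/null
}

# order_disks "<comma-separated setting>" <enumerated disks...>
# Prints the disks in the order given by the setting. Entries that match
# no remaining disk are reported on stderr and skipped.
order_disks() {
    setting="$1"
    shift
    printf '%s\n' "$setting" | tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        {
            used=" "
            while IFS= read -r entry; do
                [ -z "$entry" ] && continue
                found=""
                for dev in "$@"; do
                    # skip disks already assigned to an earlier entry
                    case "$used" in *" $dev "*) continue ;; esac
                    # exact, whole-line match against any identifier
                    if disk_ids "$dev" | grep -qxF -- "$entry"; then
                        echo "$dev"
                        used="$used$dev "
                        found=1
                        break
                    fi
                done
                [ -n "$found" ] || echo "no disk matches '$entry'" >&2
            done
        }
}
```

Calling it with the setting as the first argument and the enumerated devices after it prints the devices in the requested order. A warning on stderr for an unmatched entry is the kind of signal that would help tell a skipped serial or WWN apart from a disk that was simply enumerated in a different order.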
@mrp No, unfortunately I have not found enough time to work this out. But I have it on my list and will get to it in the next two weeks I reckon.
@sebastian-roth I hope you had a pleasant holiday. Any updates on this issue, or should I maybe open an issue on GitHub to track this?
@mrp When we struggled with this I didn’t think as far as only using identifiers for deployment. I will see what I can do when I get back from my holiday in a couple of days.
@sebastian-roth Yes, I’m thinking of disk UUIDs or the serial number.