• Hi!

    We have a lab of 20 or so identical PCs, all with 2 identical NVMe drives and 1 SATA drive. We’ve been using FOG for years without major issues, but this is our first setup with two identical NVMe drives. We are experiencing the issue of FOG mixing up the NVMe drives with both single-disk and multi-disk images. I’ve read the discussion in NVME madness, according to which there is currently no solution for this with identical NVMe drives. Are there any plans or hope for a solution?

    In NVME madness it was mentioned that the disk identifiers could be used to identify the right disk, but they cannot be used for capturing since it would cause issues when deploying the image to another PC. Have you thought of implementing the usage of disk identifiers for the deployment? I’m thinking of the possibility to specify the order of drives in the Host Primary Disk field by listing all disk identifiers in the right order. I’m not worried about the drives being captured in the wrong order, I can sort that out after capturing.
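
    A minimal sketch of that idea in shell, not FOG's actual implementation: given a comma-separated list of identifiers as it might be entered in the Host Primary Disk field, print the matching device names in that order. The function names `pair_value` and `resolve_disk_order` are hypothetical, and the sketch assumes an lsblk build that supports `-P` (KEY="value" output), which keeps empty columns from shifting the other fields.

```shell
#!/bin/bash
# Hypothetical helper: extract the value of KEY from one `lsblk -P` line,
# e.g. NAME="/dev/sda" SERIAL="" WWN="".
pair_value() {
    # A leading space is added so that " NAME=" cannot match inside "KNAME=".
    printf ' %s\n' "$2" | sed -n 's/.* '"$1"'="\([^"]*\)".*/\1/p'
}

# Hypothetical helper: resolve_disk_order SPEC
#   SPEC  comma-separated identifiers (device name, serial or WWN)
#   stdin `lsblk -P` style lines containing NAME, SERIAL and WWN
# Prints one device name per identifier, in the order given by SPEC.
# Identifiers that match no disk are skipped silently.
resolve_disk_order() {
    local spec=$1 inventory id line name serial wwn
    local -a ids
    inventory=$(cat)
    IFS=',' read -r -a ids <<< "$spec"
    for id in "${ids[@]}"; do
        [ -n "$id" ] || continue   # an empty entry must not match an empty column
        while IFS= read -r line; do
            name=$(pair_value NAME "$line")
            serial=$(pair_value SERIAL "$line")
            wwn=$(pair_value WWN "$line")
            if [ "$id" = "$name" ] || [ "$id" = "$serial" ] || [ "$id" = "$wwn" ]; then
                printf '%s\n' "$name"
                break
            fi
        done <<< "$inventory"
    done
}
```

    In a debug session this could be fed with `lsblk -pdPo NAME,SERIAL,WWN | resolve_disk_order 'SERIALAAA,SERIALBBB,/dev/sda'`, where the two serials are made-up placeholders.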

  • Moderator

    @george1421 Nice, thanks heaps for your testing and the research! Will see if I can add this. Interesting you didn’t even get the serial in the mSATA…

  • Moderator

    @sebastian-roth said in Identical NVMe drives:

    feedback on the output of lsblk -pdo NAME,SERIAL,WWN in a debug session

    Testing shows that the WWN value is not displayed/present on:
    Dell 7450 (msata)
    Dell 7400 nvme
    Dell 9420 nvme
    Dell 5420 nvme

    Except for the msata drive the serial number was present.

    Patch to add real lsblk to buildroot: https://patchwork.ozlabs.org/project/buildroot/patch/20160225171433.7e200dfc@free-electrons.com/

    (I didn’t look into it more than to see what needed to be updated in package/util-linux/util-linux.mk)

  • Moderator

    @george1421 Thanks! My guess is that the WWN will be empty on any hardware. Though it would be good to test on three or so different systems. Yes, could be a minified version of lsblk that we have in buildroot, not sure. Though SERIAL seems to work.

  • Moderator

    @sebastian-roth said in Identical NVMe drives:

    Anyone keen to test and give some feedback

    Sorry I forgot about this thread. I’ll give you the results tomorrow. Do you just need a yes or no on the WWN, or do you want me to test on multiple hardware to see if it’s hardware specific?

    I think lsblk is part of busybox. I don’t know if buildroot has the full app or not. I haven’t looked yet.

  • Moderator

    @testers Anyone keen to test and give some feedback on the output of lsblk -pdo NAME,SERIAL,WWN in a debug session? No need to take a picture and post that here. Just let me know if WWN is shown in the output or empty.
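
    For anyone who prefers a yes/no answer per disk over reading the columns by eye, here is a rough sketch. It assumes the lsblk in the debug session accepts `-P` (KEY="value" pairs), which may not hold for a minified build; `report_wwn` is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: read `lsblk -P` style lines on stdin and report,
# per disk, whether the WWN column came back filled or empty.
report_wwn() {
    local line name wwn
    while IFS= read -r line; do
        name=$(printf '%s\n' "$line" | sed -n 's/.*NAME="\([^"]*\)".*/\1/p')
        wwn=$(printf '%s\n' "$line" | sed -n 's/.*WWN="\([^"]*\)".*/\1/p')
        [ -n "$name" ] || continue   # skip lines without a NAME pair
        if [ -n "$wwn" ]; then
            echo "$name: WWN shown"
        else
            echo "$name: WWN empty"
        fi
    done
}
```

    Usage in a debug session would be `lsblk -pdPo NAME,SERIAL,WWN | report_wwn`.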

  • Moderator

    @george1421 said in Identical NVMe drives:

    On any computer with an NVMe drive or only one that has more than one nvme drive??

    Really on any machine you have available, be it single or multi disk, HDD or NVMe…

    My guess is that our FOS lsblk has this kind of bug where it cannot read the WWN.


  • @sebastian-roth Sorry for the delay, I have some busy days behind me. I ran the debug task and lsblk on an Intel NUC with a single NVME drive. On the NUC, the results were the same, the WWN field was empty, while the name and serial were ok.

  • Moderator

    @sebastian-roth On any computer with an NVMe drive or only one that has more than one nvme drive??

  • Moderator

    @testers @moderators Can some people please test running lsblk -pdo NAME,SERIAL,WWN in a debug session (doesn’t matter if it’s capture or deploy)? No need to take a picture and post that here. Just let me know if WWN is shown in the output.

  • Moderator

    @mrp Ah well I see. Our FOS lsblk command is not able to retrieve the WWN information. Too bad. Not sure if there is anything we can do about it. Good you can use the serial numbers for now.

    It would be interesting to see whether this fails in general or is only an issue on your computers.


  • @sebastian-roth No problem, here it is:

    20211102_173014~2.jpg

  • Moderator

    @mrp Thanks again for testing. Looks better now but still not perfect. I wonder why it doesn’t find the WWN at all. Can you please run the following commands in a debug command prompt and post output here:

    lsblk -pdno WWN /dev/sda
    lsblk -pdno WWN /dev/nvme0n1
    lsblk -pdno WWN /dev/nvme1n1
    

  • @sebastian-roth I tested the same debug tasks with the new init. It seems that the WWN cannot be retrieved by the init, but with the serial it seems to work now.

    “Find another updated init binary on github that might possibly solve the issue”

    serial(nvme1),serial(nvme2),device-name(sata) = eui.0025385601500953,eui.0025385601500954,/dev/sda
    20211102_141538.jpg

    wwn(nvme1),wwn(nvme2),device-name(sata) = S4EVNG0N600905D,S4EVNG0N600906P,/dev/sda
    20211102_141929.jpg

    wwn(nvme1),serial(nvme2),device-name(sata) = S4EVNG0N600905D,eui.0025385601500954,/dev/sda
    20211102_140603.jpg

  • Moderator

    @mrp Thanks for the pictures, great documentation of the test and your patience!!

    Find another updated init binary on github that might possibly solve the issue but if not it will definitely give us further insight on why it doesn’t work with serial and WWN yet.


  • @sebastian-roth Sorry for the delay, today I had some time to test this.

    “Please download the updated init, place it on your FOG server, schedule a debug deploy task and boot the host up.”

    I tested the new init with debug tasks with the following host primary field values:

    serial(nvme1),serial(nvme2),device-name(sata) = eui.0025385601500953,eui.0025385601500954,/dev/sda
    20211027_172913.jpg

    wwn(nvme1),wwn(nvme2),device-name(sata) = S4EVNG0N600905D,S4EVNG0N600906P,/dev/sda
    20211027_172057.jpg

    wwn(nvme1),serial(nvme2),device-name(sata) = S4EVNG0N600905D,eui.0025385601500954,/dev/sda
    20211027_173818.jpg

    Just for double checking, I also checked lsblk -pdno NAME,SERIAL,WWN again:
    20211027_174214.jpg

  • Moderator

    @mrp Found some time to look into this again. The issue I have with testing this is that neither serial nor WWN can be looked up by lsblk in my virtualbox setup. I have no idea why this is the case. Tools like udevadm, hdparm and so on show serial and WWN but lsblk does not. So in my tests I am using the disk size (blockdev --getsize64 /dev/sda) as parameter and it works.
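
    As a rough illustration of reading the identifiers without lsblk, the sketch below pulls serial and WWN from udevadm's property output and adds the size from blockdev. It assumes udevadm is available in the environment and that the disk exposes the `ID_SERIAL_SHORT` and `ID_WWN` properties, which is not guaranteed on every system; `udev_property` and `disk_ids` are hypothetical names and this is not what the init scripts do.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one property (e.g. ID_WWN) from
# KEY=value lines on stdin; returns 1 if the property is not present.
udev_property() {
    local key=$1 line
    while IFS= read -r line; do
        case $line in
            "$key="*) printf '%s\n' "${line#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: disk_ids /dev/sda
# Prints serial, WWN and size in bytes for one disk. Serial and WWN are
# left empty when udev does not report them.
disk_ids() {
    local dev=$1 props
    props=$(udevadm info --query=property --name="$dev") || return 1
    printf 'serial=%s wwn=%s size=%s\n' \
        "$(printf '%s\n' "$props" | udev_property ID_SERIAL_SHORT)" \
        "$(printf '%s\n' "$props" | udev_property ID_WWN)" \
        "$(blockdev --getsize64 "$dev")"
}
```

    Calling `disk_ids /dev/sda` needs root for blockdev. The size alone cannot tell identical drives apart, so it only helps when the disks differ in capacity.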

    I added some debug statements to the scripts to further debug this. Please download the updated init, place it on your FOG server, schedule a debug deploy task and boot the host up. After the PXE boot you should see this message: “Trying to sort enumerated disks according to Host Primary Disk setting” - New init version is 20211025.

    Please take a picture of the screen where you see this message and post that here in the forums.

  • Moderator

    @mrp said in Identical NVMe drives:

    With the replaced init file it showed Init Version 20211009 during captures/deployments.

    Perfectly fine! It’s the latest as of now.

    We have FOG 1.5.9, what is the default init version of that release?

    Not sure exactly, but more like 20200906 - definitely a huge difference to the 20211009 you have now.

    I will look into this over the weekend!


  • @sebastian-roth With the replaced init file it showed Init Version 20211009 during captures/deployments. Sadly, I did not think of checking whether the init version has changed. We have FOG 1.5.9, what is the default init version of that release?

  • Moderator

    @mrp said:

    Yesterday I downloaded the init file (init_adv_primary_disk.xz) and replaced the /var/www/html/fog/service/ipxe/init.xz file with the downloaded one.

    Now that I think about it again I am wondering if you checked the Init Version number shown on boot up? Just to make sure it’s the correct file used.
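
    One way to make the swap repeatable is sketched below: it keeps a copy of the init that shipped with the release before overwriting it and confirms that the copy succeeded. `install_init` and the `.orig` suffix are made up for this example, and it assumes write access to the target directory.

```shell
#!/bin/bash
# Hypothetical helper: install_init SRC DEST
# Copies SRC over DEST. The first time it runs, the existing DEST is saved
# as DEST.orig; later runs leave that backup untouched so the release init
# stays recoverable.
install_init() {
    local src=$1 dest=$2
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    if [ -f "$dest" ] && [ ! -e "$dest.orig" ]; then
        cp -p "$dest" "$dest.orig" || return 1
    fi
    # cmp -s confirms the installed file is byte-identical to the download.
    cp "$src" "$dest" && cmp -s "$src" "$dest"
}
```

    With the paths from this thread that would be `install_init init_adv_primary_disk.xz /var/www/html/fog/service/ipxe/init.xz`. The Init Version shown on boot is still the real check that the host picked up the new file.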
