Laptop with 2 nvme drives randomly selected so selecting one drive to capture not working



  • Related to the same systems mentioned in this post:

    https://forums.fogproject.org/topic/12959/dell-7730-precision-laptop-deploy-gpt-error-message/93

    in which we were able to get FOG to perform a Multiple Partition Image - All Disks (Not Resizable) capture and deploy on a system with two NVMe drives, working as expected. However, when I want to capture only one of the drives, the random way the UEFI assigns nvme0n1 and nvme1n1 to the physical drives means I cannot know which drive will be captured when specifying the Host Primary Disk as:

    /dev/nvme0n1 (should be disk0 linux drive)
    

    or

    /dev/nvme1n1 (should be disk1 windows drive)
    

    but since the assignment is random so is the chance of the correct drive being selected.

    I expect this will also affect any attempt to deploy the single-drive image back to another one of the identically configured training laptops. So whatever mechanism was put in place to keep multiple drives identified is not working when only a single drive is selected for imaging.

    I am rebooting and running the task to deploy in debug so that I can run lsblk to see how nvme0n1 and nvme1n1 are assigned in relation to the disk size. I believe that I would have to deploy in the same way.
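
A minimal sketch of the manual lsblk check described above, assuming the output of `lsblk -b -d -n -o NAME,SIZE` (sizes in bytes, whole disks only, no header) has been captured as text. The function names and the sample byte counts are hypothetical and only illustrate how the device names can be related to disk size on a given boot.

```python
def parse_lsblk_sizes(text):
    """Map device name -> size in bytes from `lsblk -b -d -n -o NAME,SIZE` output."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, size = parts
        sizes[name] = int(size)
    return sizes


def nvme_disks_by_size(sizes):
    """Return the NVMe disk names ordered from smallest to largest."""
    nvme = [name for name in sizes if name.startswith("nvme")]
    return sorted(nvme, key=lambda name: (sizes[name], name))


# Hypothetical sample output; the byte counts are made up for illustration.
sample = """\
nvme0n1 512110190592
nvme1n1 256060514304
sda 16008609792
"""

sizes = parse_lsblk_sizes(sample)
ordered = nvme_disks_by_size(sizes)
print("smallest NVMe disk on this boot:", ordered[0])
print("largest NVMe disk on this boot:", ordered[-1])
```

Because the names can swap between boots, the result is only valid for the boot on which lsblk was run.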

    This should be interesting.

    FOG 1.5.5 with the init.xz updated on April 15, 2019


  • Moderator

    @Sebastian-Roth I agree we need a system with dual nvme drives so we can compare. I have a pdf of what all the short names really mean that I’ll link in too. I suspect the field you pointed to will tell us how many, and that other fields will tell us more details about the drive. Also, in the nvme list command I’m interested in seeing whether the namespace field changes if there are more drives, as well as how it behaves when these disks change ordinal position with respect to what udev assigns them.


  • Senior Developer

    @george1421 said in Laptop with 2 nvme drives randomly selected so selecting one drive to capture not working:

    cmic    : 0
      [2:2] : 0     PCI
      [1:1] : 0     Single Controller
      [0:0] : 0     Single Port
    

    Nice! Is this the part we are after? Would be interesting to see this on a system with two NVMe drives! @jmason
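
A small sketch of how the `cmic` value quoted above breaks down into the three bits that `nvme id-ctrl -H` prints. The bit labels follow the ones shown in that output (bit 0 port, bit 1 controller, bit 2 PCI vs. SR-IOV); the function name `decode_cmic` is hypothetical. Whether any of these bits helps tell two separate drives apart is exactly the open question in this thread.

```python
def decode_cmic(cmic):
    """Split the CMIC byte into the three flags shown by `nvme id-ctrl -H`."""
    return {
        "multi_port": bool(cmic & 0x1),        # bit 0: Single Port (0) / Multi Port (1)
        "multi_controller": bool(cmic & 0x2),  # bit 1: Single Controller (0) / Multi Controller (1)
        "sr_iov": bool(cmic & 0x4),            # bit 2: PCI (0) / SR-IOV (1)
    }


# The value reported for the drive in this thread is 0.
flags = decode_cmic(0)
print(flags)
```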


  • Moderator

    So I don’t lose track of where I was with testing.

    https://fogproject.org/inits/init_nvme-cli.xz

    # nvme id-ctrl /dev/nvme0n1 -H
    
    NVME Identify Controller:
    vid     : 0x1c5c
    ssvid   : 0x1c5c
    sn      : EI79Q010510209F4T
    mn      : PC401 NVMe SK hynix 256GB
    fr      : 80000E00
    rab     : 2
    ieee    : ace42e
    cmic    : 0
      [2:2] : 0     PCI
      [1:1] : 0     Single Controller
      [0:0] : 0     Single Port
    
    mdts    : 5
    cntlid  : 1
    ver     : 10200
    rtd3r   : 7a120
    rtd3e   : 1e8480
    oaes    : 0x200
     [31:9] : 0x1   Reserved
      [8:8] : 0     Namespace Attribute Changed Event Not Supported
    
    oacs    : 0x17
     [15:4] : 0x1   Reserved
      [3:3] : 0     NS Management and Attachment Not Supported
      [2:2] : 0x1   FW Commit and Download Supported
      [1:1] : 0x1   Format NVM Supported
      [0:0] : 0x1   Sec. Send and Receive Supported
    
    acl     : 7
    aerl    : 3
    frmw    : 0x14
      [4:4] : 0x1   Firmware Activate Without Reset Supported
      [3:1] : 0x2   Number of Firmware Slots
      [0:0] : 0     Firmware Slot 1 Read/Write
    
    lpa     : 0x2
      [1:1] : 0x1   Command Effects Log Page Supported
      [0:0] : 0     SMART/Health Log Page per NS Not Supported
    
    elpe    : 255
    npss    : 4
    avscc   : 0x1
      [0:0] : 0x1   Admin Vendor Specific Commands uses NVMe Format
    
    apsta   : 0x1
      [0:0] : 0x1   Autonomous Power State Transitions Supported
    
    wctemp  : 352
    cctemp  : 354
    mtfa    : 0
    hmpre   : 0
    hmmin   : 0
    tnvmcap : 0
    unvmcap : 0
    rpmbs   : 0
     [31:24]: 0     Access Size
     [23:16]: 0     Total Size
      [5:3] : 0     Authentication Method
      [2:0] : 0     Number of RPMB Units
    
    sqes    : 0x66
      [7:4] : 0x6   Max SQ Entry Size (64)
      [3:0] : 0x6   Min SQ Entry Size (64)
    
    cqes    : 0x44
      [7:4] : 0x4   Max CQ Entry Size (16)
      [3:0] : 0x4   Min CQ Entry Size (16)
    
    nn      : 1
    oncs    : 0x1f
      [5:5] : 0     Reservations Not Supported
      [4:4] : 0x1   Save and Select Supported
      [3:3] : 0x1   Write Zeroes Supported
      [2:2] : 0x1   Data Set Management Supported
      [1:1] : 0x1   Write Uncorrectable Supported
      [0:0] : 0x1   Compare Supported
    
    fuses   : 0x1
      [0:0] : 0x1   Fused Compare and Write Supported
    
    fna     : 0
      [2:2] : 0     Crypto Erase Not Supported as part of Secure Erase
      [1:1] : 0     Crypto Erase Applies to Single Namespace(s)
      [0:0] : 0     Format Applies to Single Namespace(s)
    
    vwc     : 0x1
      [0:0] : 0x1   Volatile Write Cache Present
    
    awun    : 7
    awupf   : 7
    nvscc   : 1
      [0:0] : 0x1   NVM Vendor Specific Commands uses NVMe Format
    
    acwu    : 7
    sgls    : 0
      [0:0] : 0     Scatter-Gather Lists Not Supported
    
    subnqn  :
    ps    0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0
              rwt:0 rwl:0 idle_power:- active_power:-
    ps    1 : mp:3.80W operational enlat:30 exlat:30 rrt:1 rrl:1
              rwt:1 rwl:1 idle_power:- active_power:-
    ps    2 : mp:2.40W operational enlat:100 exlat:100 rrt:2 rrl:2
              rwt:2 rwl:2 idle_power:- active_power:-
    ps    3 : mp:0.0700W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3
              rwt:3 rwl:3 idle_power:- active_power:-
    ps    4 : mp:0.0070W non-operational enlat:1000 exlat:5000 rrt:3 rrl:3
              rwt:3 rwl:3 idle_power:- active_power:-
    
    # nvme list
    Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
    ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
    /dev/nvme0n1     EI79Q010510209F4T    PC401 NVMe SK hynix 256GB                1          45.28  GB / 256.06  GB    512   B +  0 B   80000E00
    
    

  • Moderator

    @Sebastian-Roth I think relying on size isn’t the right approach.

    Problematic scenario: Two nvme disks of the exact same size and model number. You only wish to deploy to a specific drive, the other shouldn’t be touched. How do we guarantee we get the right one?

    Problematic scenario 2: Two nvme disks of different sizes. You wish to deploy to the smaller drive, so both drives can fit the image. How do you guarantee the correct drive?

    The only thing unique to a drive is the serial number, as far as I know. On the other hand, that would only work on a system-by-system basis, so that’s not really that appealing.

    What George brings up is probably a better way, if it works: disks should be recognized in a specific order by their controller and could be identified that way.
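
To illustrate the serial-number idea mentioned above, here is a minimal sketch that looks up the device node for a known serial in `nvme list` output. It assumes the column layout shown elsewhere in this thread (node first, serial second) and that serials contain no spaces. The function name `node_for_serial` is hypothetical, and this is not something FOG does today.

```python
def node_for_serial(nvme_list_output, serial):
    """Return the device node whose SN column equals `serial`, or None."""
    for line in nvme_list_output.splitlines():
        fields = line.split()
        # Data rows start with the node path; header and ruler lines do not.
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            if fields[1] == serial:
                return fields[0]
    return None


# Data row taken from the `nvme list` output posted in this thread.
sample = """\
Node             SN                   Model
---------------- -------------------- ----------------------------------------
/dev/nvme0n1     EI79Q010510209F4T    PC401 NVMe SK hynix 256GB                1          45.28  GB / 256.06  GB    512   B +  0 B   80000E00
"""

node = node_for_serial(sample, "EI79Q010510209F4T")
print(node)
```

As noted, a serial is tied to one physical drive, so a lookup like this would have to be configured per host rather than per image.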


  • Moderator

    @jmason If you can still reproduce this random switching of devices and linux dev node names, we could use your help trying to debug this. We have the idea to query the nvme controller directly to see what it’s seeing as the order vs. what linux is detecting as the order. We need a system that has 2 nvme drives in it that exhibits the drive-switching issue. Do you have time to test this?

    @Sebastian-Roth


  • Senior Developer

    @jmason Seems like we somehow lost track of this. I just stumbled upon it while trying to figure out another issue where someone has systems with an SSD and an NVMe drive. Fixing that issue will be fairly easy because I just need to check if one of the drives is not an NVMe one and skip the size logic.

    But I am still not sure how we should fix your situation with two NVMe drives and wanting to deploy only one.

    One way would be to use the size information for the “Single Disk” image type as well. But that way people could no longer deploy “Single Disk” images to disks of a different size.

    We could try to compare disk size not as a strict match in numbers but put in more logic to try to determine the correct disk that is “big enough” to receive the image. Can you think of a situation where this logic will fail (and possibly overwrite a disk that you didn’t want to be blasted)?

    @george1421 @Quazz Any input from your side is welcome as well!
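
A minimal sketch of the “big enough” idea floated above, assuming the image size and the detected disk sizes are known in the same unit. The function name `pick_target_disk` and the sizes are hypothetical. The sketch refuses to choose when more than one disk qualifies, which is the case where this logic could otherwise overwrite the wrong disk.

```python
def pick_target_disk(image_size, disk_sizes):
    """Pick the only disk large enough for the image; refuse if ambiguous."""
    candidates = sorted(
        name for name, size in disk_sizes.items() if size >= image_size
    )
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError("no disk is large enough for the image")
    raise ValueError("ambiguous target, candidates: " + ", ".join(candidates))


# Hypothetical sizes in GB for a two-drive laptop.
disks = {"nvme0n1": 512, "nvme1n1": 256}

# Only the larger disk can hold a 300 GB image, so the choice is unambiguous.
print(pick_target_disk(300, disks))

# A 200 GB image fits on both disks, so size alone cannot decide.
try:
    pick_target_disk(200, disks)
except ValueError as exc:
    print("refused:", exc)
```

The second call shows the failure mode: an image captured from the smaller drive fits on either disk, so a size threshold cannot identify the intended target.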



  • @Sebastian-Roth For now, since I’m doing training laptops and they will always be in my work area when performing the imaging, I can use debug mode with lsblk/reboot for both capture and deploy to ensure the desired image is processed. Or I can just let it process both drives each time.


  • Senior Developer

    @jmason Hehe, good find! Definitely something we did not consider when implementing the random nvme fix. The code was only made to try to sort nvme disks when the image type is set to “All Disks”.

    As of now I can’t see how we can possibly detect this situation where you want to specify it should only capture and deploy the Windows disk (for example). The algorithm used for “All Disk” is that we save the disk size information along with the image and push out the image to the correct sized disk.

    If we stick to that logic we’d need to change from specifying /dev/nvme0n1 and instead add a field to the host configuration where you can put in a certain disk size to detect and use. But this would fail again if you’d have two identical sized disks as it would deploy to a random one again (either overwriting Linux or Windows).
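
The size-matching approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not FOG's actual code: the function name `match_images_to_disks`, the data layout, and the sizes are all hypothetical. Each saved image is assigned to the one current disk with the same recorded size, and the sketch raises an error when two disks share a size, which is the identical-disk case where the approach breaks down.

```python
def match_images_to_disks(saved_sizes, current_disks):
    """Assign each saved image index to the current disk with the same size.

    saved_sizes: disk sizes recorded at capture time, in image order.
    current_disks: dict of device name -> size as detected at deploy time.
    """
    by_size = {}
    for name, size in current_disks.items():
        by_size.setdefault(size, []).append(name)

    mapping = {}
    for index, size in enumerate(saved_sizes):
        names = by_size.get(size, [])
        if len(names) != 1:
            raise ValueError(
                f"image {index}: expected one disk of size {size}, found {len(names)}"
            )
        mapping[index] = names[0]
    return mapping


# Hypothetical sizes in GB recorded at capture time.
saved = [512, 256]

# The device names may be swapped between boots; the mapping follows the sizes.
boot_a = match_images_to_disks(saved, {"nvme0n1": 512, "nvme1n1": 256})
boot_b = match_images_to_disks(saved, {"nvme0n1": 256, "nvme1n1": 512})
print(boot_a)
print(boot_b)
```

With two disks of the same size, `by_size` holds two names for that size and the function raises instead of guessing.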

