Laptop with 2 nvme drives randomly selected so selecting one drive to capture not working
-
Related to the same systems mentioned in this post:
https://forums.fogproject.org/topic/12959/dell-7730-precision-laptop-deploy-gpt-error-message/93
in which we were able to get FOG to perform a Multiple Partition Image - All Disks (Not Resizable) capture and deploy with a 2 drive based nvme system working as expected. However when wanting to only capture one of the drives, the randomness of how the uefi assigns nvme0n1 and nvem1n1 to the physical drives doesn’t allow me to know which drive will be captured when specifying the Host Primary Disk as:
/dev/nvme0n1 (should be disk0 linux driv)
or
/dev/nvme1n1 (should be disk1 windows drive)
but since the assignment is random so is the chance of the correct drive being selected.
I am expecting that this will also affect any attempt to deploy the single drive image back to another one of the identically configured training laptops. So when only one drive is selected for imaging whatever mechanism was put in place to keep multiple drives identified is not working when only a single drive is selected.
I am rebooting running the task to deploy in debug so that I can lsblk to see the assignment of the nvme0n1 and nvme1n1 in relation to the disk size. I believe that I would have to deploy in the same way.
This should be interesting.
Fog 1.5.5 with the init.xz updated on April 15, 2019
-
@jmason Hehe, good find! Definitely something we did no consider when implementing the random nvme fix. The code was only made to try and sort nmve disks when the image type is set to “All Disks”.
As of now I can’t see how we can possibly detect this situation where you want to specify it should only capture and deploy the Windows disk (for example). The algorithm used for “All Disk” is that we save the disk size information along with the image and push out the image to the correct sized disk.
If we stick to that logic we’d need to change from specifying
/dev/nvme0n1
and instead add a field to the host configuration where you can put in a certain disk size to detect and use. But this would fail again if you’d have two identical sized disks as it would deploy to a random one again (either overwriting Linux or Windows). -
@Sebastian-Roth For now, since I’m doing training laptops and they will always be in my work area when performing the imaging, I can use debug mode with lsblk/reboot for both capture and deploy to ensure the desired image is processed. Or I can just let it process both drives each time.
-
@jmason Seems like we somehow lost track of this. Just stumbled upon this while trying to figure out another issue where someone has systems with SSD and NMVe drive. Fixing that issue will be fairly easy because I just need to check if one of the drives is not a NVMe one and skip the size logic.
But I am still not sure how we should fix the your situation with two NVMe drives and wanting to deploy only one.
One way would be to use the size information to “Single Disk” image type as well. But that way people cannot deploy “Single Disk” to different size disks anymore.
We could try to compare disk size no as a strict match in numbers but put in more logic to try to determine the correct disk that is “big enough” to receive the image. Can you think of a situation where this logic will fail (and possible overwrite a disk that you didn’t want to be blasted)???
@george1421 @Quazz Any input from your side is welcome as well!
-
@jmason If you can still duplicate this random switching of device and linux dev node names we could use your help trying to debug this. We have the idea to query the nvme controller directly to see what its seeing as to order vs what linux is detecting as the order. We need a system that has 2 nvme drives in it that are exhibiting the switching of nvme drives issue. Do you have time to test this?
-
@Sebastian-Roth I think relying on size isn’t the right approach.
Problematic scenario: Two nvme disks of the exact same size and model number. You only wish to deploy to a specific drive, the other shouldn’t be touched. How do we guarantee we get the right one?
Problematic scenario 2: Two nvme disks of different sizes. You wish to deploy to the smaller drive, so both drives can fit the image, how do you guarantee the correct drive?
The only thing unique to a drive is the serial number as far as I know. On the other hand, that would only work on system to system basis, so that’s not really that appealing.
What George brings up is probably a better way if it works, disks should be recognized in a specific order by their controller and could be identifiable in that way.
-
So I don’t loose where I was at with testing.
https://fogproject.org/inits/init_nvme-cli.xz
# nvme id-ctrl /dev/nvme0n1 -H NVME Identify Controller: vid : 0x1c5c ssvid : 0x1c5c sn : EI79Q010510209F4T mn : PC401 NVMe SK hynix 256GB fr : 80000E00 rab : 2 ieee : ace42e cmic : 0 [2:2] : 0 PCI [1:1] : 0 Single Controller [0:0] : 0 Single Port mdts : 5 cntlid : 1 ver : 10200 rtd3r : 7a120 rtd3e : 1e8480 oaes : 0x200 [31:9] : 0x1 Reserved [8:8] : 0 Namespace Attribute Changed Event Not Supported oacs : 0x17 [15:4] : 0x1 Reserved [3:3] : 0 NS Management and Attachment Not Supported [2:2] : 0x1 FW Commit and Download Supported [1:1] : 0x1 Format NVM Supported [0:0] : 0x1 Sec. Send and Receive Supported acl : 7 aerl : 3 frmw : 0x14 [4:4] : 0x1 Firmware Activate Without Reset Supported [3:1] : 0x2 Number of Firmware Slots [0:0] : 0 Firmware Slot 1 Read/Write lpa : 0x2 [1:1] : 0x1 Command Effects Log Page Supported [0:0] : 0 SMART/Health Log Page per NS Not Supported elpe : 255 npss : 4 avscc : 0x1 [0:0] : 0x1 Admin Vendor Specific Commands uses NVMe Format apsta : 0x1 [0:0] : 0x1 Autonomous Power State Transitions Supported wctemp : 352 cctemp : 354 mtfa : 0 hmpre : 0 hmmin : 0 tnvmcap : 0 unvmcap : 0 rpmbs : 0 [31:24]: 0 Access Size [23:16]: 0 Total Size [5:3] : 0 Authentication Method [2:0] : 0 Number of RPMB Units sqes : 0x66 [7:4] : 0x6 Max SQ Entry Size (64) [3:0] : 0x6 Min SQ Entry Size (64) cqes : 0x44 [7:4] : 0x4 Max CQ Entry Size (16) [3:0] : 0x4 Min CQ Entry Size (16) nn : 1 oncs : 0x1f [5:5] : 0 Reservations Not Supported [4:4] : 0x1 Save and Select Supported [3:3] : 0x1 Write Zeroes Supported [2:2] : 0x1 Data Set Management Supported [1:1] : 0x1 Write Uncorrectable Supported [0:0] : 0x1 Compare Supported fuses : 0x1 [0:0] : 0x1 Fused Compare and Write Supported fna : 0 [2:2] : 0 Crypto Erase Not Supported as part of Secure Erase [1:1] : 0 Crypto Erase Applies to Single Namespace(s) [0:0] : 0 Format Applies to Single Namespace(s) vwc : 0x1 [0:0] : 0x1 Volatile Write Cache Present awun : 7 awupf : 7 nvscc : 1 [0:0] : 0x1 NVM Vendor Specific Commands uses NVMe Format acwu : 7 sgls : 0 [0:0] : 0 Scatter-Gather Lists Not Supported subnqn : ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 1 : mp:3.80W operational enlat:30 exlat:30 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:- ps 2 : mp:2.40W operational enlat:100 exlat:100 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:- ps 3 : mp:0.0700W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0070W non-operational enlat:1000 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
# nvme list Node SN Model Namespace Usage Format FW Rev ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- -------- /dev/nvme0n1 EI79Q010510209F4T PC401 NVMe SK hynix 256GB 1 45.28 GB / 256.06 GB 512 B + 0 B 80000E00
-
@george1421 said in Laptop with 2 nvme drives randomly selected so selecting one drive to capture not working:
cmic : 0 [2:2] : 0 PCI [1:1] : 0 Single Controller [0:0] : 0 Single Port
Nice! Is this the part we are after? Would be interesting to see this on a system with two NVMe drives! @jmason
-
@Sebastian-Roth I agree we need a system with a dual nvme drive so we can compare. I have a pdf of what all the short names really mean that I’ll link in too. I’m suspecting the field you pointed to will tell us how many and then there are other fields that will tell use more details about the drive. Also in the
nvme list
command I’m interested in seeing if the name space field changes if there are more drives as well as the interaction when these disks change ordinal position in respect to what udev assigns them.