Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized
-
@george1421 OK, got some data from Windows Device Manager for the FlexBay drive:
PCI Memory Controller [ this was an uninitialized device, figured it best to include as it is a PCI device but probably not relevant ]
- PCI\VEN_8086&DEV_A2A1&SUBSYS_07381028&REV_00
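For anyone reading the ID above: a Windows PCI hardware ID packs the vendor, device, subsystem and revision into one string. Here is a minimal sketch that splits it apart, assuming the usual `SUBSYS_<subsystem id><subsystem vendor>` layout. The function name `parse_pci_id` is made up for illustration.

```python
import re

# Layout assumed here: PCI\VEN_vvvv&DEV_dddd&SUBSYS_ssssvvvv&REV_rr
PCI_ID = re.compile(
    r"PCI\\VEN_(?P<vendor>[0-9A-F]{4})"
    r"&DEV_(?P<device>[0-9A-F]{4})"
    r"&SUBSYS_(?P<subsys>[0-9A-F]{4})(?P<subsys_vendor>[0-9A-F]{4})"
    r"&REV_(?P<rev>[0-9A-F]{2})",
    re.IGNORECASE,
)


def parse_pci_id(hw_id):
    """Split a Windows PCI hardware ID into its hex fields (hypothetical helper)."""
    match = PCI_ID.fullmatch(hw_id.strip())
    if match is None:
        raise ValueError("not a PCI hardware ID: %r" % hw_id)
    return {key: value.upper() for key, value in match.groupdict().items()}


fields = parse_pci_id(r"PCI\VEN_8086&DEV_A2A1&SUBSYS_07381028&REV_00")
print(fields)
```

Vendor 8086 is Intel and subsystem vendor 1028 is Dell, so this is an Intel chipset function on the Dell board rather than the NVMe drive itself.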
Everything below pertains to the M.2 connected to the FlexBay
Disk drive
Device instance path
- SCSI\DISK&VEN_NVME&PROD_PC401_NVME_SK_HY\5&1A7BC20F&0&000000
HW IDs
- SCSI\DiskNVMe____PC401_NVMe_SK_hy3E00
- SCSI\DiskNVMe____PC401_NVMe_SK_hy
- SCSI\DiskNVMe____
- SCSI\NVMe____PC401_NVMe_SK_hy3
- NVMe____PC401_NVMe_SK_hy3
- GenDisk
Status
- 0180200A
- DN_DRIVER_LOADED
- DN_STARTED
- DN_DISABLEABLE
- DN_NT_ENUMERATOR
- DN_NT_DRIVER
Class Guid
- {4d36e967-e325-11ce-bfc1-08002be10318}
Device stack
- \Driver\partmgr
- \Driver\Disk
- \Driver\EhStorClass
- \Driver\stornvme
Driver node strong name
- disk.inf:6d166ee9677c725c:disk_install.NT:10.0.16299.371:GenDisk
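The status value 0180200A above is a bitmask, and the DN_ names listed under it are the bits that are set. Here is a small sketch that decodes it, assuming the flag values from the Windows SDK header cfg.h as I remember them. Only a handful of flags are included, and `decode_status` is a made-up helper name.

```python
# Subset of DN_* devnode status flags (values assumed from cfg.h).
DN_FLAGS = {
    0x00000002: "DN_DRIVER_LOADED",
    0x00000008: "DN_STARTED",
    0x00000400: "DN_HAS_PROBLEM",
    0x00002000: "DN_DISABLEABLE",
    0x00800000: "DN_NT_ENUMERATOR",
    0x01000000: "DN_NT_DRIVER",
}


def decode_status(status):
    """Return (names of the known flags that are set, bits not covered by the table)."""
    names = [name for bit, name in sorted(DN_FLAGS.items()) if status & bit]
    known = sum(bit for bit in DN_FLAGS if status & bit)
    return names, status & ~known


names, leftover = decode_status(0x0180200A)
print(names, hex(leftover))
```

DN_HAS_PROBLEM is not set, which matches Windows loading stornvme and starting the disk without complaint.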
-
@george1421 version N:
-
-
@george1421 Success!!
Here are the logs:
One thing I noticed is I got some weird DHCP messages before it booted to the debug console. I will try to replicate and grab an image if possible.
Edit: Shot of the DHCP messages. It says it didn’t get an IP, but it actually does. After pressing [enter] it drops to the debug shell.
-
@george1421 I am really keen to hear what you did with kernel -300m as this seems to make a difference… right track I suppose. Maybe this is some kind of timeout?!
-
@sebastian-roth Sorry, it’s been a long day of poking and pulling the Linux kernel.
Today I tried the setting from the Arch document with no luck in the ‘M’ release.
Then I tried removing all of the power-saving code from the kernel, as well as all ACPI code, with no success. That was the ‘N’ release.
Next, I was about to give up when I got an idea: since the FC27 kernel worked when FC27 live was booted, I took the tar file you linked below, copied the ‘M’ config file straight over, and built the kernel. I did this to test two ideas: 1) Did I have the right kernel options selected in the 4.17.13 release? 2) Were the FC devs able to patch the 4.13.9 kernel to make it work with the OP’s hardware? Both tests were successful. That is what @hlalex posted in the bzImage4139-300m.log log file.
In another thread an OP had an issue with the current FOS kernel and a Microsoft USB network adapter. I added the patch to the 4.17.13 ‘M’ kernel build. That network driver was then discovered by FOS, but FOS still has the delayed creation of the GPT partition, so that mystery is not solved yet either. I was hoping the updated FOS kernel would have addressed that, but not so far.
Back on point: it would be ideal to understand what the FC kernel devs did to Linux 4.13.9 to create 4.13.9-300. I assume the -300 means there were 300 patches to the stock 4.13.9 kernel (??). The only thing I can think of is to do a (mega)diff between Linux 4.13.9 and the FC 4.13.9-300 to see what has changed.
-
@sebastian-roth It appears the -100 -200 -300 numbering is based on the FC release and not to indicate a patch level.
So that means I’m back to trying to figure out how to diff all of the files between FC version and linux version of 4.13.9.
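One way to approach that tree-wide diff is to first list which files differ at all, then run a real diff only on those. Here is a rough sketch using Python's standard `filecmp` module. The two directory paths are placeholders for wherever the stock and FC sources are unpacked, and `changed_files` is a made-up name.

```python
import filecmp
import os


def changed_files(left, right):
    """Return sorted (status, relative path) pairs for two source trees."""
    results = []

    def walk(cmp, rel):
        for name in cmp.diff_files:
            results.append(("changed", os.path.join(rel, name)))
        for name in cmp.left_only:
            results.append(("only-left", os.path.join(rel, name)))
        for name in cmp.right_only:
            results.append(("only-right", os.path.join(rel, name)))
        # Recurse into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(results)


# Example call (placeholder paths, adjust to the real unpack locations):
# for status, path in changed_files("linux-4.13.9", "fc-4.13.9-300"):
#     print(status, path)
```

`dircmp` compares files shallowly, so two files with identical size and modification time are treated as equal without reading them. For freshly unpacked tarballs that should rarely matter, but it is worth keeping in mind.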
-
Will you give the Linux 4.13.9 kernel a shot before you send the hardware back this week? This last test will tell us if the fix is in the 4.13.9 kernel or is something that FC did with the 4.13.9-300 kernel.
-
@george1421 Sure thing, I’m keeping them around as long as possible so we can get as much info as possible from them.
-
@george1421 I have to get these drives shipped out by 3:30pm EST today. If anyone has any additional tests, let me know and I will get anything I can.
Otherwise, Thanks for all the help!
-
@hlalex Thank you for your help. Go ahead and send them back at this time. It looks like kernel changes between 4.13.x and 4.17.x have disabled these types of drives. My only option would be to see if 4.18 has fixed the issue. But at this point you have a solution and the hardware must go back. So we’ll park testing until someone else has this issue. Again, thank you for all of your testing and data collection. We would not have made it this far without your help.
-
I have this same issue and have several machines in stock to test with if needed. However, I’m not nearly as Linux savvy as hlalex. I appreciate the effort you two have already put into it.
-
There are a few more ideas I’ve had.
First, make sure your system is configured for UEFI, with secure boot disabled and legacy ROMs disabled in the firmware.
If you are willing to try the 4.18.3 kernel I built yesterday, we can continue testing. It does seem that something broke (changed) after Linux kernel 4.13.x.
Here is the 4.18.3 kernel I built yesterday: https://drive.google.com/open?id=1VVXEVY0p2Gl2w4WvT_0jstOtV1jLBp1k
- Download that file to /var/www/html/fog/service/ipxe directory on the fog server.
- Register one of these ‘troubled’ workstations
- Go into the host management of this ‘troubled’ workstation and in the Host Kernel field enter bzImage4183
- Schedule an image deployment, but before you submit the deployment task, tick the debug check box.
- PXE boot the target computer, it should go right into deployment.
- After several enter key presses on the target computer you should be dropped to a linux command prompt.
- At this point I just want you to key in one command: lsblk. I need to know if you see something similar to /dev/nvme0n1 listed as one of the block devices.
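The check in the last step comes down to "does the kernel expose any nvmeXnY block device". Here is a minimal sketch of that test, reading the same sysfs directory lsblk draws from. The FOS debug shell may not ship Python, in which case `ls /sys/block` shows the same names. Both function names are made up for illustration.

```python
import os
import re


def nvme_block_devices(names):
    """Keep only NVMe namespace names such as nvme0n1."""
    return sorted(n for n in names if re.fullmatch(r"nvme\d+n\d+", n))


def list_block_devices(sys_block="/sys/block"):
    """List kernel block device names, or nothing if sysfs is unavailable."""
    try:
        return os.listdir(sys_block)
    except OSError:
        return []


found = nvme_block_devices(list_block_devices())
print(found or "no NVMe block device visible")
```

An empty result here is the same symptom as lsblk printing no nvme entry: the kernel has not created a block device for the drive.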
Try the above steps with
I’m in the process of rebuilding the inits to support Linux kernel 4.13.x. I’ll have another test in about 1 hr to try an older kernel with the current FOS inits. This is a lame attempt to get things moving until we can get a better understanding of what we are seeing here.
-
Something I want you to try first, before the new kernel. Read through my previous post to understand where/what I’m talking about.
For this first test, go into host management for one of these troubled systems and in the Kernel Parameters field add nvme_load=YES, using the stock out-of-the-box FOS kernels. Set up a debug deployment as described in that post and see if lsblk returns the NVMe drive.
-
@george1421 Sorry, I just had a moment to get back to you. I’m running kernel 4.17.0 with “Host Kernel Arguments” set to nvme_load=YES. After booting into debug, running lsblk returns nothing.
-
I did also confirm that it’s using UEFI, that secure boot is disabled and that legacy ROMs are also disabled.
-
@george1421 when the trouble machine boots into the test kernel (bzImage4183) it’s unable to get past the iPXE process:
bzImage4183... ok
Could not select: Exec format error (http://ipxe.org/2e008081)
Could not boot: Exec format error (http://ipxe.org/2e008081)
Could not boot: Exec format error (http://ipxe.org/2e008081)
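The thread does not establish why iPXE rejected this image. Two things worth ruling out, both assumptions on my part: the download may have saved a web page instead of the kernel binary, or the kernel may lack the EFI boot stub that UEFI builds of iPXE need. Here is a rough sketch that looks for the relevant signatures in the file, with offsets taken from the Linux x86 boot protocol. The helper name `inspect_kernel_image` is made up.

```python
def inspect_kernel_image(data):
    """Report which boot signatures are present at the start of an image."""
    return {
        # EFI-stub kernels start with a PE/COFF "MZ" header.
        "pe_mz": data[:2] == b"MZ",
        # boot_flag 0xAA55, stored little-endian at offset 0x1FE.
        "boot_flag": data[0x1FE:0x200] == b"\x55\xaa",
        # Linux setup header magic at offset 0x202.
        "hdrs": data[0x202:0x206] == b"HdrS",
    }


# Example call against the file copied to the FOG server:
# with open("/var/www/html/fog/service/ipxe/bzImage4183", "rb") as f:
#     print(inspect_kernel_image(f.read(0x210)))
```

If all three come back False, the file is probably not a kernel at all. If only `pe_mz` is False, the image looks like a bzImage built without EFI stub support.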
-
Sticky, also Dell Optiplex 5820.
(2x Front PCIe FlexBay, 512GB Hynix M.2 PCIe NVMe Class 40 SSD)
Any progress on the kernel?
-
I tested 4.18.3 also, no joy.
-
@bright23 I guess @george1421 is on holiday leave at the moment as I have not heard from him in the past days. He’s been doing a brilliant job debugging this issue and I don’t think it’s wise if I get into this. Let’s hope he’s coming back soon.