Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized
-
@george1421 Success!!
Here are the logs:
One thing I noticed is I got some weird DHCP messages before it booted to the debug console. I will try to replicate and grab an image if possible.
Edit: Shot of the DHCP messages. It says it didn’t get an IP, but it actually does. After pressing [enter] it drops to the debug shell.
-
@george1421 I am really keen to hear what you did with kernel -300m as this seems to make a difference… right track I suppose. Maybe this is some kind of timeout?!
-
@sebastian-roth Sorry, it’s been a long day of poking and pulling the Linux kernel.
Today I tried the setting from the Arch document with no luck in the ‘M’ release.
Then I tried just removing all of the power saving code out of the kernel as well as removing all acpi code, with no success. The ‘N’ release.
Next, I was about to give up when I got an idea: since the FC27 kernel worked when FC27 live was booted, I took the tar file you linked below, copied the ‘M’ config file straight over, and built the kernel. I did this to test two ideas: 1) Did I have the right kernel options selected from the 4.17.13 release? 2) Were the FC devs able to patch the 4.13.9 kernel to make it work with the OP’s hardware? Both tests were successful. That is what @hlalex posted in the bzImage4139-300m.log log file.
In another thread an OP had an issue with the current FOS kernel and a Microsoft USB network adapter. I added the patch into the 4.17.13 M kernel build. That network driver was then discovered by FOS, but FOS still has the delayed creation of the GPT partition, so that mystery is not solved yet either. I was hoping the updated FOS kernel would have addressed that, but so far it has not.
Back on point: it would be ideal to understand what the FC kernel devs did to Linux 4.13.9 to create 4.13.9-300. I assume the -300 means there were 300 patches to the stock 4.13.9 kernel (??). The only thing I can think of is to do a (mega)diff between Linux 4.13.9 and the FC 4.13.9-300 to see what has changed.
-
@sebastian-roth It appears the -100 -200 -300 numbering is based on the FC release and not to indicate a patch level.
So that means I’m back to trying to figure out how to diff all of the files between FC version and linux version of 4.13.9.
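One way to approach the diff, as a rough sketch only: unpack both source trees side by side and list the files that differ. The function name and the two directory arguments below are made up for illustration; the thread does not say where either tree lives. Note that `filecmp.dircmp` compares files shallowly by default (size and modification time first), so it is a quick first pass rather than a byte-exact audit.

```python
import filecmp
import os


def changed_files(left_root, right_root):
    """Return sorted (status, relative_path) pairs for two directory trees.

    status is "changed", "only_left" or "only_right".
    left_root / right_root are hypothetical paths, e.g. the unpacked
    vanilla 4.13.9 tree and the unpacked FC 4.13.9-300 tree.
    """
    results = []

    def walk(cmp, rel):
        # Files present on both sides whose contents differ.
        for name in cmp.diff_files:
            results.append(("changed", os.path.join(rel, name)))
        # Files or directories present on one side only.
        for name in cmp.left_only:
            results.append(("only_left", os.path.join(rel, name)))
        for name in cmp.right_only:
            results.append(("only_right", os.path.join(rel, name)))
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left_root, right_root), "")
    return sorted(results)
```

Filtering the result down to paths under drivers/nvme or drivers/pci would be a reasonable next step, but which directories matter here is a guess.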
-
Will you give the Linux 4.13.9 kernel a shot before you send the hardware back this week? This last test will tell us if the fix is in the 4.13.9 kernel or something that FC did with the 4.13.9-300 kernel.
-
@george1421 Sure thing, I’m keeping them around as long as possible so we can get as much info as possible from them.
-
@george1421 I have to get these drives shipped out by 3:30pm EST today. If anyone has any additional tests, let me know and I will get anything I can.
Otherwise, Thanks for all the help!
-
@hlalex Thank you for your help. Go ahead and send them back at this time. It looks like kernel changes between 4.13.x and 4.17.x have disabled these types of drives. My only option would be to see if 4.18 has fixed the issue. But at this point you have a solution and the hardware must go back. So we’ll park testing until someone else has this issue. Again, thank you for all of your testing and data collection. We would not have made it this far without your help.
-
I have this same issue and have several machines in stock to test with if needed. However, I’m not nearly as Linux savvy as hlalex. I appreciate the effort you two have already put into it.
-
There are a few more ideas I’ve had.
First, make sure your system is configured for UEFI, with secure boot disabled and legacy ROMs disabled in the firmware.
Well if you are willing to try the 4.18.3 kernel I built yesterday, we can continue testing. It does seem to be that something broke (changed) after linux kernel 4.13.x
Here is the 4.18.3 kernel I built yesterday: https://drive.google.com/open?id=1VVXEVY0p2Gl2w4WvT_0jstOtV1jLBp1k
- Download that file to the /var/www/html/fog/service/ipxe directory on the FOG server.
- Register one of these ‘troubled’ workstations.
- Go into the host management of this ‘troubled’ workstation and in the Host Kernel field enter bzImage4183
- Schedule an image deployment, but before you submit the deployment task, tick the debug check box.
- PXE boot the target computer; it should go right into deployment.
- After several enter key presses on the target computer you should be dropped to a Linux command prompt.
- At this point I just want you to key in one command: lsblk. I need to know if you see something similar to /dev/nvme0n1 listed as one of the block devices.
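If it helps to check the lsblk result without reading it by eye, here is a minimal sketch that picks NVMe whole-disk names out of lsblk text. The function name and the sample text are made up for illustration; the sample is not output from any machine in this thread.

```python
import re


def nvme_disks(lsblk_output):
    """Return NVMe whole-disk names (e.g. nvme0n1) found in lsblk text."""
    names = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Strip the tree-drawing characters lsblk puts in front of partitions.
        name = fields[0].lstrip("├─└│|`-")
        # Whole disks look like nvme0n1; partitions (nvme0n1p1) are skipped.
        if re.fullmatch(r"nvme\d+n\d+", name) and name not in names:
            names.append(name)
    return names


# Made-up sample for illustration only, NOT real output from this thread.
SAMPLE = """\
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 465.8G  0 disk
nvme0n1     259:0    0 476.9G  0 disk
├─nvme0n1p1 259:1    0   500M  0 part
"""
```

An empty list from the real lsblk text would match the "drive not recognized" symptom described in this thread.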
Try the above steps with
I’m in the process of rebuilding the inits to support Linux kernel 4.13.x. I’ll have another test in about 1 hr to try an older kernel with the current FOS inits. This is a lame attempt to get things moving until we can get a better understanding of what we are seeing here.
-
Something I want you to try first, before the new kernel. Read through my previous post to understand where/what I’m talking about.
For this first test, I want you to go into host management for one of these troubled systems and in the Kernel Parameters field add nvme_load=YES, using the stock out-of-the-box FOS kernels. Set up a debug deployment as below and see if lsblk returns the NVMe drive.
-
@george1421 sorry I just had a moment to get back to you. I’m running Kernel 4.17.0 and have “Host Kernel Arguments” = nvme_load=YES, booting into debug running lsblk returns nothing.
-
I did also confirm that it’s using uefi, that secure boot is disabled and that legacy roms are also disabled.
-
@george1421 When the troubled machine boots into the test kernel (bzImage4183), it’s unable to get past the iPXE process:
bzImage4183... ok
Could not select: Exec format error (http://ipxe.org/2e008081)
Could not boot: Exec format error (http://ipxe.org/2e008081)
Could not boot: Exec format error (http://ipxe.org/2e008081)
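"Exec format error" means iPXE did not accept the file as something it can boot. One thing that could be ruled out (an assumption on my part, not something confirmed in this thread) is that the file saved on the FOG server is not actually a kernel image, for example a web page saved in place of the download. The sketch below checks the two markers the Linux x86 boot protocol puts in a bzImage header: the 0xAA55 boot flag at offset 0x1FE and the "HdrS" signature at offset 0x202. The function names are made up for illustration.

```python
def looks_like_bzimage(data):
    """Return True if the bytes carry the Linux x86 boot header markers."""
    if len(data) < 0x206:
        return False
    # boot_flag: 0xAA55, stored little-endian at offset 0x1FE.
    boot_flag_ok = data[0x1FE:0x200] == b"\x55\xaa"
    # header: the magic string "HdrS" at offset 0x202.
    header_ok = data[0x202:0x206] == b"HdrS"
    return boot_flag_ok and header_ok


def check_file(path):
    """Read just enough of the file at `path` to run the header check."""
    with open(path, "rb") as fh:
        return looks_like_bzimage(fh.read(0x206))
```

Passing this check does not prove the kernel will boot on this hardware; failing it would only suggest the download itself is worth repeating.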
-
Sticky, also Dell Optiplex 5820.
2x Front PCIe FlexBay, 512GB Hynix M.2 PCIe NVMe Class 40 SSD
Any progress on the kernel?
-
I tested 4.18.3 also, no joy.
-
@bright23 I guess @george1421 is on holiday leave at the moment as I have not heard from him in the past days. He’s been doing a brilliant job debugging this issue and I don’t think it’s wise if I get into this. Let’s hope he’s coming back soon.
-
@Sebastian-Roth I paused debugging on this issue. It appears that something changed in the Linux kernel between 4.13.9 and 4.15.2 that causes these flex bays to not init correctly. Without having one of these systems in my hands, it’s hard to debug exactly what is going on. I’m sure they moved some of the core code to an independent module.
The only thing I can think of to get by this issue immediately is to rebuild the inits and set the minimum kernel level to 4.13 to allow the 4.13.x kernels I built to use the current inits. It’s not a nice answer, but until I happen to order one of these systems it’s hard to see what is actually going on.
-
@george1421 said in Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized:
until I happen to order one of these systems its hard to see what is actually going on.
Either that or @bright23 you are getting into kernel compiling and debugging the issue with our help. Are you keen to?
-
@Sebastian-Roth said in Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized:
Either that or @bright23 you are getting into kernel compiling and debugging the issue with our help. Are you keen to?
I still have my kernel build environment set up if something needs to be compiled. I can grab the latest Linux kernel and rebuild it in the AM if needed.