Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized
-
@hlalex Did bzImage version D work any better than the previous ones? I uploaded it last Friday.
-
@hlalex OK, I found an interesting driver in your FC lsmod output. The one in question is shpchp, the “SHPC PCI Hotplug driver”. This is a specific hotplug driver that is not currently enabled in “bzImage41713D”. I’m in the process of recompiling bzImage41713E. Let’s see if that one gets us to that NVMe drive. When I have version E uploaded I’ll send you an IM.
-
@george1421 I must have missed version D, my bad. I will pull it and post the results ASAP.
I will keep an eye out for E as well.
-
@george1421 We are making some progress with version D. While booting, I received the same errors I was getting with Clonezilla Live:
print_req_error: I/O error, dev nvme0n1, sector 0
nvme nvme0: failed to set APST feature (-19)
unable to open '/dev/nvme0n1' syspath not found
This is the first time these errors have shown up in FOS, which I believe is progress! (At least it knows there is an NVMe disk installed, even if FOS can’t access it.)
Here are the logs from version D:
bzImage41713D_lsmod.log
bzImage41713D.log
I’ll upload version E results shortly.
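The “NVMe not in lsblk” symptom can be checked directly against sysfs, which is where lsblk reads its device list from. The following is a minimal Python sketch, assuming a Python 3 interpreter is available in the live environment (FOS itself may not include one); `find_nvme_disks` is a hypothetical helper, not part of FOS. An empty result means the kernel never registered an NVMe namespace block device such as nvme0n1, even if the controller (nvme0) was probed.

```python
from __future__ import annotations

from pathlib import Path


def find_nvme_disks(sys_block: str = "/sys/block") -> list[str]:
    """Return NVMe block device names (e.g. 'nvme0n1') registered in sysfs.

    Hypothetical helper: lsblk builds its listing from this same sysfs tree,
    so an empty result corresponds to the 'NVMe not in lsblk' symptom.
    """
    root = Path(sys_block)
    if not root.is_dir():
        # No sysfs block directory at all (e.g. not running on Linux).
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith("nvme"))


if __name__ == "__main__":
    disks = find_nvme_disks()
    print(disks if disks else "no NVMe block devices in /sys/block")
```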
-
@hlalex YES!! I think we are really close. Version ‘E’ should provide support for the specific controller that was not in Version ‘D’.
I’m going to dig into the syslog-D log you provided and compare it against both of your previous logs. We are getting very close. Thank you for sticking with this; debugging the kernel is a bit difficult without having the hardware in hand.
-
@george1421 No luck on version E. We got the APST feature error again, but then FOS failed with a kernel panic.
-
@george1421 Not a problem! FOG has saved me countless hours of manual configuration over the years, and until this recent batch of hardware I have not come across an issue that wasn’t already answered/fixed in another post. I am happy to help in any way I can, and hopefully save someone else (and possibly my future self) a lot of headache. We got extremely lucky with the unequal exchange approval from Dell, and other techs may not get the same break (It took 2 days and many hours on the phone/sending emails to get the exchange).
-
@hlalex You didn’t do anything specific with the init.xz did you?
-
@george1421 Didn’t touch it at all. I thought about changing it to init_32.xz, but decided to wait for you.
-
@hlalex No, don’t use the 32-bit inits; that will break for sure. I’m building version ‘F’ this time. The APST error is related to the ASPM kernel feature being turned on. I turned it on because it looked like it was part of PCIe hotplug. I’ll IM you when the build is done.
If this one doesn’t work (I understand some of the error messages posted in the picture), I’m going to revert to a stock kernel build and then enable just the PCI hotplug settings, since I may have flipped too many switches trying to make it work. I’m pretty sure I’m in the right neighborhood now.
-
@george1421 F produced a similar kernel panic.
-
@hlalex OK, then rolling back to the stock FOS kernel settings and turning on only hotplug is the right place to start. Version G should be done shortly. I’m going to try to boot the kernel locally first to make sure it gets past the init on my hardware before testing on your kit.
-
Version G (reset) was sent via IM and on the plus side it did not blow up during my testing.
-
@george1421 Sorry, got caught up with some other issues. I tried G, but it hangs at:
Configuring (net0 xx:xx:xx:xx:xx:xx)... ok
http://ip.addr/fog/service/ipxe/boot.php... ok
bzImage41713g... ok
init.xz... ok
_
I have double-checked, and init.xz is still set as the default.
Hang on, I just found an issue with the BIOS. Testing D through G again to see if it had any effect.
-
@george1421 OK, I double-checked everything and it appears the BIOS change had no effect on the output of any of the kernels. Same errors and freezes.
-
@hlalex OK, I’m going to have to take it back into the lab, because it didn’t do this when I tested. I’ll grab a 7060 (the newest bit of hardware I have at the moment) and see why it’s going sideways.
If this hadn’t worked before, I would say it’s the hand-off between iPXE and the UEFI firmware (with faulty firmware we have seen it hang at this exact spot in the past). But I think you were running in BIOS mode the last time I looked at the logs. I won’t keep pushing tests to you until I’m sure I have it booting correctly on my newest hardware.
-
Well, I’m a bit confused now. The ‘G’ version works correctly on a 3050 as well as a 7060. There is a short delay of about 3-4 seconds after bzImage is copied before the kernel starts; at first I thought it had crashed like on your system.
In your environment, can you roll back to the C through G bzImages and make sure the older ones still boot? For the G build, I went back to FOG defaults and then enabled only hotplug. There might be other things I turned on earlier that hotplug needs. Rolling back to an earlier boot kernel on your end will tell me where to look in the config files.
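Comparing the kernel configuration of a build that boots with one that panics is a quick way to narrow down which option matters. The Linux source tree ships `scripts/diffconfig` for this job; below is a minimal stand-alone Python sketch of the same idea, assuming you have the `.config` files used to build two of the bzImages. The function names are hypothetical. It treats `# CONFIG_X is not set` and a missing option alike as `n`, and can be narrowed to a prefix such as `CONFIG_HOTPLUG_PCI`.

```python
from __future__ import annotations

import sys
from pathlib import Path


def parse_kconfig(text: str) -> dict[str, str]:
    """Map CONFIG_* names to their values from a kernel .config file.

    '# CONFIG_FOO is not set' lines are recorded as 'n' so an option that
    was switched off in one build still shows up in the diff.
    """
    options: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def diff_kconfig(old: str, new: str, prefix: str = "") -> list[tuple[str, str, str]]:
    """Return sorted (option, old_value, new_value) tuples for options that differ.

    Options absent from one file count as 'n'; `prefix` filters by name.
    """
    a, b = parse_kconfig(old), parse_kconfig(new)
    changes = []
    for name in sorted(set(a) | set(b)):
        if not name.startswith(prefix):
            continue
        va, vb = a.get(name, "n"), b.get(name, "n")
        if va != vb:
            changes.append((name, va, vb))
    return changes


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python3 kconfig_diff.py OLD_CONFIG NEW_CONFIG [PREFIX]
    old_text = Path(sys.argv[1]).read_text()
    new_text = Path(sys.argv[2]).read_text()
    prefix = sys.argv[3] if len(sys.argv) > 3 else ""
    for name, va, vb in diff_kconfig(old_text, new_text, prefix):
        print(f"{name}: {va} -> {vb}")
```

For example, running it as `python3 kconfig_diff.py config-41713C config-41713G CONFIG_HOTPLUG_PCI` (hypothetical script and file names) would list only the PCI hotplug options that differ between the C and G builds; relevant ones could include CONFIG_HOTPLUG_PCI_SHPC (the shpchp driver) and CONFIG_HOTPLUG_PCI_ACPI_IBM (the acpiphp_ibm driver seen in the E and F results).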
-
@george1421 Same here. The 5820 BIOS has some different options for NVMe drives connected to the front FlexBay. Drives there actually show up in the BIOS, while the same drive connected via a PCIe adapter does not appear under any BIOS menu that I have found. This suggests something is very different at the BIOS level between the two connection methods.
version C
- Boots
- Error parsing PCC subspaces from PCCT
- NVMe not in lsblk
version D
- Boots
- Error parsing PCC subspaces from PCCT
- nvme nvme0: failed to set APST feature (-19)
- NVMe not in lsblk
version E
- Kernel Panic
- Error parsing PCC subspaces from PCCT
- pciehp 0000:b2:02.0:pcie004: Slot(12): Power Fault
- pciehp 0000:b2:03.0:pcie004: Slot(13): Power Fault
- acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
- nvme nvme0: failed to set APST feature (-19)
- (/sbin/init & /bin/sh) exists but couldn’t execute it (error -8)
version F
- kernel panic
- Error parsing PCC subspaces from PCCT
- acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed error
- (/sbin/init & /bin/sh) exists but couldn’t execute it (error -8)
version G
- Boots (No idea what I was doing wrong before…)
- Error parsing PCC subspaces from PCCT
- bzImage41713g.log
- bzImage41713g_lsmod.log
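The negative numbers in these messages are Linux kernel error codes (negated errno values). A minimal Python sketch to decode them, assuming it runs on a Linux host so Python's `errno` table uses the same numbering as the kernel; `describe_kernel_error` is a hypothetical helper name:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Turn a negative kernel return code such as -19 into 'NAME (message)'.

    Assumes a Linux host: errno numbering differs on some other platforms.
    """
    num = abs(code)
    name = errno.errorcode.get(num, f"errno {num}")
    return f"{name} ({os.strerror(num)})"


# The two codes reported in the version D-F results above.
for code in (-19, -8):
    print(code, describe_kernel_error(code))
```

On Linux, -19 is ENODEV (no such device) and -8 is ENOEXEC (exec format error). An ENOEXEC when starting /sbin/init or /bin/sh usually points at the kernel build, for example a missing executable-format option, rather than at the NVMe drive.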
-
@hlalex Thank you for taking the time to test all of these kernels. I’m glad ‘G’ is working correctly. Let me take a look at the logs you provided so I can keep pushing forward. I understand why ‘F’ crashed: I simply added some settings that sounded good without researching them first. That is why I did the reset with ‘G’.
I’ll touch base again when I have a chance to digest your new logs.
-
@george1421 If you would like, I can upload the bash script I am using to collect the data. It’s very basic, but it speeds things up quite a bit (and automatically masks personal info).
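The script mentioned above is bash and isn't posted here. As a rough illustration of the masking step only, here is a minimal Python sketch, assuming “personal info” means MAC and IPv4 addresses; the patterns, the placeholders (borrowed from the masked boot output earlier in this thread), and the `mask_log` name are hypothetical choices, not what the actual script does.

```python
import re

# Hypothetical masking rules: a colon-separated MAC address and a dotted IPv4
# address, replaced with the placeholders used in this thread's boot output.
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_log(text: str) -> str:
    """Replace MAC and IPv4 addresses in log text with fixed placeholders."""
    text = MAC_RE.sub("xx:xx:xx:xx:xx:xx", text)
    return IPV4_RE.sub("ip.addr", text)
```

A collection script would read each captured log (for example a saved dmesg dump), pass its text through `mask_log`, and write the result to the .log file before posting. PCI addresses such as 0000:b2:02.0 have too few colon groups to match the MAC pattern, so they are left intact.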