Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized
-
@hlalex Ok, I’m going to have to take it back into the lab, because it didn’t do this when I tested. I’ll grab a 7060 (the newest bit of hardware I have at the moment) and see why it’s going sideways.
If this hadn’t worked before, I would say it’s the hand-off between iPXE and the UEFI firmware (with faulty firmware we have seen it hang at this exact spot in the past). But I think you were running in BIOS mode the last time I looked at the logs. I won’t keep pushing tests to you until I’m sure I have it booting correctly on my newest hardware.
-
Well, I’m a bit confused now. The ‘G’ version works correctly on a 3050 as well as a 7060. There is a short delay of 3-4 seconds after bzImage is copied before the kernel starts, and at first I thought it had crashed like on your system.
In your environment, can you roll back to the C-G bzImages and ensure the older ones still boot? For the G build, I went back to FOG defaults and then only enabled hotplug. There might be other things that I turned on that are needed for hotplug. Rolling back to an earlier boot kernel on your end will tell me where to look in the config files.
-
@george1421 Same here. The 5820 BIOS has some different options regarding NVMe drives connected to the FrontFlex Bay. These devices actually show up in the BIOS, while the same drive connected via a PCIe adapter does not populate under any BIOS menu that I have located. This suggests something is very different at the BIOS level for the two connection methods.
version C
- Boots
- Error parsing PCC subspaces from PCCT
- NVMe not in lsblk
version D
- Boots
- Error parsing PCC subspaces from PCCT
- nvme nvme0: failed to set APST feature (-19)
- NVMe not in lsblk
version E
- Kernel Panic
- Error parsing PCC subspaces from PCCT
- pciehp 0000:b2:02.0:pcie004: Slot(12): Power Fault
- pciehp 0000:b2:03.0:pcie004: Slot(13): Power Fault
- acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
- nvme nvme0: failed to set APST feature (-19)
- (/sbin/init & /bin/sh) exists but couldn’t execute it (error -8)
version F
- kernel panic
- Error parsing PCC subspaces from PCCT
- acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed error
- (/sbin/init & /bin/sh) exists but couldn’t execute it (error -8)
version G
- Boots (No idea what I was doing wrong before…)
- Error parsing PCC subspaces from PCCT
- bzImage41713g.log
- bzImage41713g_lsmod.log
-
@hlalex Thank you for taking the time to test all of these kernels. I’m glad ‘G’ is working correctly. Let me take a look at the logs you provided so I can keep pushing forward. I understand about ‘F’ crashing, I simply added some settings that sounded good without first researching. That is why I did the reset with ‘G’.
I’ll touch base again when I have a chance to digest your new logs.
-
@george1421 If you would like, I can upload the bash script I am using to collect the data. It’s very basic, but it speeds things up quite a bit (and automatically masks personal info).
-
@hlalex I’m in the process of rebuilding the kernel for the ( i ) release. One thing I discovered is that I need to have a better change management process in place. I compared your latest syslog with the output from FC27 and I see that error messages I fixed in an earlier release are back in the syslog (mainly because I reset the configuration to a known good setting and did not have good enough documentation of what I changed to get to that point). It’s a bit like starting over, but with the knowledge that I fixed the issue once before.
The i build has the pcie hot plug enabled plus MTD support (I think your solution is a combination of drivers that need to be enabled). I’m interested in the results you will generate from this build as compared to the previous one. The level of detail you are providing is excellent.
As for the error “Error parsing PCC subspaces from PCCT” that error is actually present in both FOS and FC27 kernels.
-
@george1421 rev I logs:
-
Well, I found out why version E was so close but blew up with the inits. I was chatting with one of the developers and he called the problem 2 days ago. He said the inits were mismatched with the kernel architecture (i.e. 64 bit kernel with 32 bit inits). I found it was the other way around: the kernel switched to 32 bits and you were trying to boot with the 64 bit inits. I have no idea why the kernel settings switched with the ‘E’ version.
I found this by doing a side-by-side comparison of the kernel build config files. So in the end we were really close, but I shot myself in the foot.
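For anyone who wants to repeat that kind of side-by-side comparison, here is a minimal sketch in Python. It is not the tooling used for the FOS builds; the function names are made up for illustration, and the two sample configs are invented text, not the real ‘D’ and ‘E’ config files. It only assumes the usual kernel `.config` layout, where an enabled option reads `CONFIG_FOO=y` and a disabled one reads `# CONFIG_FOO is not set`.

```python
def parse_kernel_config(text):
    """Map CONFIG_* option names to values; '# CONFIG_X is not set' becomes 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading '# ' and the trailing ' is not set'.
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kernel_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs."""
    old = parse_kernel_config(old_text)
    new = parse_kernel_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            changes[name] = (before, after)
    return changes


# Invented sample configs (assumption: NOT the actual FOS build configs).
SAMPLE_OLD = "CONFIG_64BIT=y\nCONFIG_HOTPLUG_PCI_PCIE=y\n"
SAMPLE_NEW = "# CONFIG_64BIT is not set\nCONFIG_X86_32=y\nCONFIG_HOTPLUG_PCI_PCIE=y\n"

for option, (before, after) in diff_kernel_configs(SAMPLE_OLD, SAMPLE_NEW).items():
    print(f"{option}: {before} -> {after}")
```

A diff like this makes an unintended flip of an option such as `CONFIG_64BIT` stand out, which is the kind of change described above.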
I have a bit more time today so hopefully we can get this knocked out.
-
@george1421 No worries! I run into gremlins like that all the time. I’ll keep an eye out for updates.
-
@george1421 rev J boots just fine, but I did notice an error message to the effect of “/dev … error creating epoll fd”. It was gone almost before I saw it, but that is what I was able to remember. I vaguely remember seeing this error before, but I do not know if it was with any of the other test kernels or some other project.
Here are the logs:
-
@hlalex Very nice. I’m happy to see where the kernel is at the moment (it’s not currently working as you need it, but it’s close. I’ve also been able to address a few other issues not related to your issue).
acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
acpi PNP0A08:02: _OSC: platform does not support [PME AER]
acpi PNP0A08:02: _OSC: OS now controls [PCIeHotplug PCIeCapability]
pciehp 0000:b2:02.0:pcie004: Slot(12-1): Power fault
As soon as I get the above bits worked out to have them show in the FOS kernel, we should have access to that nvme drive. This is the step we were at just before the kernel got switched to 32 bit mode. Let me research these and I’ll come back with a ‘K’ release.
-
You guys are doing an awesome job here! Keep it up!
-
Version K has been posted. This one adds PCIe DMA support. We are getting very close to the config that was blowing up before where the kernel switched to 32 bit.
Not related, but I added USB-C support to this kernel for devices that hide behind it, like network adapters on USB-C docks.
-
@george1421 That’s a great addition, especially with all the new devices with USB-C (these 5820s have 2 front C ports).
Here are logs from Rev K:
-
@george1421 version L logs:
-
@hlalex Well I’m down to researching this error:
[ 3.638397] nvme nvme0: failed to set APST feature (-19)
I roughly have equivalency between Fedora Core 27 (4.13.9) and FOS (4.17.13). The NVMe device is being seen by the kernel, but it can’t be mounted at the moment.
-
@george1421 Found a few references to this error:
https://bugs.archlinux.org/task/57331
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
http://lists.infradead.org/pipermail/linux-nvme/2017-February/008008.html
It looks like the variable to set is
nvme_core.default_ps_max_latency_us=<some_number_here>
I tried setting it to 0, 250, and 300 according to some of those posts (using the “Host Kernel Arguments” option in the host record) and nothing seems to change.
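When a kernel argument appears to have no effect, one quick sanity check is whether it actually reached the booted kernel. The sketch below is only an illustration: `kernel_arg` and `live_report` are hypothetical helper names, the sample command line is invented, and it assumes the standard `/proc/cmdline` file and the usual `/sys/module/<module>/parameters/` layout for the `nvme_core` parameter. It also assumes that the last occurrence of a repeated argument is the one that matters.

```python
from pathlib import Path

PARAM = "nvme_core.default_ps_max_latency_us"


def kernel_arg(cmdline, name):
    """Return the value of name=value on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val  # assumption: the last occurrence wins
    return value


def live_report():
    """Read the running system: (value on the command line, value in sysfs)."""
    cmdline = Path("/proc/cmdline").read_text()
    sysfs = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")
    effective = sysfs.read_text().strip() if sysfs.exists() else None
    return kernel_arg(cmdline, PARAM), effective


# Invented sample command line (assumption: not copied from a real FOS boot).
SAMPLE = "initrd=init.xz root=/dev/ram0 nvme_core.default_ps_max_latency_us=0 loglevel=4"
print(kernel_arg(SAMPLE, PARAM))
```

Calling `live_report()` from a debug session on the target machine would show whether the requested value and the value the driver is using agree.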
-
nvme nvme0: failed to set APST feature (-19)
That -19 usually means “No such device” (ENODEV). Very strange.
In that Arch Linux bug report there are CONFIG_PCIEASPM_... kernel configs mentioned. Had a look at those yet?
@george1421 What I was just thinking: maybe Fedora has some special NVMe patch included in their kernel that we don’t know about yet. Has anyone ever looked into the full Fedora kernel patchset?
EDIT: Not sure but that might be the one: https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/snapshot/fedora-kernel-4.13.9-300.fc27.tar.gz
EDIT2: Ok, sorry. This seems to be the full fedora kernel code. Anyone keen to create a diff to a vanilla kernel with that?
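As a side note on reading those negative numbers in kernel messages: they are negated errno values, so they can be looked up rather than guessed. A minimal sketch using Python’s standard `errno` and `os` modules follows; `describe_kernel_error` is just an illustrative helper name. On Linux, 19 maps to ENODEV and 8 maps to ENOEXEC (exec format error), the latter matching the “error -8” from the E and F panics earlier in the thread.

```python
import errno
import os


def describe_kernel_error(code):
    """Translate a (usually negative) kernel error code into (name, message)."""
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    # The message text comes from the local C library and may vary by platform.
    return name, os.strerror(number)


for code in (-19, -8):
    name, message = describe_kernel_error(code)
    print(f"{code}: {name} ({message})")
```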
-
@sebastian-roth I just tested both options as kernel arguments and nothing seems to have changed.
I also tried the
pcie_aspm.policy=powersave
to no avail.
Let me know and I can try to post some logs before I have to punch out.
-
@sebastian-roth The config parameter
CONFIG_PCIEASPM_POWER_SUPERSAVE
is currently not set.
I’ll take a peek at the patch and see if there is anything helpful. It’s so close to working (at least dmesg-wise). I’d hate to give up now…