Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized
-
@hlalex I went ahead and compiled the latest FOS kernel with the suspected missing module enabled. This is kernel version 4.17.13. Understand that it’s not the official FOS kernel, because it doesn’t have the customized patches that the FOG kernel developers add, but it should work for our test to see if we can init that NVMe memory device.
I’ll IM you a link to the test kernel.
For this test, download the linked kernel to /var/www/html/fog/service/ipxe on the FOG server and leave the name as bzImage41713. Then go into the host record for one of these test systems and set the “Host Kernel” parameter to bzImage41713. Finally, PXE boot the target system into a debug capture task. Run the lspci -nn command and see if we can detect
b3:00.0 Non-Volatile memory controller [0108]: Device [1c5c:1527]
If it’s in the list, then run the lsblk command to see if it has something we can mount.
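The check described above could be scripted roughly as follows. This is only a sketch: the function name `has_pci_id` is made up for illustration, the ID 1c5c:1527 is the one quoted in this post, and the script assumes `lspci` and `lsblk` are present in the debug environment (it does nothing if `lspci` is missing).

```shell
#!/bin/sh
# Sketch: report whether a PCI vendor:device ID appears in `lspci -nn` output.
# has_pci_id is a hypothetical helper name; 1c5c:1527 is the ID from the thread.

# Reads `lspci -nn` text on stdin; succeeds if "[<id>]" is present.
has_pci_id() {
    grep -qi "\[$1\]"
}

if command -v lspci >/dev/null 2>&1; then
    if lspci -nn | has_pci_id "1c5c:1527"; then
        echo "Controller 1c5c:1527 detected; listing block devices:"
        # Only list block devices if lsblk exists in this environment.
        command -v lsblk >/dev/null 2>&1 && lsblk
    else
        echo "Controller 1c5c:1527 not found in lspci output"
    fi
fi
```

Matching on the bracketed numeric ID rather than the device name keeps the check independent of whether the local PCI ID database knows the device.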
-
Good Morning @george1421. I stopped by the office to test the custom kernel (I’m going to pick up breakfast for my wife on my way home so she should be happy ;). No luck. The first attempt to boot gave an init error, and I realized it was trying init_32.xz instead of init.xz. After adding init.xz to the "Host Init" parameter in the host record it booted into debug without issue. Bad news is that
b3:00.0 Non-Volatile memory controller [0108]: Device [1c5c:1527]
does not show up with lspci -nn, and only /dev/sdb (the PXE boot USB) shows up with lsblk.
-
@hlalex Well, digging a bit deeper into this…
Going line by line…
Under FC27, I see that the disk controller is in SATA mode:
00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA controller [AHCI mode] [8086:a282]
    Subsystem: Dell Device [1028:0738]
    Kernel driver in use: ahci
Under FOS, the disk controller was in RAID-on mode:
00:17.0 RAID bus controller [0104]: Intel Corporation C600/X79 series chipset SATA RAID Controller [8086:2826]
Note that the hardware ID changes between the two modes. (Not pointing fingers here.) But as I said before, Linux does not see the disks behind RAID-on devices in UEFI mode. Even if you are in BIOS mode, please change the value to AHCI mode and pull the lspci output from FOS. I don’t believe that lspci under FOS supports the -k option, but that info would be handy.
I also found another kernel module that was disabled, called “Devices/Memory Controller drivers”. After a bit more research I’m going to compile an updated kernel with Memory Controller support enabled.
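If lspci under FOS really lacks -k, the bound driver can usually be read from sysfs instead, since each PCI device directory carries a `driver` symlink once a driver has claimed it. The sketch below assumes sysfs is mounted at /sys; the function name `pci_driver` and its optional second argument are hypothetical, and the two addresses are the ones from the lspci output in this thread.

```shell
#!/bin/sh
# Sketch: show which kernel driver is bound to a PCI device by reading sysfs,
# as a stand-in for `lspci -k`. pci_driver and its optional second argument
# (an alternate sysfs device root) are hypothetical names.

pci_driver() {
    addr=$1                                  # full address, e.g. 0000:00:17.0
    root=${2:-/sys/bus/pci/devices}
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/ahci
        basename "$(readlink "$link")"
    else
        # No symlink: device absent, or present with no driver bound.
        echo "none"
    fi
}

for dev in 0000:00:17.0 0000:b3:00.0; do
    echo "$dev: $(pci_driver "$dev")"
done
```

This works for statically built-in drivers as well as loadable modules, which matters on a kernel that has no modules to list.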
-
@george1421 I had noticed that as well, and double checked the Raid settings. It had been turned back on at some point (probably while I was going through everything else). I turned it back off and re-did the debug task with the same results.
It appears that the NVMe PCIe drives use an entirely separate controller from SATA drives. Apparently the specific controller/driver varies based on the interface between the MoBo and the SSD. FOS can see the M.2 drives when they are connected via a standard PCIe adapter card; however, when the same M.2 drive is connected via the FlexBay MiniSAS backplane, FOS no longer recognizes the drive. This holds true regardless of the SATA controller configuration: the drives on the adapter card are visible in both RAID and AHCI modes.
I found a dell support article that gives some good information about the differences between NVMe and AHCI here.
I checked through the FOS kernel, and found NVME Support is enabled in lines 880-890: https://github.com/FOGProject/fos/blob/master/configs/kernelx64.config#L880
#
# NVME Support
#
CONFIG_NVME_CORE=y
CONFIG_BLK_DEV_NVME=y
CONFIG_NVME_MULTIPATH=y
CONFIG_NVME_FABRICS=y
CONFIG_NVME_FC=y
CONFIG_NVME_TARGET=y
CONFIG_NVME_TARGET_LOOP=y
CONFIG_NVME_TARGET_FC=y
CONFIG_NVME_TARGET_FCLOOP=y
Which explains why the drives are detected when plugged into a PCIe adapter. The MiniSAS connection is a completely different beast–or at least uses a different controller from the standard NVMe drive.
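For anyone repeating this check against a local copy of the config, something like the following would report the state of individual options. It is a sketch only: `config_state` is a hypothetical helper, the default file name assumes kernelx64.config has been downloaded into the current directory, and the two option names are taken from the excerpt above.

```shell
#!/bin/sh
# Sketch: report the state of selected options in a kernel config file.
# config_state is a hypothetical helper; the option names come from the
# NVME Support excerpt quoted in the thread.

# Prints "y", "m", or "unset" for option $2 in config file $1.
config_state() {
    val=$(sed -n "s/^$2=\([ym]\)\$/\1/p" "$1" | head -n 1)
    echo "${val:-unset}"
}

# Assumed local file name; pass a different path as the first argument.
cfg=${1:-kernelx64.config}
if [ -r "$cfg" ]; then
    for opt in CONFIG_NVME_CORE CONFIG_BLK_DEV_NVME; do
        echo "$opt: $(config_state "$cfg" "$opt")"
    done
fi
```

An option reported as "unset" covers both the "# CONFIG_X is not set" comment form and an option that is missing from the file entirely.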
I will run through the debug again (with RAID off in BIOS) and post the results of lspci -nn, lsblk, uname -a, and anything else you suggest. I will also do the same thing with the M.2 plugged in through a PCIe adapter so we have comparison data.
-
@hlalex I have 2 additional kernels for you to test, but I will only share with you tomorrow since you should have other activities planned for today.
-
@george1421 OK, I scripted data collection and pulled fresh info for kernels a, b, c, and F27 Live. I also pulled dmidecode, full lshw output, and /proc/diskstats to additional log files. Let me know if they may be of any use and I will upload them as well.
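The actual collection script was not posted, so the following is only a guess at what such a script might look like. The function name `collect`, the log file naming, and the choice of commands are assumptions; commands that are missing from the environment are noted in the log rather than treated as errors.

```shell
#!/bin/sh
# Sketch: run a set of diagnostic commands and append their output to one log.
# collect and the log file naming are hypothetical, not the script used in
# the thread.

collect() {
    log=$1; shift
    for cmd in "$@"; do
        echo "===== $cmd =====" >> "$log"
        # ${cmd%% *} is the command name without its arguments.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # Word splitting of $cmd is intentional so "lspci -nn" keeps its flag.
            $cmd >> "$log" 2>&1 || echo "(exit status $?)" >> "$log"
        else
            echo "(command not available)" >> "$log"
        fi
    done
}

collect "${TMPDIR:-/tmp}/diag_$(uname -r).log" "uname -a" "lspci -nn" "lsblk" "lsmod"
```

Naming the log after `uname -r` keeps results from different test kernels apart when the same script is run under each one.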
bzImage41713a.log
bzImage41713b.log
bzImage41713c.log
F27_4139300.log
-
@hlalex Very nice. I see some things in the FC syslog that are missing in FOS, namely PCI Hotplug. Also, FOS is not initializing all of the CPUs; it stops at 8. The developers may need to reconsider the upper limit with these new systems.
That “memory controller” is surely the device we are after.
b3:00.0 Non-Volatile memory controller [0108]: Device [1c5c:1527]
# and
[0.755229] pci 0000:b3:00.0: [1c5c:1527] type 00 class 0x010802
[0.755243] pci 0000:b3:00.0: reg 0x10: [mem 0xfb500000-0xfb503fff 64bit]
# and
[4.586882] nvme nvme0: pci function 0000:b3:00.0
[4.803854] nvme0n1: p1 p2 p3 p4
I’m going to look into a few of the messages I diff’d between the two syslogs and see what I can tweak in the kernel configuration.
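A comparison like this is easier once the leading timestamps are stripped, since they differ on every boot. The sketch below is one possible way to do it, not the method actually used here; the function names and the file names fedora.log and fos.log are hypothetical.

```shell
#!/bin/sh
# Sketch: list kernel messages present in one boot log but absent from another,
# ignoring the leading [timestamp]. Function and file names are hypothetical.

# Print the log with timestamps removed, sorted and de-duplicated.
strip_ts() {
    sed 's/^\[ *[0-9][0-9]*\.[0-9]*\] *//' "$1" | sort -u
}

# Print lines that occur only in the first log.
only_in_first() {
    a=$(mktemp); b=$(mktemp)
    strip_ts "$1" > "$a"
    strip_ts "$2" > "$b"
    comm -23 "$a" "$b"      # comm needs sorted input, which strip_ts provides
    rm -f "$a" "$b"
}

if [ -r fedora.log ] && [ -r fos.log ]; then
    only_in_first fedora.log fos.log
fi
```

Messages that embed addresses or sizes will still show up as differences, so the result is a starting point for reading rather than a definitive list of missing drivers.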
-
@hlalex There is one more bit of data that would be useful from the running FC image: the output of lsmod, which lists the currently loaded dynamic modules (drivers). FOS doesn’t use dynamic modules; it uses statically built-in drivers instead, so lsmod will not work for FOS. But knowing what FC is loading will also feed into the tweaks needed.
-
@george1421 Sorry for the delay, here is the output of lsmod on Fedora 27 Live (4.13.9-300). Let me know if anything else would be of use.
Module                   Size     Used by
fuse                     102400   3
nf_conntrack_netbios_ns  16384    1
nf_conntrack_broadcast   16384    1 nf_conntrack_netbios_ns
xt_CT                    16384    1
ip6t_rpfilter            16384    1
ip6t_REJECT              16384    2
nf_reject_ipv6           16384    1 ip6t_REJECT
xt_conntrack             16384    21
ip_set                   36864    0
nfnetlink                16384    1 ip_set
ebtable_nat              16384    1
ebtable_broute           16384    1
bridge                   143360   1 ebtable_broute
ip6table_nat             16384    1
nf_conntrack_ipv6        20480    12
nf_defrag_ipv6           36864    1 nf_conntrack_ipv6
nf_nat_ipv6              16384    1 ip6table_nat
ip6table_mangle          16384    1
ip6table_raw             16384    1
ip6table_security        16384    1
iptable_nat              16384    1
nf_conntrack_ipv4        16384    12
nf_defrag_ipv4           16384    1 nf_conntrack_ipv4
nf_nat_ipv4              16384    1 iptable_nat
nf_nat                   28672    2 nf_nat_ipv6,nf_nat_ipv4
nf_conntrack             131072   9 nf_conntrack_ipv6,nf_conntrack_ipv4,nf_conntrack_broadcast,nf_conntrack_netbios_ns,xt_CT,nf_nat_ipv6,xt_conntrack,nf_nat_ipv4,nf_nat
libcrc32c                16384    2 nf_conntrack,nf_nat
iptable_mangle           16384    1
iptable_raw              16384    1
iptable_security         16384    1
ebtable_filter           16384    1
ebtables                 32768    3 ebtable_filter,ebtable_nat,ebtable_broute
ip6table_filter          16384    1
ip6_tables               28672    5 ip6table_mangle,ip6table_filter,ip6table_security,ip6table_raw,ip6table_nat
snd_hda_codec_hdmi       49152    1
intel_rapl               20480    0
x86_pkg_temp_thermal     16384    0
intel_powerclamp         16384    0
nouveau                  1638400  12
coretemp                 16384    0
snd_hda_codec_realtek    94208    1
kvm_intel                200704   0
snd_hda_codec_generic    73728    1 snd_hda_codec_realtek
kvm                      585728   1 kvm_intel
mxm_wmi                  16384    1 nouveau
snd_hda_intel            40960    6
i2c_algo_bit             16384    1 nouveau
irqbypass                16384    1 kvm
snd_hda_codec            126976   4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
ttm                      94208    1 nouveau
drm_kms_helper           159744   1 nouveau
snd_hda_core             81920    5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
dell_wmi                 16384    0
drm                      352256   15 nouveau,ttm,drm_kms_helper
intel_uncore             122880   0
dell_smbios              16384    1 dell_wmi
snd_hwdep                20480    1 snd_hda_codec
sparse_keymap            16384    1 dell_wmi
snd_seq                  65536    0
intel_rapl_perf          16384    0
video                    40960    2 dell_wmi,nouveau
iTCO_wdt                 16384    0
dcdbas                   16384    1 dell_smbios
snd_seq_device           16384    1 snd_seq
snd_pcm                  98304    4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
mei_wdt                  16384    0
wmi_bmof                 16384    0
iTCO_vendor_support      16384    1 iTCO_wdt
tpm_tis                  16384    0
dell_smm_hwmon           16384    0
tpm_tis_core             20480    1 tpm_tis
tpm                      53248    2 tpm_tis,tpm_tis_core
snd_timer                32768    2 snd_seq,snd_pcm
wmi                      24576    4 dell_wmi,wmi_bmof,mxm_wmi,nouveau
snd                      81920    22 snd_hda_intel,snd_hwdep,snd_seq,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_seq_device,snd_hda_codec_realtek,snd_pcm
intel_lpss_acpi          16384    0
intel_lpss               16384    1 intel_lpss_acpi
ioatdma                  53248    0
soundcore                16384    1 snd
mei_me                   40960    1
shpchp                   36864    0
i2c_i801                 24576    0
mei                      102400   3 mei_me,mei_wdt
dca                      16384    1 ioatdma
vfat                     20480    1
fat                      65536    1 vfat
squashfs                 53248    1
hid_apple                16384    0
8021q                    32768    0
garp                     16384    1 8021q
mrp                      20480    1 8021q
stp                      16384    2 garp,bridge
llc                      16384    3 garp,bridge,stp
crct10dif_pclmul         16384    0
crc32_pclmul             16384    0
nvme                     32768    0
crc32c_intel             24576    1
e1000e                   245760   0
ghash_clmulni_intel      16384    0
nvme_core                45056    1 nvme
serio_raw                16384    0
ptp                      20480    1 e1000e
pps_core                 20480    1 ptp
uas                      24576    0
usb_storage              69632    2 uas
sunrpc                   331776   1
scsi_transport_iscsi     94208    0
loop                     28672    6
-
@hlalex Did bzImage version D work any better than the previous ones? I uploaded it last Friday.
-
@hlalex OK, I found an interesting driver in your FC lsmod output. The one in question is shpchp, which is the “SHPC PCI Hotplug driver”. This is a specific hotplug driver that is not currently enabled in “bzImage41713D”. I’m in the process of compiling bzImage41713E. Let’s see if that one gets us to that NVMe drive. When I have version E uploaded I’ll send you an IM.
-
@george1421 I must have missed version D, my bad. I will pull it and post the results ASAP.
I will keep an eye out for E as well.
-
@george1421 We are making some progress with version D. While booting I received the same error I was getting with Clonezilla live:
print_req_error: I/O error, dev nvme0n1, sector 0
nvme nvme0: failed to set APST feature (-19)
unable to open '/dev/nvme0n1'
syspath not found
This is the first time this error has shown up in FOS, which I believe is progress! (At least it knows there is an NVMe disk installed, even if FOS can’t access it.)
Here are the logs from version D.
bzImage41713D_lsmod.log
bzImage41713D.log
I’ll upload version E results shortly.
-
@hlalex YES!! I think we are really close. Version ‘E’ should provide support for the specific controller that was not in Version ‘D’.
I’m going to dig into the syslog-D log you provided and compare it against both of your previous logs. We are getting very close. Thank you for sticking with this; debugging the kernel is a bit difficult without having the hardware in hand.
-
@george1421 No luck on version E. We got the APST feature error, however FOS failed with a kernel panic.
-
@george1421 Not a problem! FOG has saved me countless hours of manual configuration over the years, and until this recent batch of hardware I have not come across an issue that wasn’t already answered/fixed in another post. I am happy to help in any way I can, and hopefully save someone else (and possibly my future self) a lot of headache. We got extremely lucky with the unequal exchange approval from Dell, and other techs may not get the same break (It took 2 days and many hours on the phone/sending emails to get the exchange).
-
@hlalex You didn’t do anything specific with the init.xz did you?
-
@george1421 Didn’t touch it at all. I thought about changing it to init_32.xz, but decided to wait for you.
-
@hlalex No, don’t use the 32-bit inits; that will break for sure. I’m building version ‘F’ this time. The APST error is related to the ASPM kernel feature being turned on. I turned it on because it looked like it was part of PCIe hot plug. I’ll IM you when the build is done.
If this one doesn’t work (I understand some of the error messages posted in the picture), I’m going to revert back to a stock kernel build and then go in and enable just the PCI hot plug stuff since I may have flipped too many switches trying to make it work. I’m pretty sure I’m in the right neighborhood now.
-
@george1421 F produced a similar kernel panic: