Dell Precision Tower 5820 - FlexBay MiniSAS PCIe NVMe SSD not recognized
-
@hlalex Very nice. I see some things in the FC syslog that are missing in FOS, namely PCI hotplug. FOS is also not initializing all of the CPUs; it stops at 8. The developers may need to reconsider the upper limit with these new systems.
That “memory controller” is surely the device we are after.
b3:00.0 Non-Volatile memory controller [0108]: Device [1c5c:1527]
# and
[0.755229] pci 0000:b3:00.0: [1c5c:1527] type 00 class 0x010802
[0.755243] pci 0000:b3:00.0: reg 0x10: [mem 0xfb500000-0xfb503fff 64bit]
# and
[4.586882] nvme nvme0: pci function 0000:b3:00.0
[4.803854] nvme0n1: p1 p2 p3 p4
I’m going to look into a few of the messages I diff’d between the two syslogs and see what I can tweak in the kernel configuration.
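For anyone who wants to repeat that comparison, here is a minimal sketch of one way to diff two kernel logs while ignoring the boot timestamps. The Fedora lines are the ones quoted above; the FOS excerpt and the function names are made up for illustration, so treat this as an approximation of the idea and not the exact method used here.

```python
import re

# Matches the leading "[    4.586882] " timestamp on kernel log lines.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

def normalize(log_text):
    """Return the set of kernel messages with timestamps stripped."""
    messages = set()
    for line in log_text.splitlines():
        line = TIMESTAMP.sub("", line.strip())
        if line:
            messages.add(line)
    return messages

def only_in_first(log_a, log_b):
    """Messages present in log_a but absent from log_b, sorted for stable output."""
    return sorted(normalize(log_a) - normalize(log_b))

# The Fedora lines are quoted from this thread. The FOS excerpt is a
# hypothetical stand-in for a log in which the nvme driver never binds.
fedora_log = """\
[0.755229] pci 0000:b3:00.0: [1c5c:1527] type 00 class 0x010802
[4.586882] nvme nvme0: pci function 0000:b3:00.0
[4.803854] nvme0n1: p1 p2 p3 p4
"""
fos_log = """\
[0.612004] pci 0000:b3:00.0: [1c5c:1527] type 00 class 0x010802
"""

missing_in_fos = only_in_first(fedora_log, fos_log)
for message in missing_in_fos:
    print(message)
```

Stripping the timestamps first matters because the same message rarely lands at the same microsecond on two different boots, so a plain line-by-line diff would flag almost everything.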
-
@hlalex There is one more bit of data that would be useful from the running FC image: the output of
lsmod
which lists the currently loaded dynamic modules (drivers). FOS doesn’t use dynamic modules; it uses statically built-in drivers instead, so lsmod will not work on FOS. But knowing what FC is loading will also feed into the tweaks needed.
-
@george1421 Sorry for the delay, here is the output of
lsmod
on Fedora 27 Live (4.13.9-300). Let me know if anything else would be of use.
Module Size Used by
fuse 102400 3
nf_conntrack_netbios_ns 16384 1
nf_conntrack_broadcast 16384 1 nf_conntrack_netbios_ns
xt_CT 16384 1
ip6t_rpfilter 16384 1
ip6t_REJECT 16384 2
nf_reject_ipv6 16384 1 ip6t_REJECT
xt_conntrack 16384 21
ip_set 36864 0
nfnetlink 16384 1 ip_set
ebtable_nat 16384 1
ebtable_broute 16384 1
bridge 143360 1 ebtable_broute
ip6table_nat 16384 1
nf_conntrack_ipv6 20480 12
nf_defrag_ipv6 36864 1 nf_conntrack_ipv6
nf_nat_ipv6 16384 1 ip6table_nat
ip6table_mangle 16384 1
ip6table_raw 16384 1
ip6table_security 16384 1
iptable_nat 16384 1
nf_conntrack_ipv4 16384 12
nf_defrag_ipv4 16384 1 nf_conntrack_ipv4
nf_nat_ipv4 16384 1 iptable_nat
nf_nat 28672 2 nf_nat_ipv6,nf_nat_ipv4
nf_conntrack 131072 9 nf_conntrack_ipv6,nf_conntrack_ipv4,nf_conntrack_broadcast,nf_conntrack_netbios_ns,xt_CT,nf_nat_ipv6,xt_conntrack,nf_nat_ipv4,nf_nat
libcrc32c 16384 2 nf_conntrack,nf_nat
iptable_mangle 16384 1
iptable_raw 16384 1
iptable_security 16384 1
ebtable_filter 16384 1
ebtables 32768 3 ebtable_filter,ebtable_nat,ebtable_broute
ip6table_filter 16384 1
ip6_tables 28672 5 ip6table_mangle,ip6table_filter,ip6table_security,ip6table_raw,ip6table_nat
snd_hda_codec_hdmi 49152 1
intel_rapl 20480 0
x86_pkg_temp_thermal 16384 0
intel_powerclamp 16384 0
nouveau 1638400 12
coretemp 16384 0
snd_hda_codec_realtek 94208 1
kvm_intel 200704 0
snd_hda_codec_generic 73728 1 snd_hda_codec_realtek
kvm 585728 1 kvm_intel
mxm_wmi 16384 1 nouveau
snd_hda_intel 40960 6
i2c_algo_bit 16384 1 nouveau
irqbypass 16384 1 kvm
snd_hda_codec 126976 4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
ttm 94208 1 nouveau
drm_kms_helper 159744 1 nouveau
snd_hda_core 81920 5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
dell_wmi 16384 0
drm 352256 15 nouveau,ttm,drm_kms_helper
intel_uncore 122880 0
dell_smbios 16384 1 dell_wmi
snd_hwdep 20480 1 snd_hda_codec
sparse_keymap 16384 1 dell_wmi
snd_seq 65536 0
intel_rapl_perf 16384 0
video 40960 2 dell_wmi,nouveau
iTCO_wdt 16384 0
dcdbas 16384 1 dell_smbios
snd_seq_device 16384 1 snd_seq
snd_pcm 98304 4 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
mei_wdt 16384 0
wmi_bmof 16384 0
iTCO_vendor_support 16384 1 iTCO_wdt
tpm_tis 16384 0
dell_smm_hwmon 16384 0
tpm_tis_core 20480 1 tpm_tis
tpm 53248 2 tpm_tis,tpm_tis_core
snd_timer 32768 2 snd_seq,snd_pcm
wmi 24576 4 dell_wmi,wmi_bmof,mxm_wmi,nouveau
snd 81920 22 snd_hda_intel,snd_hwdep,snd_seq,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_seq_device,snd_hda_codec_realtek,snd_pcm
intel_lpss_acpi 16384 0
intel_lpss 16384 1 intel_lpss_acpi
ioatdma 53248 0
soundcore 16384 1 snd
mei_me 40960 1
shpchp 36864 0
i2c_i801 24576 0
mei 102400 3 mei_me,mei_wdt
dca 16384 1 ioatdma
vfat 20480 1
fat 65536 1 vfat
squashfs 53248 1
hid_apple 16384 0
8021q 32768 0
garp 16384 1 8021q
mrp 20480 1 8021q
stp 16384 2 garp,bridge
llc 16384 3 garp,bridge,stp
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
nvme 32768 0
crc32c_intel 24576 1
e1000e 245760 0
ghash_clmulni_intel 16384 0
nvme_core 45056 1 nvme
serio_raw 16384 0
ptp 20480 1 e1000e
pps_core 20480 1 ptp
uas 24576 0
usb_storage 69632 2 uas
sunrpc 331776 1
scsi_transport_iscsi 94208 0
loop 28672 6
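In case it helps with sifting through a list this long, below is a rough sketch of how the lsmod output could be parsed and filtered down to the drivers relevant to this problem. It assumes the usual one-module-per-line layout that lsmod prints. The sample rows are taken from the output above, while the function name and the watch list are hypothetical choices for illustration.

```python
def parse_lsmod(text):
    """Map module name -> (size, use_count, [users]) from `lsmod` output."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header row.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        name, size, used = fields[0], int(fields[1]), int(fields[2])
        # The optional fourth column is a comma-separated list of dependents.
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[name] = (size, used, users)
    return modules

# A few rows copied from the Fedora 27 Live output in this post.
sample = """\
Module                  Size  Used by
shpchp                 36864  0
nvme                   32768  0
nvme_core              45056  1 nvme
e1000e                245760  0
ptp                    20480  1 e1000e
"""

loaded = parse_lsmod(sample)
# Hypothetical watch list: storage and hotplug drivers of interest here.
interesting = [m for m in ("nvme", "nvme_core", "shpchp") if m in loaded]
print(interesting)
```

Since FOS builds its drivers into the kernel, each name that turns up this way still has to be matched by hand to the corresponding kernel configuration option.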
-
@hlalex Did bzImage version D work any better than the previous ones? I uploaded it last Friday.
-
@hlalex OK, I found an interesting driver in your FC
lsmod
output. The one in question is shpchp, the “SHPC PCI Hotplug driver”. This is a specific hotplug driver that is not currently enabled in “bzImage41713D”. I’m in the process of compiling bzImage41713E. Let’s see if that one gets us to that NVMe drive. When I have version E uploaded I’ll send you an IM.
-
@george1421 I must have missed version D, my bad. I will pull it and post the results ASAP.
I will keep an eye out for E as well.
-
@george1421 We are making some progress with version D. While booting I received the
print_req_error: I/O error, dev nvme0n1, sector 0
nvme nvme0: failed to set APST feature (-19)
unable to open '/dev/nvme0n1' syspath not found
error I was getting with Clonezilla live. This is the first time this error has shown up in FOS, which I believe is progress! (At least it knows there is an nvme disk installed, even if FOS can’t access it).
Here are the logs from version D.
bzImage41713D_lsmod.log
bzImage41713D.log
I’ll upload version E results shortly.
-
@hlalex YES!! I think we are really close. Version ‘E’ should provide support for the specific controller that was not in Version ‘D’.
I’m going to dig into the version D syslog you provided and compare it against both of your previous logs. We are getting very close. Thank you for sticking with this; debugging the kernel is a bit difficult without having the hardware in hand.
-
@george1421 No luck on version E. We got the APST feature error; however, FOS then failed with a kernel panic.
-
@george1421 Not a problem! FOG has saved me countless hours of manual configuration over the years, and until this recent batch of hardware I have not come across an issue that wasn’t already answered/fixed in another post. I am happy to help in any way I can, and hopefully save someone else (and possibly my future self) a lot of headache. We got extremely lucky with the unequal exchange approval from Dell, and other techs may not get the same break (It took 2 days and many hours on the phone/sending emails to get the exchange).
-
@hlalex You didn’t do anything specific with the init.xz did you?
-
@george1421 Didn’t touch it at all. I thought about changing it to init_32.xz, but decided to wait for you.
-
@hlalex No, don’t use the 32-bit inits; that will break for sure. I’m building version ‘F’ this time. The APST error is related to the ASPM kernel feature being turned on. I turned it on because it looked like it was part of PCIe hotplug. I’ll IM you when the build is done.
If this one doesn’t work (I understand some of the error messages posted in the picture), I’m going to revert back to a stock kernel build and then go in and enable just the PCI hot plug stuff since I may have flipped too many switches trying to make it work. I’m pretty sure I’m in the right neighborhood now.
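To keep track of which switches differ from stock, one option is to compare the two kernel .config files directly. The sketch below is only an illustration of that idea: the symbol names are real kernel options, but the sample values are invented and are not the actual FOS configuration, and the helper names are hypothetical. The kernel source tree also ships scripts/diffconfig for the same job.

```python
import re

# .config lines look like "CONFIG_FOO=y" or "# CONFIG_FOO is not set".
SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_config(text):
    """Map each CONFIG_ symbol to its value, using "n" for unset symbols."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_LINE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options

def config_changes(stock_text, modified_text):
    """Return {symbol: (stock_value, modified_value)} for symbols that differ."""
    stock = parse_config(stock_text)
    modified = parse_config(modified_text)
    changes = {}
    for name in sorted(set(stock) | set(modified)):
        # A symbol absent from a file is treated the same as "not set".
        before, after = stock.get(name, "n"), modified.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes

# Invented sample values, NOT the real FOS kernel configuration.
stock_config = """\
CONFIG_BLK_DEV_NVME=y
# CONFIG_HOTPLUG_PCI is not set
# CONFIG_PCIEASPM is not set
"""
modified_config = """\
CONFIG_BLK_DEV_NVME=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_SHPC=y
CONFIG_PCIEASPM=y
"""

changes = config_changes(stock_config, modified_config)
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

A list like this makes it easier to back out one option at a time when a build panics.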
-
@george1421 F produced a similar kernel panic:
-
@hlalex OK, then rolling back to the stock FOS kernel settings and turning on only hotplug is the right place to start. Version G should be done shortly. I’m going to try to boot the kernel locally first to make sure it gets past the init on my hardware before testing on your kit.
-
Version G (reset) was sent via IM and on the plus side it did not blow up during my testing.
-
@george1421 Sorry, got caught up with some other issues. I tried G, but it hangs at
Configuring (net0 xx:xx:xx:xx:xx:xx)... ok
http://ip.addr/fog/service/ipxe/boot.php... ok
bzImage41713g... ok
init.xz... ok
_
I have double checked and init.xz is still set as default.
Hang on, just found an issue with the BIOS. Testing D - G again to see if it had any effect.
-
@george1421 OK, I double-checked everything and it appears the BIOS change didn’t have any effect on the output of any of the kernels. Same errors and freezes.
-
@hlalex OK, I’m going to have to take it back into the lab, because it didn’t do this when I tested. I’ll grab a 7060 (the newest bit of hardware I have at the moment) and see why it’s going sideways.
If this hadn’t worked before, I would say it’s the hand-off between iPXE and the UEFI firmware, because with faulty firmware we have seen it hang at this exact spot in the past. But I think you were running in BIOS mode the last time I looked at the logs. I won’t keep pushing tests to you until I’m sure I have it booting correctly on my newest hardware.
-
Well, I’m a bit confused now. The ‘G’ version works correctly on a 3050 as well as a 7060. There is a short delay of 3-4 seconds after bzImage is copied before the kernel starts, during which I initially thought it had crashed like on your system.
In your environment, can you roll back to the C-G bzImages and ensure the older ones still boot? For the G build, I went back to FOG defaults and then only enabled hotplug. There might be other things I had turned on that are needed for hotplug. Rolling back to an earlier boot kernel on your end will tell me where to look in the config files.