
    rcu_sched stall OR kernel panic on PowerEdge R640

    • george1421G
      george1421 Moderator @djgalloway
      last edited by george1421

      @djgalloway Is this system in UEFI or BIOS (legacy) mode?

      So the only difference between the kernel starting and not is the acpi=off being used?

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

      • D
        djgalloway @george1421
        last edited by

        @george1421 Yes, BIOS mode.

        Screenshot at 2019-09-18 09-36-26.png

        • george1421G
          george1421 Moderator @djgalloway
          last edited by

          @djgalloway Just for clarity

          So the only difference between the kernel starting and not is the acpi=off being used?

          Is this still accurate?

          • D
            djgalloway @george1421
            last edited by

            @george1421 Right. So, it turns out I had the wrong serial TTY set. I changed it to console=ttyS1,115200 without acpi=off and got the following:

            Linux version 4.19.64 (jenkins-agent@Tollana) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Mon Aug 5 11:08:49 CDT 2019
            Command line: loglevel=7 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=http://10.8.128.2/fog/ consoleblank=0 rootfstype=ext4 console=tty0 console=ttyS1,115200 mac=e4:43:4b:7d:a9:ba ftp=10.8.128.2 storage=10.8.128.2:/opt/fog/images/dev/ storageip=10.8.128.2 osi0
            KERNEL supported cpus:
              Intel GenuineIntel
              AMD AuthenticAMD
              Centaur CentaurHauls
            x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
            x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
            x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
            x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
            x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
            x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
            x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
            x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
            x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
            x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
            x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
            x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
            x86/fpu: xstate_offset[5]:  960, xstate_sizes[5]:   64
            x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]:  512
            x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
            x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]:    8
            x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
            BIOS-provided physical RAM map:
            BIOS-e820: [mem 0x0000000000000000-0x000000000008bfff] usable
            BIOS-e820: [mem 0x000000000008c000-0x000000000009ffff] reserved
            BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
            BIOS-e820: [mem 0x0000000000100000-0x000000005ddfefff] usable
            BIOS-e820: [mem 0x000000005ddff000-0x000000006cffefff] reserved
            BIOS-e820: [mem 0x000000006cfff000-0x000000006effefff] ACPI NVS
            BIOS-e820: [mem 0x000000006efff000-0x000000006f7fefff] ACPI data
            BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
            BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
            BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
            BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
            BIOS-e820: [mem 0x00000000fec80000-0x00000000fed00fff] reserved
            BIOS-e820: [mem 0x00000000fed40000-0x00000000fed44fff] reserved
            BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
            BIOS-e820: [mem 0x0000000100000000-0x000000183fffffff] usable
            NX (Execute Disable) protection: active
            SMBIOS 3.2 present.
            DMI: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.2.11 06/13/2019
            tsc: Detected 2200.000 MHz processor
            last_pfn = 0x1840000 max_arch_pfn = 0x400000000
            x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT  
            x2apic: enabled by BIOS, switching to x2apic ops
            last_pfn = 0x6f800 max_arch_pfn = 0x400000000
            Using GB pages for direct mapping
            RAMDISK: [mem 0x5ca97000-0x5dd50fff]
            ACPI: Early table checksum verification disabled
            ACPI: RSDP 0x00000000000FE320 000024 (v02 DELL  )
            ACPI: XSDT 0x000000006F41B188 0000F4 (v01 DELL   PE_SC3   00000000      01000013)
            ACPI: FACP 0x000000006F7F9000 000114 (v06 DELL   PE_SC3   00000000 DELL 00000001)
            ACPI: DSDT 0x000000006F507000 2E2494 (v02 DELL   PE_SC3   00000003 DELL 00000001)
            ACPI: FACS 0x000000006EA6E000 000040
            ACPI: SSDT 0x000000006F7FC000 00046C (v02 INTEL  ADDRXLAT 00000001 INTL 20180508)
            ACPI: WDAT 0x000000006F7FB000 000134 (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: SLIC 0x000000006F7FA000 000024 (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: HPET 0x000000006F7F8000 000038 (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: APIC 0x000000006F7F6000 0016DE (v04 DELL   PE_SC3   00000000 DELL 00000001)
            ACPI: MCFG 0x000000006F7F5000 00003C (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: MIGT 0x000000006F7F4000 000040 (v01 DELL   PE_SC3   00000000 DELL 00000001)
            ACPI: MSCT 0x000000006F7F3000 000090 (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: PCAT 0x000000006F7F2000 000088 (v02 DELL   PE_SC3   00000002 DELL 00000001)
            ACPI: PCCT 0x000000006F7F1000 00006E (v01 DELL   PE_SC3   00000002 DELL 00000001)
            ACPI: RASF 0x000000006F7F0000 000030 (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: SLIT 0x000000006F7EF000 00042C (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: SRAT 0x000000006F7EC000 002D30 (v03 DELL   PE_SC3   00000002 DELL 00000001)
            ACPI: SVOS 0x000000006F7EB000 000032 (v01 DELL   PE_SC3   00000000 DELL 00000001)
            ACPI: WSMT 0x000000006F7EA000 000028 (v01 DELL   PE_SC3   00000000 DELL 00000001)
            ACPI: OEM4 0x000000006F459000 0AD1C1 (v02 INTEL  CPU  CST 00003000 INTL 20180508)
            ACPI: SSDT 0x000000006F421000 037465 (v02 INTEL  SSDT  PM 00004000 INTL 20180508)
            ACPI: SSDT 0x000000006F407000 000A1F (v02 DELL   PE_SC3   00000000 DELL 00000001)
            ACPI: SSDT 0x000000006F41D000 00357F (v02 INTEL  SpsNm    00000002 INTL 20180508)
            ACPI: SPCR 0x000000006F41C000 000050 (v02                 00000000      00000000)
            ACPI: DMAR 0x000000006F7FD000 000260 (v01 DELL   PE_SC3   00000001 DELL 00000001)
            ACPI: HEST 0x000000006F3F6000 00017C (v01 DELL   PE_SC3   00000002 DELL 00000001)
            ACPI: BERT 0x000000006F3F5000 000030 (v01 DELL   PE_SC3   00000002 DELL 00000001)
            ACPI: ERST 0x000000006F3F4000 000230 (v01 DELL   PE_SC3   00000002 DELL 00000001)
            ACPI: EINJ 0x000000006F3F3000 000150 (v01 DELL   PE_SC3   00000002 DELL 00000001)
            Setting APIC routing to cluster x2apic.
            Zone ranges:
              DMA      [mem 0x0000000000001000-0x0000000000ffffff]
              DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
              Normal   [mem 0x0000000100000000-0x000000183fffffff]
            Movable zone start for each node
            Early memory node ranges
              node   0: [mem 0x0000000000001000-0x000000000008bfff]
              node   0: [mem 0x0000000000100000-0x000000005ddfefff]
              node   0: [mem 0x000000006f7ff000-0x000000006f7fffff]
              node   0: [mem 0x0000000100000000-0x000000183fffffff]
            Reserved but unavailable: 117 pages
            Initmem setup node 0 [mem 0x0000000000001000-0x000000183fffffff]
            ACPI: PM-Timer IO Port: 0x508
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 8/0x4 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 9/0x24 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 10/0x18 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 11/0x38 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 12/0x10 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 13/0x30 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 14/0x16 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 15/0x36 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 16/0x12 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 17/0x32 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 18/0x14 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 19/0x34 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 20/0x1 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 21/0x21 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 22/0x9 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 23/0x29 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 24/0x3 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 25/0x23 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 26/0x7 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 27/0x27 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 28/0x5 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 29/0x25 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 30/0x19 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 31/0x39 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 32/0x11 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 33/0x31 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 34/0x17 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 35/0x37 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 36/0x13 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 37/0x33 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 38/0x15 ignored.
            APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 39/0x35 ignored.
            ACPI: X2APIC_NMI (uid[0xffffffff] high level lint[0x1])
            ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
            IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
            IOAPIC[1]: apic_id 9, version 32, address 0xfec01000, GSI 24-31
            IOAPIC[2]: apic_id 10, version 32, address 0xfec08000, GSI 32-39
            IOAPIC[3]: apic_id 11, version 32, address 0xfec10000, GSI 40-47
            IOAPIC[4]: apic_id 12, version 32, address 0xfec18000, GSI 48-55
            IOAPIC[5]: apic_id 15, version 32, address 0xfec20000, GSI 72-79
            IOAPIC[6]: apic_id 16, version 32, address 0xfec28000, GSI 80-87
            IOAPIC[7]: apic_id 17, version 32, address 0xfec30000, GSI 88-95
            IOAPIC[8]: apic_id 18, version 32, address 0xfec38000, GSI 96-103
            ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
            ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
            Using ACPI (MADT) for SMP configuration information
            ACPI: HPET id: 0x8086a701 base: 0xfed00000
            ACPI: SPCR: console: uart,io,0x2f8,115200
            smpboot: 40 Processors exceeds NR_CPUS limit of 8
            smpboot: Allowing 8 CPUs, 0 hotplug CPUs
            [mem 0x90000000-0xfcffffff] available for PCI devices
            Booting paravirtualized kernel on bare hardware
            clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
            random: get_random_bytes called from 0xffffffff82cafa32 with crng_init=0
            setup_percpu: NR_CPUS:8 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
            percpu: Embedded 41 pages/cpu s130840 r8192 d28904 u262144
            Built 1 zonelists, mobility grouping on.  Total pages: 24376830
            Kernel command line: loglevel=7 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=http://10.8.128.2/fog/ consoleblank=0 rootfstype=ext4 console=tty0 console=ttyS1,115200 mac=e4:43:4b:7d:a9:ba ftp=10.8.128.2 storage=10.8.128.2:/opt/fog/images/dev/ storageip=10.8.120
            Misrouted IRQ fixup and polling support enabled
            This may significantly impact system performance
            Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
            Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
            Memory: 97287048K/99055148K available (16392K kernel code, 992K rwdata, 4548K rodata, 1056K init, 2416K bss, 1768100K reserved, 0K cma-reserved)
            SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
            Kernel/User page tables isolation: enabled
            rcu: Hierarchical RCU implementation.
            NR_IRQS: 4352, nr_irqs: 1848, preallocated irqs: 16
            Console: colour VGA+ 80x25
            console [tty0] enabled
            console [ttyS1] enabled
            ACPI: Core revision 20180810
            clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
            APIC: Switch to symmetric I/O mode setup
            x2apic: IRQ remapping doesn't support X2APIC mode
            x2apic disabled
            Switched APIC routing to flat.
            ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
            clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fb633008a4, max_idle_ns: 440795292230 ns
            Calibrating delay loop (skipped), value calculated using timer frequency.. 4400.00 BogoMIPS (lpj=2200000)
            pid_max: default: 32768 minimum: 301
            Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes)
            Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes)
            ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
            ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
            process: using mwait in idle threads
            Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
            Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
            Spectre V2 : Mitigation: Full generic retpoline
            Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
            Spectre V2 : Enabling Restricted Speculation for firmware calls
            Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
            Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl
            Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
            MDS: Mitigation: Clear CPU buffers
            Freeing SMP alternatives memory: 52K
            smpboot: CPU0: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (family: 0x6, model: 0x55, stepping: 0x4)
            Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
            ... version:                4
            ... bit width:              48
            ... generic registers:      4
            ... value mask:             0000ffffffffffff
            ... max period:             00007fffffffffff
            ... fixed-purpose events:   3
            ... event mask:             000000070000000f
            rcu: Hierarchical SRCU implementation.
            smp: Bringing up secondary CPUs ...
            x86: Booting SMP configuration:
            .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
            smp: Brought up 1 node, 8 CPUs
            smpboot: Max logical packages: 10
            smpboot: Total of 8 processors activated (35221.20 BogoMIPS)
            devtmpfs: initialized
            clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
            futex hash table entries: 2048 (order: 5, 131072 bytes)
            xor: automatically using best checksumming function   avx       
            pinctrl core: initialized pinctrl subsystem
            rcu: INFO: rcu_sched self-detected stall on CPU
            rcu:    0-....: (20999 ticks this GP) idle=03e/1/0x4000000000000002 softirq=10/10 fqs=5247 
            rcu:     (t=21000 jiffies g=-1175 q=18)
            NMI backtrace for cpu 0
            CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.64 #1
            Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.2.11 06/13/2019
            Call Trace:
             <IRQ>
             0xffffffff81d4c3d5
             0xffffffff81d4f95f
             ? 0xffffffff8102aa32
             0xffffffff81d4f9b8
             0xffffffff8107aafa
             0xffffffff8107a08b
             0xffffffff8107e1e6
             0xffffffff81087ecc
             0xffffffff81e01794
             0xffffffff81e0139f
             </IRQ>
            RIP: 0010:0xffffffff8108d4db
            Code: ee 89 c7 e8 40 ec cb 00 3b 05 45 63 86 01 73 1e 48 63 f0 49 8b 55 00 48 03 14 f5 00 53 62 82 8b 72 18 40 80 e6 01 74 04 f3 90 <eb> f3 eb d0 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 31 c9 85
            RSP: 0000:ffffc9000007fae8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
            RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
            RDX: ffff8897e1063000 RSI: 0000000000000001 RDI: ffff8897e101fc48
            RBP: ffff8897e101fc48 R08: 00000000000000ff R09: ffff888000000000
            R10: ffffc9000007fb60 R11: 0000000000000001 R12: 000000000001fc00
            R13: ffff8897e101fc40 R14: 0000000000000000 R15: ffffffff82625300
             ? 0xffffffff8108d4b9
             ? 0xffffffff8103857d
             ? 0xffffffff8103857d
             0xffffffff8108d507
             0xffffffff8108d51b
             0xffffffff81035988
             0xffffffff81035a8e
             ? 0xffffffff810d853b
             ? 0xffffffff81cb3e73
             ? 0xffffffff81d4c10f
             ? 0xffffffff810cbe08
             0xffffffff81035cb8
             0xffffffff810366f2
             0xffffffff81095dbd
             0xffffffff81cb432c
             ? 0xffffffff82caf70b
             0xffffffff81cb43ea
             ? 0xffffffff82cee8ce
             0xffffffff82cef5c7
             0xffffffff82cee950
             0xffffffff8100040e
             ? 0xffffffff82caf70b
             0xffffffff82cafeed
             ? 0xffffffff81d5c631
             0xffffffff81d5c636
             0xffffffff81e00215
            

            WITH acpi=off and using ttyS1, it still hangs with no output to tty0 or ttyS1.
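The log above is long, so a short script can pull out the facts that matter for this thread: how many processors the firmware reported, the NR_CPUS limit the kernel was built with, how many processors were ignored, and whether an rcu_sched stall was reported. This is only a minimal sketch; the function name `summarize_boot_log` and the shortened sample text are made up for illustration, and the patterns are matched against the message wording shown in this particular 4.19 log.

```python
import re


def summarize_boot_log(text):
    """Extract a few facts from kernel boot log text (hypothetical helper)."""
    # One "Processor N/0x.. ignored" line is printed per CPU over the limit.
    ignored = re.findall(
        r"NR_CPUS/possible_cpus limit of \d+ reached\. "
        r"Processor \d+/0x[0-9a-fA-F]+ ignored",
        text,
    )
    # smpboot prints the firmware CPU count and the compiled-in limit.
    exceeds = re.search(
        r"smpboot: (\d+) Processors exceeds NR_CPUS limit of (\d+)", text
    )
    stall = re.search(r"rcu_sched self-detected stall on CPU", text)
    return {
        "ignored_processors": len(ignored),
        "firmware_cpus": int(exceeds.group(1)) if exceeds else None,
        "nr_cpus_limit": int(exceeds.group(2)) if exceeds else None,
        "rcu_stall": stall is not None,
    }


# Shortened excerpt of the log above, used only as sample input.
sample_log = """\
APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 8/0x4 ignored.
APIC: NR_CPUS/possible_cpus limit of 8 reached. Processor 9/0x24 ignored.
smpboot: 40 Processors exceeds NR_CPUS limit of 8
smpboot: Allowing 8 CPUs, 0 hotplug CPUs
rcu: INFO: rcu_sched self-detected stall on CPU
"""

print(summarize_boot_log(sample_log))
```

Run against the full log in this post, the same patterns should count 32 ignored processors (8 through 39) out of the 40 reported.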

            • S
              Sebastian Roth Moderator
              last edited by

              @george1421 said in rcu_sched stall OR kernel panic on PowerEdge R640:

              I still have my kernel dev environment setup. What do we need to enable in the kernel for debugging?

              First, enable CONFIG_EARLY_PRINTK and CONFIG_EARLY_PRINTK_EFI in the kernel config. Then edit arch/x86/boot/compressed/eboot.c, find the function efi_main, and add print statements such as efi_printk(sys_table, "Text output\n"); at various places in that function to find out exactly where it locks up.

              Then, when booting that kernel, the OP needs to add earlyprintk=efi (or earlyprintk=vga) to the kernel arguments.

              Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

              Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

              • george1421G
                george1421 Moderator @djgalloway
                last edited by

                @djgalloway This is going to be a bit of a hunt and peck game here.

                Remove the acpi=off parameter and let's have it use just one CPU by adding nosmp as a kernel parameter. It looks like it crashes just after it brings up SMP.

                Also just for clarity, from the printout this is the processor that is currently in use: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (family: 0x6, model: 0x55, stepping: 0x4)
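For anyone scripting these test runs, the change suggested here (drop acpi=off, add nosmp) is just an edit to the kernel argument string. The sketch below assumes the arguments are available as one space-separated string; `edit_kernel_args` is a hypothetical helper, not part of FOG, and the sample command line is shortened from the one in the log.

```python
def edit_kernel_args(cmdline, add=(), remove=()):
    """Return cmdline without the `remove` tokens and with `add` tokens appended.

    Tokens are compared exactly, so removing "acpi=off" leaves other
    parameters (for example the two console= entries) untouched.
    """
    to_remove = set(remove)
    kept = [arg for arg in cmdline.split() if arg not in to_remove]
    for arg in add:
        if arg not in kept:  # avoid appending a duplicate parameter
            kept.append(arg)
    return " ".join(kept)


# Shortened sample command line, for illustration only.
before = "loglevel=7 console=tty0 console=ttyS1,115200 acpi=off"
after = edit_kernel_args(before, add=["nosmp"], remove=["acpi=off"])
print(after)
```

Exact-token matching is a deliberate choice here: removing by key name alone would also strip repeated parameters such as console=, which this setup relies on.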

                • D
                  djgalloway @george1421
                  last edited by

                  Kernel command line: loglevel=7 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=http://10.8.128.2/fog/ consoleblank=0 rootfstype=ext4 console=tty0 console=ttyS1,115200 nosmp mac=e4:43:4b:7d:a9:ba ftp=10.8.128.2 storage=10.8.128.2:/opt/fog/images/dev/ storageip=1p
                  Misrouted IRQ fixup and polling support enabled
                  This may significantly impact system performance
                  Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
                  Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
                  Memory: 97288196K/99055148K available (16392K kernel code, 992K rwdata, 4548K rodata, 1056K init, 2416K bss, 1766952K reserved, 0K cma-reserved)
                  SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
                  Kernel/User page tables isolation: enabled
                  rcu: Hierarchical RCU implementation.
                  rcu:    RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
                  rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
                  NR_IRQS: 4352, nr_irqs: 32, preallocated irqs: 16
                  Console: colour VGA+ 80x25
                  console [tty0] enabled
                  console [ttyS1] enabled
                  ACPI: Core revision 20180810
                  ACPI: setting ELCR to 0200 (from 0820)
                  clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
                  APIC: SMP mode deactivated
                  APIC: Switch to symmetric I/O mode setup in no SMP routine
                  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
                  PGD 0 P4D 0 
                  Oops: 0002 [#1] SMP PTI
                  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.64 #1
                  Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.2.11 06/13/2019
                  RIP: 0010:0xffffffff8102d1e6
                  Code: c2 48 8b 14 d5 00 53 62 82 4a 8b 1c 22 48 85 db 74 d7 3b 2b 75 d3 eb 14 48 8b 1d 25 60 d8 01 48 c7 05 1a 60 d8 01 00 00 00 00 <89> 2b 65 48 89 1d 98 7c fe 7e 65 8b 05 39 1f fe 7e 89 c0 f0 48 0f
                  RSP: 0000:ffffffff82803e98 EFLAGS: 00010202
                  RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000040
                  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff828f36f0
                  RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff81408bd7
                  R10: 0000000000000000 R11: 000000000000005c R12: 0000000000014e88
                  R13: ffffffff82d460a0 R14: 0000000000000000 R15: 0000000000000000
                  FS:  0000000000000000(0000) GS:ffff8897e1000000(0000) knlGS:0000000000000000
                  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
                  CR2: 0000000000000000 CR3: 0000000002812001 CR4: 00000000000606b0
                  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
                  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
                  Call Trace:
                   0xffffffff81028ad4
                   0xffffffff82cc1652
                   0xffffffff82cb691a
                   0xffffffff82cafd33
                   0xffffffff810000d4
                  Modules linked in:
                  CR2: 0000000000000000
                  ---[ end trace f19259880c7c4bbb ]---
                  RIP: 0010:0xffffffff8102d1e6
                  Code: c2 48 8b 14 d5 00 53 62 82 4a 8b 1c 22 48 85 db 74 d7 3b 2b 75 d3 eb 14 48 8b 1d 25 60 d8 01 48 c7 05 1a 60 d8 01 00 00 00 00 <89> 2b 65 48 89 1d 98 7c fe 7e 65 8b 05 39 1f fe 7e 89 c0 f0 48 0f
                  RSP: 0000:ffffffff82803e98 EFLAGS: 00010202
                  RAX: 0000000000000008 RBX: 0000000000000000 RCX: 0000000000000040
                  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff828f36f0
                  RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff81408bd7
                  R10: 0000000000000000 R11: 000000000000005c R12: 0000000000014e88
                  R13: ffffffff82d460a0 R14: 0000000000000000 R15: 0000000000000000
                  FS:  0000000000000000(0000) GS:ffff8897e1000000(0000) knlGS:0000000000000000
                  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
                  CR2: 0000000000000000 CR3: 0000000002812001 CR4: 00000000000606b0
                  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
                  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
                  Kernel panic - not syncing: Attempted to kill the idle task!
                  ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
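A note on reading the oops above: the number after "Oops:" is the x86 page-fault error code in hexadecimal. The sketch below decodes its low five bits as they are defined for the x86 architecture; the function name `decode_page_fault_error` is made up for illustration. For 0002 it reports a kernel-mode write to a page that is not present, which fits the register dump (RBX and CR2 are both 0).

```python
def decode_page_fault_error(code):
    """Decode the low bits of an x86 page-fault error code (hypothetical helper)."""
    return {
        # Bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # Bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # Bit 2: 0 = kernel (supervisor) mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
        # Bit 3: a reserved bit was set in a paging-structure entry
        "reserved_bit": bool(code & 0x8),
        # Bit 4: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


# "Oops: 0002" from the log above.
print(decode_page_fault_error(0x0002))
```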
                  
                  • Q
                    Quazz Moderator
                    last edited by

                    We may have to enable kernel config option CONFIG_INTEL_IDLE to improve support for certain Intel CPUs.

                    We may also want to bump up CONFIG_NR_CPUS from the default of 8 to 512 (a common value on modern kernels), at least in the x64 config, though this one shouldn’t cause a crash.

                    That said, I am doubtful that would resolve this issue.
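To see what a given kernel build currently has for these two options before changing them, the .config file can be checked with a few lines of Python. This is a rough sketch: `read_kconfig` is a hypothetical helper, and the sample text is an illustrative fragment rather than the actual FOG kernel config (only the NR_CPUS value of 8 comes from the log in this thread).

```python
def read_kconfig(text):
    """Parse kernel .config text into {option: value}; unset options map to None."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Disabled options appear as "# CONFIG_FOO is not set".
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


# Illustrative fragment only; not the real FOG kernel config.
sample_config = """\
CONFIG_SMP=y
CONFIG_NR_CPUS=8
# CONFIG_INTEL_IDLE is not set
"""

cfg = read_kconfig(sample_config)
proposed = {"CONFIG_NR_CPUS": "512", "CONFIG_INTEL_IDLE": "y"}
for key, target in proposed.items():
    print(f"{key}: current={cfg.get(key)!r}, proposed={target!r}")
```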

                    • george1421G
                      george1421 Moderator @Quazz
                      last edited by

                      @Quazz said in rcu_sched stall OR kernel panic on PowerEdge R640:

                      We may also want to bump up CONFIG_NR_CPUS from the default of 8 to 512

                      I’ve seen this setting in the kernel config. I considered requesting that the value be set to 0 so it uses all available processors, but then I thought: this is for imaging and not general-purpose use, so having 28 cores available for imaging doesn’t really help, because at most 4 threads (a guess) would be used during imaging, since most of the process is single threaded.

                      • Q
                        Quazz Moderator @george1421
                        last edited by

                        @george1421 Yes, I think that’s why it was left at 8 in the config, though perhaps some CPUs don’t handle a majority of their cores being ignored very well?

                        • george1421G
                          george1421 Moderator @Quazz
                          last edited by

                          @Quazz That is surely something we can test.

                          • D
                            djgalloway
                            last edited by

                            @george1421 are you working on building a kernel with @Quazz’s suggestions or should I? I don’t have experience building a kernel from scratch but I can probably figure it out.

                            • george1421G
                              george1421 Moderator @djgalloway
                              last edited by

                              @djgalloway Sorry, I got sidetracked this morning. I almost had it built. Give me a few minutes and I’ll send you a link to the kernel via IM chat.

                              • D
                                djgalloway
                                last edited by

                                Here’s the latest output using the debug kernel:

                                console [ttyS1] enabled
                                bootconsole [earlyvga0] disabled
                                ACPI: Core revision 20180810
                                clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
                                APIC: Switch to symmetric I/O mode setup
                                x2apic: IRQ remapping doesn't support X2APIC mode
                                x2apic disabled
                                Switched APIC routing to flat.
                                ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
                                clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1fb633008a4, max_idle_ns: 440795292230 ns
                                Calibrating delay loop (skipped), value calculated using timer frequency.. 4400.00 BogoMIPS (lpj=2200000)
                                pid_max: default: 32768 minimum: 301
                                Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes)
                                Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes)
                                ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
                                ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
                                process: using mwait in idle threads
                                Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
                                Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
                                Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
                                Spectre V2 : Mitigation: Full generic retpoline
                                Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
                                Spectre V2 : Enabling Restricted Speculation for firmware calls
                                Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
                                Spectre V2 : User space: Mitigation: STIBP via seccomp and prctl
                                Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
                                MDS: Mitigation: Clear CPU buffers
                                Freeing SMP alternatives memory: 52K
                                smpboot: CPU0: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (family: 0x6, model: 0x55, stepping: 0x4)
                                Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
                                ... version:                4
                                ... bit width:              48
                                ... generic registers:      4
                                ... value mask:             0000ffffffffffff
                                ... max period:             00007fffffffffff
                                ... fixed-purpose events:   3
                                ... event mask:             000000070000000f
                                rcu: Hierarchical SRCU implementation.
                                smp: Bringing up secondary CPUs ...
                                x86: Booting SMP configuration:
                                .... node  #0, CPUs:      #1 #2 #3 #4 #5 #6 #7
                                smp: Brought up 1 node, 8 CPUs
                                smpboot: Max logical packages: 10
                                smpboot: Total of 8 processors activated (35220.85 BogoMIPS)
                                devtmpfs: initialized
                                clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
                                futex hash table entries: 2048 (order: 5, 131072 bytes)
                                xor: automatically using best checksumming function   avx       
                                pinctrl core: initialized pinctrl subsystem
                                rcu: INFO: rcu_sched self-detected stall on CPU
                                rcu:    0-....: (20999 ticks this GP) idle=04a/1/0x4000000000000002 softirq=10/10 fqs=5241 
                                rcu:     (t=21000 jiffies g=-1175 q=19)
                                NMI backtrace for cpu 0
                                CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.65 #12
                                Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.2.11 06/13/2019
                                Call Trace:
                                 <IRQ>
                                 0xffffffff81d6ecad
                                 0xffffffff81d7222f
                                 ? 0xffffffff8102b073
                                 0xffffffff81d7228a
                                 0xffffffff8107ce90
                                 0xffffffff8107c41d
                                 0xffffffff810806b4
                                 0xffffffff8108a34e
                                 0xffffffff81e017d5
                                 0xffffffff81e013af
                                 </IRQ>
                                RIP: 0010:0xffffffff8108fa1d
                                Code: 36 48 89 de 89 c7 e8 ca ef cd 00 3b 05 c0 13 86 01 73 24 48 63 f0 49 8b 16 48 03 14 f5 30 83 61 82 8b 72 18 40 80 e6 01 74 04 <f3> 90 eb f3 eb d1 0f 0b e9 72 fe ff ff 48 83 c4 10 5b 5d 41 5c 41
                                RSP: 0000:ffffc9000007fae0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
                                RAX: 0000000000000001 RBX: ffff8897e101fac8 RCX: 0000000000000001
                                RDX: ffff8897e10621c0 RSI: 0000000000000001 RDI: ffff8897e101fac8
                                RBP: 000000000001fa80 R08: 0000000000000000 R09: 00000000016daed4
                                R10: ffffc9000007fb58 R11: 000fffffffe00000 R12: 0000000000000001
                                R13: 0000000000000008 R14: ffff8897e101fac0 R15: 0000000000000000
                                 ? 0xffffffff81039a
                                
                                • george1421G
                                  george1421 Moderator @djgalloway
                                  last edited by george1421

                                  Just for grins I had the OP boot a 486 kernel that I built for another poster, who needed it to image one specific dedicated machine with FOG. That kernel gave a bit more detail than the full system kernel.

                                  Checking if this processor honours the WP bit even in supervisor mode...Ok.
                                  SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
                                  rcu: Hierarchical RCU implementation.
                                  NR_IRQS: 2304, nr_irqs: 1848, preallocated irqs: 16
                                  Console: colour VGA+ 80x25
                                  console [tty0] enabled
                                  console [ttyS1] enabled
                                  ACPI: Core revision 20180810
                                  clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
                                  APIC: Switch to symmetric I/O mode setup
                                  Enabling APIC mode:  Flat.  Using 9 I/O APICs
                                  ------------[ cut here ]------------
                                  Kernel BUG at 0xc1028128 [verbose debug info unavailable]
                                  invalid opcode: 0000 [#1] SMP
                                  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.65 #2
                                  Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS 2.2.11 06/13/2019
                                  EIP: 0xc1028128
                                  

                                  It looks like the kernel is crashing while enabling APIC mode or while setting up the I/O APICs. The hpet clock source also stands out to me for some reason.

                                  So the kernel is crashing at the same point. For reference, the 486-compatible kernel is also “Linux version 4.19.65”.
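
For anyone comparing serial console captures from two kernels, here is a minimal Python sketch that reports the first failure line and the last message printed before it. The marker strings come from the two logs in this thread plus the generic "Kernel panic" text; the function name `first_failure` is made up for illustration and is not part of FOG or the kernel.

```python
import re

# Failure markers seen in the logs in this thread, plus the generic panic text.
# Extend this tuple for other failure signatures (assumption: one line each).
FAILURE_MARKERS = (
    re.compile(r"rcu_sched self-detected stall"),
    re.compile(r"Kernel BUG at"),
    re.compile(r"Kernel panic"),
)


def first_failure(log_text):
    """Return (last_good_line, failing_line), or None if no marker is found."""
    last_good = None
    for raw in log_text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "cut here" separator printed before a BUG.
        if not line or "[ cut here ]" in line:
            continue
        if any(marker.search(line) for marker in FAILURE_MARKERS):
            return last_good, line
        last_good = line
    return None
```

Running it over each capture and comparing the `last_good_line` values gives a quick, if rough, view of how far each kernel got before it failed.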

                                  acpi=ht acpi=oldboot acpi_osi=Linux

                                  noapic

                                  • JunkhackerJ
                                    Junkhacker Developer @george1421
                                    last edited by

                                    I was googling the problem a bit and I was curious: will it boot if you remove the RAID card?
                                    Just trying to understand the source of the panic.

                                    signature:
                                    Junkhacker
                                    We are here to help you. If you are unresponsive to our questions, don't expect us to be responsive to yours.

                                    • george1421G
                                      george1421 Moderator @Junkhacker
                                      last edited by george1421

                                      @Junkhacker @Sebastian-Roth

                                      I was able to get the OP going by doing this and that.

                                      We are not sure which change got the kernel to boot. What I did was unlock the max CPUs (which was capped at 8) in the kernel, and I also enabled almost all of the ACPI modules in the kernel. We also tried the acpi_osi=Linux kernel parameter.

                                      We ruled out the acpi_osi=Linux kernel parameter as the fix, so it must be something I enabled in the kernel. Tomorrow AM I’m going to reset the kernel environment and only unlock the max CPUs. The OP is going to test that new kernel to see whether it was unlocking the max CPUs or the ACPI modules I enabled.

                                      Either way I’ll report where we ended up and which kernel change fixed the issue. I have also seen other recent CPU stalls like this that were fixed by setting acpi=off, so we may need to move whatever fixed the issue into the main kernel build because new hardware/CPUs may require it.

                                      • george1421G
                                        george1421 Moderator @george1421
                                        last edited by Sebastian Roth

                                        @developers Here’s the final update on this issue.

                                        I reset my kernel build environment and then created 2 new kernel builds. The first removed the imposed CPU limit on the Linux kernel; this kernel was called bzImageMaxCPU. For the second, I reset the kernel build environment again and then went through the ACPI settings, turning on what I had turned on in the debug kernel. This kernel was called bzImageACPI.

                                        The OP tested both and the bzImageMaxCPU was the only kernel that booted on those Dell servers. So in the end @Quazz was right about the CPU not liking some of its cores disabled.

                                        So I would recommend that we add the following settings to the official kernel build:

                                        CONFIG_INTEL_IDLE
                                        and
                                        Processor type and features --->
                                        Enable Maximum number of SMP Processors and NUMA Nodes
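
To check whether a given kernel build already has these options, here is a minimal Python sketch that parses a `.config` file. It assumes the menu entry above corresponds to the `CONFIG_MAXSMP` symbol and that options use the standard `CONFIG_X=y` / `# CONFIG_X is not set` format. The helper names `parse_kconfig` and `missing_options` are made up for illustration.

```python
import re

# Options recommended in this post. CONFIG_MAXSMP is assumed to be the symbol
# behind "Enable Maximum number of SMP Processors and NUMA Nodes".
WANTED = ("CONFIG_INTEL_IDLE", "CONFIG_MAXSMP")


def parse_kconfig(text):
    """Parse kernel .config text into a dict of option name -> value string."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        match = re.match(r"# (CONFIG_\w+) is not set$", line)
        if match:
            options[match.group(1)] = "n"
            continue
        # Enabled options and values: "CONFIG_FOO=y", "CONFIG_NR_CPUS=8"
        match = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_options(text, wanted=WANTED):
    """Return the wanted options that are not built in (value other than 'y')."""
    options = parse_kconfig(text)
    return [name for name in wanted if options.get(name) != "y"]
```

For example, `missing_options(open(".config").read())` run from the kernel source directory would list whichever of the two options still needs enabling (the `.config` path here is just the usual default location).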

                                        We have seen a recent uptick in reports of rcu_sched stalls with kernel panics. Maybe we are running into this issue more often as the core counts go up on these processors.

                                        • S
                                          Sebastian Roth Moderator
                                          last edited by

                                          @george1421 @Quazz @djgalloway Great work!!! Thanks to you all. I will add this in the next few days!

                                          Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                                          Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                                          • S
                                            Sebastian Roth Moderator
                                            last edited by

                                            @george1421 @Quazz I found a bit of time to look into this. Adding CONFIG_INTEL_IDLE should be just fine I think. But I am not exactly sure about adding CONFIG_MAXSMP (Enable Maximum number of SMP Processors and NUMA Nodes). Found this topic: https://www.xenomai.org/pipermail/xenomai/2018-July/039297.html

                                            Though I am not convinced this will actually cause trouble, it’s still a bit risky. @Testers @Moderators, would you be able to run a test kernel on several different client machines so we get a feeling for whether this is troublesome or not?

                                            Copyright © 2012-2024 FOG Project