FOG - boot to FOS - rcu_sched self-detected stall on CPU



  • Server
    • FOG Version: 3 RC21
    • OS: Debian Jessie
    Client
    • Lenovo E560
    • BIOS: Legacy
    Description

    Hello,

    When I try to boot to the Quick Host Registration on FOG, I see a screen with the following info:

    INFO: rcu_sched self-detected stall on CPU
    o0-…: (337893 ticks this GP) idle=439/14000000000001/0 softirq=119/146 fqs=83192
    o (t=336015 jiffies g=-274 c=-275 q=168)

    This info displays every so often with different numbers every time.
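
    Since the numbers change with every dump, it can help to pull them out and compare successive reports. Below is a minimal Python sketch (not part of FOG) that parses the "(t=… jiffies g=… c=… q=…)" line shown above. The helper name parse_stall_line and the hz parameter are assumptions made for illustration; the timer frequency of the FOS kernel isn’t stated in this thread, so a duration in seconds is only computed if you supply it yourself.

```python
import re
from typing import Dict, Optional

# Matches the summary part of an rcu_sched stall report, e.g.
# "(t=336015 jiffies g=-274 c=-275 q=168)".
STALL_RE = re.compile(
    r"\(t=(?P<t>\d+) jiffies g=(?P<g>-?\d+) c=(?P<c>-?\d+) q=(?P<q>\d+)\)"
)


def parse_stall_line(line: str, hz: Optional[int] = None) -> Optional[Dict[str, float]]:
    """Return the numeric fields of a stall summary line, or None if absent.

    hz is the kernel timer frequency, supplied by the caller as an assumption;
    when given, the jiffies count is also converted to seconds.
    """
    match = STALL_RE.search(line)
    if match is None:
        return None
    fields: Dict[str, float] = {k: int(v) for k, v in match.groupdict().items()}
    if hz:
        fields["seconds"] = fields["t"] / hz
    return fields


if __name__ == "__main__":
    sample = "o (t=336015 jiffies g=-274 c=-275 q=168)"
    print(parse_stall_line(sample))
    # hz=1000 is only an example value, not a known FOS setting.
    print(parse_stall_line(sample, hz=1000))
```

    Feeding each captured line through this makes it easier to see whether the jiffies counter keeps growing between dumps.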

    bzImage Version: 4.8.1
    bzImage32 Version: 4.8.1

    It boots using bzImage32.

    edit: Just updated to RC21; the problem still exists.
    edit2: Changed the title. The issue isn’t limited to Quick Registration; the entire FOG OS (FOS) doesn’t boot.


  • Moderator

    @abos_systemax said in FOG - boot to FOS - rcu_sched self-detected stall on CPU:

    I assumed that it used bzImage for UEFI because UEFI doesn’t really support x32?

    It does. There aren’t a lot of UEFI systems that are 32 bit, but they are out there.

    Are you using the boot files that come with fog and are you serving them from the FOG Server? Could these files be obsolete or are you shuffling things around? I only ask because the filename you give in your dhcpd.conf file is not where that file actually is located on a 1.3.0 RC fog server by default.

    At any rate, if you’re serving these boot files from a FOG server on 1.3.0 RC, the PXEClient:Arch:00006 filename should instead be: filename "i386-efi/ipxe.efi";

    Also, if you wanted, you could simply replace all of your classes in your dhcpd.conf file with these (standard FOG isc-dhcp classes):

        class "UEFI-32-1" {
            match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00006";
            filename "i386-efi/ipxe.efi";
        }
        class "UEFI-32-2" {
            match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00002";
            filename "i386-efi/ipxe.efi";
        }
        class "UEFI-64-1" {
            match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00007";
            filename "ipxe.efi";
        }
        class "UEFI-64-2" {
            match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00008";
            filename "ipxe.efi";
        }
        class "UEFI-64-3" {
            match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00009";
            filename "ipxe.efi";
        }
        class "Legacy" {
            match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00000";
            filename "undionly.kkpxe";
        }
    
        class "SURFACE-Pro-4" {
            match if substring(option vendor-class-identifier, 0, 32) = "PXEClient:Arch:00007:UNDI:003016";
            filename "ipxe7156.efi";
        }
    
        class "Apple-Intel-Netboot" {
            match if substring (option vendor-class-identifier, 0, 14) = "AAPLBSDPC/i386";
            option dhcp-parameter-request-list 1,3,17,43,60;
            if (option dhcp-message-type = 8) {
                option vendor-class-identifier "AAPLBSDPC";
                if (substring(option vendor-encapsulated-options, 0, 3) = 01:01:01) {
                    # BSDP List
                    option vendor-encapsulated-options 01:01:01:04:02:80:00:07:04:81:00:05:2a:09:0D:81:00:05:2a:08:69:50:58:45:2d:46:4f:47;
                }
                elsif (substring(option vendor-encapsulated-options, 0, 3) = 01:01:02) {
                    # BSDP Select
                    option vendor-encapsulated-options 01:01:02:08:04:81:00:05:2a:82:0a:4e:65:74:42:6f:6f:74:30:30:31;
                    filename "ipxe.efi";
                    next-server x.x.x.x;
                }
            }
        }
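
    To sanity-check which file a given client would be handed, the class matches above can be mirrored in a few lines of Python. This is only a sketch of the matching logic, not how dhcpd evaluates classes internally: it assumes the most specific prefix should win (so the Surface Pro 4 entry is checked before the generic 00007 one), boot_filename is a made-up helper name, and the Apple BSDP class is left out.

```python
from typing import List, Optional, Tuple

# (vendor-class-identifier prefix, boot filename) pairs taken from the classes above.
# Order matters in this sketch: the 32-character Surface Pro 4 prefix is listed
# first so it is not shadowed by the generic 20-character "Arch:00007" prefix.
BOOT_FILES: List[Tuple[str, str]] = [
    ("PXEClient:Arch:00007:UNDI:003016", "ipxe7156.efi"),
    ("PXEClient:Arch:00006", "i386-efi/ipxe.efi"),
    ("PXEClient:Arch:00002", "i386-efi/ipxe.efi"),
    ("PXEClient:Arch:00007", "ipxe.efi"),
    ("PXEClient:Arch:00008", "ipxe.efi"),
    ("PXEClient:Arch:00009", "ipxe.efi"),
    ("PXEClient:Arch:00000", "undionly.kkpxe"),
]


def boot_filename(vendor_class_id: str) -> Optional[str]:
    """Return the boot file for the first matching prefix, or None if nothing matches."""
    for prefix, filename in BOOT_FILES:
        if vendor_class_id.startswith(prefix):
            return filename
    return None


if __name__ == "__main__":
    # Example identifier; the part after the arch code varies per client.
    print(boot_filename("PXEClient:Arch:00000:UNDI:002001"))
```

    Running it against the identifier your client actually sends (visible in a packet capture of the DHCP request) shows at a glance whether the Legacy or one of the UEFI entries applies.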
    


  • Apparently the issue only appears on known hosts. We have no issues on hosts that FOG doesn’t know about.
    I’m still trying to investigate why the arch isn’t parsed correctly on this device… I will post as soon as I find an answer.


  • Senior Developer

    @abos_systemax can you get a pic or video of the screen displaying as it’s booting through pxe? Basically start up through to trying to load into fog.



  • @Tom-Elliott I will get back to you on that. Now that I have a workaround, I first need to finish the current job before I can test any further.

    edit: Yes, it changes both accordingly.


  • Senior Developer

    @abos_systemax When it shows the loading of the init, is it also changing the number?

    bzImage32 with init_32.xz

    bzImage with init.xz
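
    The pairing above can be written down as a tiny lookup, which makes it easy to spot a mismatched kernel/init combination in a boot command. A rough Python sketch; the FOS_PAIRS table and check_pair helper are illustrative names, and the arch labels i386/x86_64 are assumptions chosen for this example rather than something defined in this thread.

```python
from typing import Dict, Tuple

# Kernel/init pairs as listed above: the 32-bit kernel goes with the 32-bit init,
# the 64-bit kernel with the 64-bit init. The arch keys are labels for this sketch.
FOS_PAIRS: Dict[str, Tuple[str, str]] = {
    "i386": ("bzImage32", "init_32.xz"),
    "x86_64": ("bzImage", "init.xz"),
}


def check_pair(kernel: str, init: str) -> bool:
    """Return True if the kernel and init file belong to the same pair."""
    return (kernel, init) in FOS_PAIRS.values()


if __name__ == "__main__":
    print(check_pair("bzImage32", "init_32.xz"))
    print(check_pair("bzImage", "init_32.xz"))
```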



  • @abos_systemax

    This is my DHCP config:

    option space PXE;
    option PXE.mtftp-ip    code 1 = ip-address;
    option PXE.mtftp-cport code 2 = unsigned integer 16;
    option PXE.mtftp-sport code 3 = unsigned integer 16;
    option PXE.mtftp-tmout code 4 = unsigned integer 8;
    option PXE.mtftp-delay code 5 = unsigned integer 8;
    option arch code 93 = unsigned integer 16; # RFC4578
    
    authoritative;
    allow unknown-clients;
    option broadcast-address 192.168.71.255;
    option subnet-mask 255.255.252.0;
    option routers 192.168.68.1;
    ddns-update-style none;
    option domain-name "local";
    option domain-name-servers 192.168.68.11, 8.8.8.8;
    default-lease-time 600;
    max-lease-time 7200;
    log-facility local7;
    
    # LAN
    subnet 192.168.68.0 netmask 255.255.252.0 {
    	max-lease-time 14400;
    	default-lease-time 14400;
    	allow unknown-clients;
    	next-server 192.168.68.13;
    	range 192.168.68.30 192.168.71.200;
    	}
    
    class "pxeclient" {
        match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
    
        if substring (option vendor-class-identifier, 15, 5) = "00000" {
            # BIOS client 
            filename "undionly.kpxe";
        }
        elsif substring (option vendor-class-identifier, 15, 5) = "00006" {
            # EFI client 32 bit
            filename   "ipxe32.efi";
        }
        else {
            # default to EFI 64 bit
            filename   "ipxe.efi";
        }
       }
    

    The config states the following:

    Chip: undionly
    filename: undionly.kpxe
    buildarch: i386
    platform: pcbios



  • @Tom-Elliott I don’t know why it boots to bz32… it does apparently…
    I assumed that it used bzImage for UEFI because UEFI doesn’t really support x32?

    However, I retrieved the boot command from boot.php and forced boot.php to serve it to this machine with bzImage instead of bzImage32, and now it seems to work…

    Somehow it doesn’t report its arch type correctly?

    ===

    If I drop to the shell and check buildarch, it displays i386.
    Could this be my DHCP server that is confused?


  • Senior Developer

    @abos_systemax Erm, why is it booting bzImage32 for legacy but bzImage for uefi?

    32 is only for 32 bit requests.



  • @Tom-Elliott The device displays the INFO message when in Legacy (booting bzImage32), but displays error 0x7f048283 when in UEFI.
    If I force bzImage on Legacy, then … it works.


  • Senior Developer

    @abos_systemax Loglevel is independent of the “Kernel Debug”

    Kernel Debug will turn on all debug messages. Loglevel will automatically be set to full for the kernel debug regardless of what you set the loglevel to.

    I doubt it’s firmware related. The messages, in and of themselves, are fine.

    Is this booting bzImage or bzImage32?

    Is the system in UEFI or Legacy?



  • @Tom-Elliott

    For debugging purposes, I tried booting in UEFI mode…
    I then receive iPXE error 0x7f048283, which is, funnily enough, the same error I had yesterday on a Lenovo M700, which was able to boot in Legacy mode.



  • I can also confirm that other Linux distributions are able to boot.



  • @Tom-Elliott There isn’t a firmware update available for these machines either. It’s on 1.22 and Lenovo’s latest release is 1.22



  • @Tom-Elliott Setting it to 0 gives me the blinking cursor, and so does 1.
    Bumping it up to Kernel Debug with log level 7 logs up to PCI device initialisation, then the RCU informational messages appear, followed by a task dump for CPU 0.

    Bumping it back down to level 4 doesn’t show fewer messages, so apparently log level 7 is the same as log level 4 with Kernel Debug?

    Btw, there is a small typo in the help text for loglevel (“the” instead of “they”).


  • Senior Developer

    @abos_systemax If you re-up to the current kernel and set the log level down (from FOG Configuration->FOG Settings->FOG Boot Settings->FOG_KERNEL_LOGLEVEL), I imagine you will see fewer of these messages?



  • @Tom-Elliott
    OK, so with Hyper-Threading disabled (which makes iPXE monstrously slow, btw), I still receive the INFO message, but immediately after it I receive “rcu_sched kthread starved for xxxxxx jiffies!” (where xxxxxx is a number)
    _RCU_GP_WAIT_FQS(3) -> state=0x1 (and the following message is state=0x0)



  • @Tom-Elliott both 4.6.4 and 4.5.0 give the same error as the 4.8.1 bzImage32s

    e1: I even went as far as downgrading to 3.0.1, but that one results in a blinking cursor in the top left
    e2: as does 4.1.0

    e3: 4.2.0 is the first kernel to display the rcu_sched error apparently


  • Senior Developer

    Can you try swapping the kernel out for one of the 4.6 kernels? In 4.7 I added a bunch more “stuff”, which may have included RCU_Scheduling. 4.6, I’m pretty certain, did not have the “extras” and may help you out here.



  • @Tom-Elliott it does appear to be model related because the other device also isn’t able to boot (with the same rcu_sched messages)

