Tablet PC hangs on bzImage



  • @Sebastian-Roth When I run the command I get the following

    # tcpdump -w hang.pcap host 153.86.19.24
    tcpdump: NFLOG link-layer type filtering not implemented
    

  • Developer

    @Zerpie OK, that’s interesting. So it might not even be loading the full kernel image but hanging on the network transfer. Let’s tackle this from a different angle. Install the tcpdump package and run:

    sudo tcpdump -w hang.pcap host x.x.x.x
    

    Put in the client’s IP address instead of x.x.x.x and let the command just sit there. Now boot up the client and wait till it hangs. Wait another 10 seconds and then stop tcpdump (ctrl+c). Upload the generated hang.pcap file (will be around 30 MB or more) and post a download link here in the forums (or send me a PM).
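
    If tcpdump is left to pick a capture device on its own, it may choose one that does not support this kind of filter. A minimal sketch, assuming that is what happens on your server: name the interface explicitly with -i. The helper name build_capture_cmd and the interface eth0 are placeholders for this example, not something taken from your setup; tcpdump -D lists the interfaces tcpdump can see.

    ```shell
    # build_capture_cmd IFACE CLIENT_IP [OUTFILE]
    # Prints (does not run) a tcpdump command line bound to one interface.
    # IFACE and CLIENT_IP are placeholders you have to fill in yourself.
    build_capture_cmd() {
        iface=$1
        client_ip=$2
        outfile=${3:-hang.pcap}
        printf 'tcpdump -i %s -w %s host %s\n' "$iface" "$outfile" "$client_ip"
    }

    # Example: print the command for interface eth0 and client x.x.x.x
    build_capture_cmd eth0 x.x.x.x
    ```

    Run the printed command with sudo, then boot the client and stop the capture as described above.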



  • @Sebastian-Roth It’s still hanging on BzImage_debug… and it will usually get to a percentage and stop. Just now it stopped at

    BzImage_debug... 15%
    

    No “Hello World” message.


  • Developer

    @Zerpie OK, now we get into the code and add our own debug statements. Edit linux-x.y/arch/x86/boot/compressed/eboot.c and jump to where the function efi_main is defined. It should look like this, or very similar depending on the kernel version:

    /*
     * On success we return a pointer to a boot_params structure, and NULL
     * on failure.
     */
    struct boot_params *efi_main(struct efi_config *c,
                                 struct boot_params *boot_params)
    {
            struct desc_ptr *gdt = NULL;
            efi_loaded_image_t *image;
    ...
            sys_table = _table;
            efi_printk(sys_table, "Hello World!\n"); 
    
            /* Check if we were booted by the EFI firmware */
            if (sys_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
    ...
    

    That efi_printk call is not part of the original code. It is the one line I’m asking you to add. Then compile once again, copy the new kernel over and boot the same client for which you set the debug kernel image and parameters in the host’s settings.



  • @Sebastian-Roth Now it just hangs on

    BzImage_debug...
    

  • Developer

    @Zerpie Good to hear. Now give it a first try by copying it to the right location and configure this new one as “Host Kernel” in one of your hosts’ settings. This way you can simply test with one or a few clients without causing an issue for other clients.

    cp build/linux-4.x.y/arch/x86/boot/bzImage /var/www/html/fog/service/ipxe/bzImage_debug
    

    Only clients having set bzImage_debug as “Host Kernel” will receive this kernel image on boot.
    If you got this to work you can go back to your kernel source and modify the kernel config (build/linux-4.x.y/.config). Find CONFIG_EARLY_PRINTK and change the line from # CONFIG_EARLY_PRINTK is not set to CONFIG_EARLY_PRINTK=y (make sure you remove the hash sign at the beginning of the line too). Save the file and run:

    make oldconfig
    make bzImage
    cp arch/x86/boot/bzImage /var/www/html/fog/service/ipxe/bzImage_debug
    

    Build won’t take very long this time!
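
    The .config edit described above can also be scripted. This is only a minimal sketch, assuming a sed that accepts -i with an attached backup suffix (GNU and BSD sed both do); the function name enable_kconfig_option is made up for this example.

    ```shell
    # enable_kconfig_option FILE OPTION
    # Rewrites the line "# OPTION is not set" to "OPTION=y" in a kernel
    # .config file. A backup of the original is kept as FILE.bak.
    enable_kconfig_option() {
        config_file=$1
        option=$2
        sed -i.bak "s/^# ${option} is not set\$/${option}=y/" "$config_file"
    }
    ```

    From the kernel source directory this would be called as enable_kconfig_option .config CONFIG_EARLY_PRINTK before running make oldconfig. If the option is already set, the file is left unchanged.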

    In the host settings where you set the “Host Kernel” for one of the clients, you now need to add “Host Kernel Arguments” as well: debug earlyprintk=efi loglevel=7

    Keeping my fingers crossed that we see some early kernel messages on screen then. If not there is still more we can do.



  • @Sebastian-Roth Thanks. To be fair, I should have been able to figure that out, myself, but I’m still getting comfortable with Linux.

    Alright, I built the kernel. What is the next step?

    Thanks again for your help.


  • Developer

    @Zerpie My fault, missing developer package…

    debian/ubuntu# sudo apt-get install bison
    fedora/centos# sudo yum install bison
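
    To catch a missing build tool like this before starting a long build, a small check can help. This is a hedged sketch, not part of the FOG tooling: the function name missing_tools is invented here, and the tool list in the example is just the handful mentioned in this thread.

    ```shell
    # missing_tools TOOL...
    # Prints each named tool that is not found in PATH, one per line.
    # Prints nothing when every tool is available.
    missing_tools() {
        for tool in "$@"; do
            command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
        done
    }

    # Example: check a few tools the kernel build in this thread relies on
    missing_tools git gcc make bison
    ```

    Anything it prints still needs to be installed with apt-get or yum as shown above.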
    


  • @Sebastian-Roth said in Tablet PC hangs on bzImage:

    make oldconfig

    I got as far as this command but when I run the command I get the following.

    # make oldconfig
      HOSTCC  scripts/basic/fixdep
      HOSTCC  scripts/kconfig/conf.o
      YACC    scripts/kconfig/zconf.tab.c
    /bin/sh: bison: command not found
    make[1]: *** [scripts/kconfig/zconf.tab.c] Error 127
    make: *** [oldconfig] Error 2
    

  • Developer

    @Zerpie said in Tablet PC hangs on bzImage:

    I’m hoping Fog is going to be the solution that works for us.

    First I have to say that there is hope but I can’t promise you a full solution. I remember one time when we were debugging such an issue and it turned out the firmware was buggy. We boiled it down to the bits but the firmware company never officially released a fixed version. So we had to live with a patch in the kernel to work around that firmware bug.

    Anyhow, let’s get started. Make sure you have sufficient space on your FOG server to install developer tools and extract a couple of source code packages - 10 GB should be heaps.

    Install developer tools (commands from our wiki, have not tried those myself lately):

    debian/ubuntu# sudo apt-get install git build-essential zlib1g-dev binutils-dev
    fedora/centos# sudo yum install git gcc gcc-c++ make zlib-devel binutils-devel
    

    See which kernel you are running on your FOG server right now. Not the host OS kernel but the one delivered to the clients. Depending on your OS you can use the file command for that: file /var/www/html/fog/service/ipxe/bzImage*
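
    If you want to pull just the version number out of what file prints, something like the following may work. It is a sketch under the assumption that the output contains the word "version" followed by a dotted number; the function name kernel_version_from_file_output is made up for this example.

    ```shell
    # kernel_version_from_file_output TEXT
    # Prints the dotted number that follows "version " in TEXT.
    # Prints nothing if TEXT contains no such number.
    kernel_version_from_file_output() {
        printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
    }
    ```

    You would pass it the output of the file command for a single kernel image, for example kernel_version_from_file_output "$(file /var/www/html/fog/service/ipxe/bzImage)".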

    Download that same kernel version (put in the version number instead of linux-4.x.y below) and start building:

    mkdir build
    cd build
    wget https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/linux-4.x.y.tar.xz
    tar xJf linux-4.x.y.tar.xz
    cd linux-4.x.y
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
    wget -O .config https://github.com/FOGProject/fos/raw/master/configs/kernelx64.config
    make oldconfig
    make bzImage
    

    Now this may take half an hour, possibly longer depending on your processor and RAM.
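
    In case filling in the version by hand is error-prone, the download URL can be derived from the version number. A minimal sketch, assuming the v<major>.x directory layout from the wget line above also holds for the version you need; kernel_tarball_url is a name invented for this example.

    ```shell
    # kernel_tarball_url VERSION
    # Prints the mirror URL for linux-VERSION.tar.xz, using the part of
    # VERSION before the first dot to pick the v<major>.x directory.
    kernel_tarball_url() {
        version=$1
        major=${version%%.*}
        printf 'https://mirrors.edge.kernel.org/pub/linux/kernel/v%s.x/linux-%s.tar.xz\n' "$major" "$version"
    }
    ```

    The result can be handed straight to wget, for example wget "$(kernel_tarball_url 4.x.y)" with your real version number in place of 4.x.y.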

    See if you can build that 64 bit kernel properly. If not, post the exact error message here.



  • @Sebastian-Roth Sure! I’m not as familiar with Linux as I’d like to be so bear with me if I have a lot of questions or make some mistakes. But I’m up for doing whatever I can to figure this out. Our office has had a lot of trouble finding a solution to easily image these tablets so that we can quickly get them out to the customers. I’m hoping Fog is going to be the solution that works for us.


  • Developer

    @Zerpie said in Tablet PC hangs on bzImage:

    ... Exec format error ...

    From my experience I reckon this could mean that it cannot execute the 32-bit kernel binary. There can be several reasons for that. I’d go back to bzImage and see if we can figure out why it hangs. Are you keen to build your own debug-enabled binaries if I show you how to do this?



  • Alright, I got my equipment to PXE boot again. I had to undo all the changes I made following the BIOS and UEFI Co-Existence wiki page at https://wiki.fogproject.org/wiki/index.php/BIOS_and_UEFI_Co-Existence

    Apparently my Fog server didn’t like those changes.

    I tried imaging the tablet after entering BzImage32 in the Host Kernel on the web interface and it made it past the BzImage portion but then immediately failed with the following.

    BzImage32... ok
    Could not select: Exec format error (http://ipxe.org/2e008081)
    Could not boot: Exec format error (http://ipxe.org/2e008081)
    Could not boot: Exec format error (http://ipxe.org/2e008081)
    


  • @Sebastian-Roth No, not yet. I’m still having problems PXE booting at all now. I keep thinking I’ve found the issue, but it’s still not PXE booting for any of my equipment. I don’t know what changed. Everything was working fine on Friday. Nobody touched anything over the holiday weekend, and then Tuesday it was no longer PXE booting. I can’t figure out why. Once I’ve got that working again I can test the tablet.


  • Developer

    @Zerpie So were you able to image that tablet now?



  • One of the network adapters was disabled. I’ve re-enabled it.



  • Hmm. Even after shutting down that other server I’m not able to PXE boot any machine into Fog.

    I rebooted the Fog server a couple of times, but there’s still no change.

    What can I do to check that all the services needed for PXE boot are running on the server? I’m using FOG’s DHCP server, BTW.



  • @Zerpie Ignore that last post. The tablet was connecting to another DHCP server on the network that wasn’t supposed to be running at the time.



  • @george1421 I’ve been trying to do just that, but when I try to PXE boot the tablet now it fails to get an IP address from the Fog server’s DHCP. This doesn’t seem to have anything to do with adding BzImage32 to the host. It does the same regardless. Is there something I can do on the Fog server to help it assign an IP to this tablet? Maybe it’s not letting go of its current lease.


  • Moderator

    We just had another FOG admin with the same issue. In his case he had a specialty tablet with a 64-bit Atom processor, but the tablet manufacturer had locked the CPU in 32-bit mode. This contradiction caused iPXE to misidentify the processor architecture and tell FOG that the target computer was a 64-bit computer when it effectively was not.

    OK, here is how to test whether your hardware is in a similar state. For the target computer:

    1. Manually register the computer.
    2. In the Host definition for this computer set the Host Kernel to bzImage32
    3. Save the host definition and then attempt to PXE boot the computer. You should now be able to get past the iPXE boot menu.
