
    Hand-off to FOS kernel fails on certain Gen4 Xeon (Sapphire Rapids) based systems - Dell R760, Supermicro X13, etc

    Hardware Compatibility
      asawtell

      I work in a testing lab where we use FOG to deploy operating systems. We have a very wide range of server platforms, from a few generations ago up to current and even a few pre-release platforms.

      Generally FOG works well on all of our platforms and PXE boots with no issue. We are currently running FOG 1.5.9 to be compatible with our in-house automation, but we have the latest FOS kernel release loaded:

      file /var/www/html/fog/service/ipxe/bzImage*
      /var/www/html/fog/service/ipxe/bzImage:   Linux kernel x86 boot executable bzImage, version 6.1.22 (runner@fv-az565-7) #1 SMP PREEMPT_DYNAMIC Fri Mar 31 00:29:42 UTC 2023, RO-rootFS, swap_dev 0x9, Normal VGA
      /var/www/html/fog/service/ipxe/bzImage32: Linux kernel x86 boot executable bzImage, version 6.1.22 (runner@fv-az576-383) #1 SMP PREEMPT_DYNAMIC Fri Mar 31 00:26:56 UTC 2023, RO-rootFS, swap_dev 0x8, Normal VGA
      

      The issue comes with some of our newest platforms, listed here:

      Platform #1: Dell R760 server, showed issues out of the box
      Platform #2: Supermicro 4U server with X13DEG-OA motherboard, worked out of the box, stopped working with latest BIOS loaded.
      Platform #3: Pre-release Gen4 Xeon based system, worked out of the box, stopped working with latest BIOS loaded.

      They all happen to be using 4th generation Xeon Scalable processors, but I’m not sure that is the specific problem. When attempting to boot them via PXE, the system halts at the hand-off point to the FOS kernel until manually rebooted. The last visible message before hang is “EFI stub: Loaded initrd from command line option”.

      This seems to be an issue with the FOS kernel rather than the iPXE chain leading up to it, since creating a USB boot disk and chainloading into the FOS kernel from GRUB also locks up the system in the same way. I’m a newbie where this sort of thing is concerned but judging by the little bit of diagnostic output I was able to get from GRUB, it looks like the failure is instant as soon as FOS takes over, and doesn’t happen anywhere while GRUB (or iPXE) is still running.

      The issue also appears to be related to BIOS somehow - two of our three platforms initially worked fine with FOS and FOG deployment, but started exhibiting the issue after an update to the latest BIOS. Secure Boot is disabled on all platforms.

      At this point we’ve been trying to root cause and fix this issue for a couple weeks and haven’t made any progress. Any suggestions for what we could try to resolve this or ways we could generate additional diagnostics information for you folks would be much appreciated. We have some quite capable engineers over here but nobody has much experience with low-level firmware or kernel stuff that it seems like we might be dealing with.

        george1421 Moderator @asawtell

        @asawtell Sorry for getting to this late. Is this still an issue?

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

          asawtell @george1421

          @george1421 Sorry for the late reply - haven’t been checking this lately due to the lack of response.

          Yes, this is still an issue. We have been working around it with manual installs and disk swapping but haven’t made any headway on fixing the problem. We reached out to the system vendors as well for advice but most of their engineers are busy with launch cycles and aren’t supporting us at the moment.

            george1421 Moderator @asawtell

            @asawtell Well, one way to get a bit more information is to go into FOG Configuration->FOG Settings and hit the Expand All button, then search for “log” and set the log level to 7.

            Also upgrade to the latest version of kernel 6.x. There has to be a bit more info here on why it’s failing. I know that for servers we may need to create a custom kernel to have drivers for the RAID controllers. But if it’s not booting into FOS, then we haven’t gotten to the drivers yet.

              asawtell @george1421

              @george1421 I’m not quite sure what you mean by “upgrade to the latest version of kernel 6.x”. Is there a newer version of FOS kernel than the release from Mar 30 at https://github.com/FOGProject/fos/releases? Or do you mean rebuilding FOS with a newer kernel 6.x included?

              I will try changing log level to 7 and monitoring the logs while reproducing the issue, but I don’t think there’s any opportunity for the server to submit information back after it locks up. So I’m not sure if we will see anything valuable.

                george1421 Moderator @asawtell

                @asawtell You are on the latest kernel at the moment. But I was referring to FOG Configuration->Kernel update menu to get the latest version if you need it.

                  asawtell @george1421

                  @george1421 Understood. We haven’t been able to get the automated update to work behind our corporate proxy, so it’s been manual updates - glad to hear that we are on the latest kernel.

                  I just set the kernel loglevel to 7 and tried a PXE boot on one of our problematic systems. It’s still falling over instantaneously as soon as it hits the FOS kernel, with no additional messaging provided. Is there some sort of logging I can turn on in iPXE which might catch something?

                  failure w loglevel 7.png
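
                  Since loglevel only matters once the kernel's console is up, a hang this early may need an early console instead. The sketch below assembles a candidate argument list from documented mainline kernel parameters (ignore_loglevel, earlycon=efifb, keep_bootcon, earlyprintk=serial, efi=debug). Assumptions: the helper name early_debug_args is made up for illustration, the FOS kernel build may or may not include EFI earlycon support, and ttyS0 at 115200 is only a guess at how the BMC's serial-over-LAN is wired on these boards. If the hang is inside the EFI stub itself, none of this may print anything.

```shell
#!/bin/sh
# Print a space-separated list of early-boot debug arguments.
# $1: serial port name (assumed default: ttyS0)
# $2: baud rate       (assumed default: 115200)
early_debug_args() {
    port="${1:-ttyS0}"
    baud="${2:-115200}"
    # ignore_loglevel : print all kernel messages regardless of loglevel
    # earlycon=efifb  : early console on the EFI framebuffer, if built in
    # keep_bootcon    : do not unregister the early console later in boot
    # earlyprintk=... : early messages on the serial port as well
    # console=...     : regular consoles on both the screen and serial
    # efi=debug       : extra EFI memory map output during early boot
    echo "loglevel=7 ignore_loglevel earlycon=efifb keep_bootcon" \
         "earlyprintk=serial,${port},${baud}" \
         "console=tty0 console=${port},${baud} efi=debug"
}
```

                  The resulting string could be appended to the kernel line of the USB/GRUB boot disk, or to wherever your FOG setup lets you add kernel arguments for a host, while watching the serial console through the BMC.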

                  When I boot to one of our normally functioning systems (did this with a debug task) I can see that the loglevel is being correctly set and /var/log/messages contains the expected low level kernel log output.

                  OK boot w loglevel 7.png

                  Another step I might try is finding a distro which uses kernel 6.1.22 and seeing whether it also fails to boot or if the issue is exclusive to FOS. Will do some poking around and see what I can find.
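
                  To check which kernel version a candidate distro image actually carries, the same `file` output shown in the first post can be parsed. This is a minimal sketch: the function names are hypothetical, and it assumes `file` reports the version in the ", version X.Y.Z (" form seen above for bzImage.

```shell
#!/bin/sh
# Read `file` output on stdin and print only the kernel version field.
parse_kernel_version() {
    sed -n 's/.*, version \([^ ]*\) .*/\1/p'
}

# Print the kernel version embedded in a bzImage/vmlinuz file.
# $1: path to the kernel image
bzimage_version() {
    file -b "$1" | parse_kernel_version
}
```

                  For example, `bzimage_version /var/www/html/fog/service/ipxe/bzImage` could be compared against `bzimage_version` run on the kernel image extracted from a live ISO.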

                    george1421 Moderator @asawtell

                    @asawtell OK, let’s see if we can USB boot into FOS Linux. It looks like there might be a problem with iPXE not handing off to the OS cleanly.

                    But first, let’s see if it’s iPXE causing this problem. Let’s update to the latest version of iPXE using this process. This is all done on the FOG server. https://forums.fogproject.org/topic/15826/updating-compiling-the-latest-version-of-ipxe?_=1692123396261

                    If that process fails, we will use this tutorial to create a USB flash drive to boot directly into FOS Linux. Look over this tutorial completely to understand the caveats with this route. https://forums.fogproject.org/topic/7727/building-usb-booting-fos-image Look at the FOG Forum chat for some additional tips.

                    The idea of USB booting into FOS Linux is to get into debug mode so we can try to find out what hardware support is missing from the kernel build. The default kernel build is targeted at workstation-class computers, not servers, so we may need to make a custom (one-off) kernel that has RAID drivers or specialty network drivers.

                      asawtell @george1421

                      @george1421 Thanks George. I think I mentioned in my first post - I already replicated the issue with a USB/GRUB boot disk, where it hung at the same point of handing over to the FOS kernel from GRUB. I believe I used the same method for creating the boot disk, but I could build a brand new one and re-run the test just to confirm.

                      I’ll also see if we can safely update iPXE without breaking anything in our environment and report back if it changes anything.

                        george1421 Moderator @asawtell

                        @asawtell said in Hand-off to FOS kernel fails on certain Gen4 Xeon (Sapphire Rapids) based systems - Dell R760, Supermicro X13, etc:

                        I’ll also see if we can safely update iPXE without breaking anything in our environment and report back if it changes anything.

                        Upgrading iPXE only affects PXE booting, and only if something goes really wrong. If it does, you can fix it by re-cloning the installer files and reinstalling FOG.

                        So this means that if you used the USB FOS boot and it’s still failing to start up Linux, there is something fundamentally wrong with the FOS kernel. I would think that Xeon Scalable processors are x86-64 compatible, so they should boot the FOS Linux kernel.

                          Sebastian Roth Moderator

                          @asawtell @george1421 Great to see you have been working on this while I was absent. So far to me it looks like a specific issue on this hardware with the Linux kernel.

                          Would be great if you can boot some kind of live Linux OS, or maybe just an installer ISO will do as well.

                          The other option is adding additional print statements to the kernel source code and compiling a custom kernel to boot and see exactly where it hangs. I can guide you on this if you are keen to give it a go, let me know.

                          Also, you might see if there is a UEFI firmware update available (not sure if you talked about this already).
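
                          Because the failure appeared on two platforms only after a BIOS update, it may help to record the exact firmware version from a live Linux boot on each machine. A minimal sketch, assuming the live system exposes the usual DMI files under /sys/class/dmi/id; the function name firmware_summary is hypothetical, and the directory argument exists only so the function can be tried without real hardware.

```shell
#!/bin/sh
# Print firmware and board identity from the kernel's DMI sysfs export.
# $1: directory holding the DMI files (default: /sys/class/dmi/id)
firmware_summary() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    for field in sys_vendor product_name board_name \
                 bios_vendor bios_version bios_date; do
        if [ -r "${dmi_dir}/${field}" ]; then
            # Emit one key=value line per field that could be read
            printf '%s=%s\n' "$field" "$(cat "${dmi_dir}/${field}")"
        else
            # Keep the key present so outputs from two machines line up
            printf '%s=unknown\n' "$field"
        fi
    done
}
```

                          Running `firmware_summary` on a working and a failing BIOS revision of the same board gives two short listings that can be posted side by side.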

                          Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                          Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                          Copyright © 2012-2024 FOG Project