Client hangs at EFI stub:
-
@george1421 I am in the process of getting set up to do the preboot stuff. I will try getting the vmlinuz from Ubuntu and using it as the bzImage on the USB drive.
-
@george1421 When I used the ubuntu kernel as the bzImage it started to boot but I get a kernel panic. That is a good sign though.
-
@sgilbe I realize this debugging is a lot of try this and do that, but we’ve narrowed it down to exactly the FOS Linux kernel. The ubuntu kernel error bombed out exactly where I expected it to do, at mounting the virtual hard drive (init.xz).
I think you asked earlier about the config file FOG uses. These are all posted on the FOG Project GitHub site: https://github.com/FOGProject/fos/tree/master/configs The config file you are looking for is kernelx64.config; this is the config file used to create the current kernel.
-
@sgilbe It looks like my build environment was really out of date. The last time I needed to create a one-off kernel was for version 5.15.x. More to the point, I updated the build environment to 6.5.3 and built this kernel: https://drive.google.com/file/d/1P-OX1LXhm-N_oBLg0PVcIjj0P3Cxm_Rp/view?usp=drive_link Download this kernel and save it onto your flash drive as bzImage. I don’t expect it to work any better than the FOS standard kernel, but I want to see if the new kernel release works on your processor. If this kernel doesn’t work better than the stock FOS kernel, then I will compare what Ubuntu is creating with this config file to see what is missing. I do have kernel options to turn on the advanced features of the Intel scalable processor, but the base x64 kernel should also run on this processor (IMO).
Also please confirm that you updated all of the firmware on the server using the lifecycle controller.
-
@george1421 This kernel is hanging at the same place as the fos kernel.
-
@sgilbe said in Client hangs at EFI stub::
This kernel is hanging at the same place as the fos kernel
In a way that’s good, because it should be the exact same kernel except that it is version 6.5.3 instead of 6.2.x. So the next step is that I need to compare the Ubuntu kernel settings with the FOS Linux kernel settings.
-
@george1421 @sgilbe Great that you guys have narrowed it down to this point. I have followed this thread but have not had the time to help you out, and I won’t be of much help in the next two weeks either: I will have a little more time, but no computer at hand, only my mobile phone to post and do research. Anyhow, I will try my best.
Yes, it could be as “simple” as switching a kernel config setting on or off, but I rather think the Ubuntu kernel might boot on the machine due to some patch Ubuntu applies to its kernels. That would make it a bit harder but not impossible to figure out and fix in FOS.
@sgilbe On a system with Ubuntu installed you should find the kernel config in /boot/config-*.
-
@Sebastian-Roth said in Client hangs at EFI stub::
Yes, it could be as “simple” as switching a kernel config setting on or off, but I rather think the Ubuntu kernel might boot on the machine due to some patch Ubuntu applies to its kernels. That would make it a bit harder but not impossible to figure out and fix in FOS.
The more I read the more I think I was wrong with assuming Ubuntu patches making it work. The Wikipedia article on this CPU says:
Not all accelerators are available in all processor models. Some accelerators are available under the Intel On Demand program, also known as Software Defined Silicon (SDSi), where a license is required to activate a given accelerator that is physically present in the processor. The license can be obtained as a one-time purchase or as a paid subscription. Activating the license requires support in the operating system. A driver with the necessary support was added in Linux 6.2.
So it’s not likely you still need special patches to even just boot up.
-
@Sebastian-Roth I did a side-by-side comparison between the Ubuntu configs and the FOS Linux configs and there are roughly 1800 differences. Many were in drivers and options. The only one that stood out in the EFI section was CONFIG_EFI_MIXED, which allows 32-bit EFI firmware to boot a 64-bit Linux kernel. Seems kind of strange, but we probably should turn that on.
As a second approach, I started with an x86_64 defconfig template and then added the FOS required settings, leaving almost all of the defconfig settings in place. I built this last night but haven’t had time to see if it boots. I did not add the old ISA card network drivers or network adapters that I’m pretty sure are no longer in circulation, like the DEC Tulip network drivers. That kernel came in at 15MB, compared to 10MB for the FOS kernel. I’m not really worried about an extra 5MB of kernel size in 2023. This kernel is based on Linux 6.5.3.
The other thing I need to point out is that the OP’s platform is a server with an Intel scalable processor. I don’t know what other hardware might be getting in the way. The FOS kernel should at least try to boot; it might not boot completely, but it should at least try. We are not seeing that. By building the FOS USB boot drive we have eliminated all of the PXE and iPXE issues, so we’ve narrowed it down to the FOS kernel, and swapping in the Ubuntu kernel points directly to the FOS kernel being at fault.
I hadn’t considered a ubuntu kernel patch to be the solution here either. I used linux 6.5.3 thinking that it should have all of the mainstream patches already in it.
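For anyone who wants to repeat the side-by-side comparison, here is a minimal sketch in Python. It assumes both files use the usual kernel config layout (`CONFIG_FOO=value` lines and `# CONFIG_FOO is not set` comments) and treats an option that is missing from one file as `n`. The function names and the command-line usage are made up for illustration; this is not the tool that was actually used in this thread.

```python
import re
import sys
from pathlib import Path

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_kconfig(a, b, prefix="CONFIG_"):
    """Return sorted (option, value_in_a, value_in_b) tuples that differ.

    An option missing from one side is treated as 'n' (an assumption).
    """
    names = sorted(set(a) | set(b))
    return [
        (name, a.get(name, "n"), b.get(name, "n"))
        for name in names
        if name.startswith(prefix) and a.get(name, "n") != b.get(name, "n")
    ]


# Hypothetical usage:
#   python kdiff.py kernelx64.config /boot/config-<version> CONFIG_EFI
if __name__ == "__main__" and len(sys.argv) >= 3:
    fos = parse_kconfig(Path(sys.argv[1]).read_text())
    ubuntu = parse_kconfig(Path(sys.argv[2]).read_text())
    prefix = sys.argv[3] if len(sys.argv) > 3 else "CONFIG_"
    for name, fos_val, ubuntu_val in diff_kconfig(fos, ubuntu, prefix):
        print(f"{name}: FOS={fos_val} Ubuntu={ubuntu_val}")
```

Passing a prefix such as `CONFIG_EFI` limits the output to one section, which is easier to read than all ~1800 differences at once.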
-
@george1421 Thank you all for your continued support in trying to figure out why this is not working. I appreciate all the work and time you have put into this for me and I have learned a lot so far about how this all works.
-
@sgilbe Do you still have access to this server?
-
@george1421 Yes I do
-
@sgilbe Well my first attempt to rebuild the kernel gave me the same results as you. Not what I expected so I need to work a bit more. If I can get something that boots in the next day or so, are you willing to test to see if it resolves your booting issue?
-
@george1421 I am more than willing to test any kernel that you have for me to try. If you need to build several to test different configurations I can work with that as well. It does not take much time to be able to switch between kernels with the USB drive. I have a CentOS on the system and can map my share to be able to just grab and add the kernels as needed.
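Since every test kernel gets saved under the same name (bzImage), it is easy to lose track of which build is on the USB drive. Below is a small Python sketch that reads the version string embedded in an x86 bzImage. It relies on my understanding of the Linux x86 boot protocol: the `HdrS` signature sits at offset 0x202, and the 16-bit `kernel_version` field at offset 0x20E points to the version string, less 0x200. The function name is made up for this example.

```python
import struct
import sys

HDR_MAGIC_OFFSET = 0x202       # "HdrS" signature of the x86 setup header
KERNEL_VERSION_OFFSET = 0x20E  # 16-bit pointer to the version string, less 0x200


def bzimage_version(data):
    """Return the version string embedded in an x86 bzImage, or None."""
    if len(data) < 0x210:
        return None
    if data[HDR_MAGIC_OFFSET:HDR_MAGIC_OFFSET + 4] != b"HdrS":
        return None
    (pointer,) = struct.unpack_from("<H", data, KERNEL_VERSION_OFFSET)
    if pointer == 0:
        return None  # header carries no version string
    start = pointer + 0x200
    end = data.find(b"\x00", start)
    if end == -1:
        return None
    return data[start:end].decode("ascii", errors="replace")


# Hypothetical usage: python bzver.py /path/to/usb/bzImage
if __name__ == "__main__" and len(sys.argv) >= 2:
    with open(sys.argv[1], "rb") as fh:
        # The setup header and version string live near the start of the file.
        print(bzimage_version(fh.read(0x10000)))
```

Running it against each downloaded file before copying it to the flash drive should confirm whether it is the 6.2.x, 6.5.3, or Ubuntu build.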
-
@george1421 any update on a new kernel to be able to try out? Just checking in.
-
@george1421 What are the required FOS settings? I have been trying to build a FOS bzImage, but with no success getting it to boot yet. Would I need a new init.xz if I move to a newer kernel?
-
@sgilbe I haven’t found the right combination to start with a clean kernel and just to get it to run on a standard system. But I do have to admit I haven’t had a lot of extra time lately to work on this.
As for needing a new init.xz: it’s not at that point yet. The kernel boots and inits the hardware, then connects to the init.xz to start up Linux. The issue is within the kernel at this point. It may be, as Sebastian mentioned, that there was a patch that Ubuntu added to make the kernel boot. I’m not at a give-up point; there has to be a solution here.
-
@sgilbe @george1421 I think I have an idea of what kernel modules need to be enabled for this type of CPU. I don’t have my dev laptop with me now, but I’ll work on it tonight/tomorrow so you can try it out.
-
@rodluz If you have an idea, I’m interested, since I can’t seem to get the FOS kernel to boot on this hardware, and without having the hardware in hand it’s difficult to debug the issue too.
The FOS Linux original kernel configuration to start with is here: https://github.com/FOGProject/fos/blob/master/configs/kernelx64.config
-
@sgilbe Try this kernel out. https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n?usp=sharing
This is kernel 6.5.6 with some config changes specific for gen 3/4 scalable Xeon CPUs. Please let us know if this works so I can document the changes.
Something else to look at… I had an issue like this with another Linux system last week. The issue turned out to be a Mellanox 40G PCIe card not playing nicely with the kernel. Have you tried taking out non-essential PCIe cards from the host to test?
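To see which PCIe cards are in the host before pulling any, a rough Python sketch like the one below could be run from an installed Linux that does boot on the machine (for example the CentOS install mentioned earlier). It assumes the standard sysfs layout under /sys/bus/pci/devices, where each device directory has `vendor` and `device` files holding hex IDs. 0x15b3 is the PCI vendor ID for Mellanox; the function name is made up for this example.

```python
from pathlib import Path

MELLANOX_VENDOR_ID = 0x15B3  # PCI vendor ID registered to Mellanox Technologies


def list_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return sorted (address, vendor_id, device_id) tuples read from sysfs."""
    devices = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return devices
    for entry in sorted(root.iterdir()):
        try:
            vendor = int((entry / "vendor").read_text().strip(), 16)
            device = int((entry / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries without readable ID files
        devices.append((entry.name, vendor, device))
    return devices


if __name__ == "__main__":
    for address, vendor, device in list_pci_devices():
        note = "  <-- Mellanox" if vendor == MELLANOX_VENDOR_ID else ""
        print(f"{address} vendor={vendor:04x} device={device:04x}{note}")
```

This only lists what is installed; whether a given card actually upsets the FOS kernel still has to be tested by removing it and booting again.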