Client hangs at EFI stub:
-
@Sebastian-Roth said in Client hangs at EFI stub:
Yes, it could be as “simple” as switching a kernel config setting on or off, but I rather think the Ubuntu kernel boots on the machine because of some patch Ubuntu applies to its kernels. That would make it a bit harder, but not impossible, to figure out and fix in FOS.
The more I read, the more I think I was wrong in assuming that Ubuntu patches are what make it work. The Wikipedia article on this CPU says:
Not all accelerators are available in all processor models. Some accelerators are available under the Intel On Demand program, also known as Software Defined Silicon (SDSi), where a license is required to activate a given accelerator that is physically present in the processor. The license can be obtained as a one-time purchase or as a paid subscription. Activating the license requires support in the operating system. A driver with the necessary support was added in Linux 6.2.
So it’s unlikely that you need special patches just to boot up.
-
@Sebastian-Roth I did a side-by-side comparison between the Ubuntu and FOS Linux kernel configs, and there are roughly 1800 differences. Many were in drivers and options. The only one that stood out in the EFI section was
CONFIG_EFI_MIXED
which allows a 64-bit Linux kernel to boot from 32-bit EFI firmware. Seems kind of strange, but we should probably turn that on.
As a second approach, I started from an ia64 defconfig template and added the settings FOS requires, leaving almost all of the defconfig settings in place. I built this last night but haven’t had time to see if it boots. I did not add the old ISA network card drivers or drivers for adapters that I’m pretty sure are no longer in circulation, such as DEC Tulip. That kernel came in at 15MB compared with 10MB for the FOS kernel, but I’m not really worried about an extra 5MB of kernel in 2023. This kernel is based on Linux 6.5.3.
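A side-by-side comparison like this can be scripted. The following is a minimal Python sketch, assuming two plain-text kernel .config files (for example the Ubuntu config and the FOS kernelx64.config). It treats "# CONFIG_X is not set" as "n", reports an option missing from one file as None, and counts the CONFIG_EFI* differences separately. The function names are hypothetical; the kernel source tree also ships scripts/diffconfig for the same job.

```python
import re
import sys

# Matches "CONFIG_FOO=value" lines.
SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# Matches "# CONFIG_FOO is not set" lines, which Kconfig writes for disabled options.
UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Return {option: value}; options written as 'is not set' map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old, new):
    """Return {option: (old_value, new_value)} for every option that differs.

    An option absent from one file is reported as None on that side.
    """
    differences = {}
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a != b:
            differences[name] = (a, b)
    return differences


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} OLD_CONFIG NEW_CONFIG", file=sys.stderr)
        return 2
    with open(argv[1]) as f_old, open(argv[2]) as f_new:
        diffs = diff_configs(parse_config(f_old.read()), parse_config(f_new.read()))
    for name, (a, b) in diffs.items():
        print(f"{name}: {a} -> {b}")
    # EFI options are the most relevant ones for a hang at the EFI stub.
    efi = [name for name in diffs if name.startswith("CONFIG_EFI")]
    print(f"{len(diffs)} differences, {len(efi)} of them CONFIG_EFI*")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it with the Ubuntu config as the first argument and the FOS config as the second; the CONFIG_EFI* count gives a short list to look at before wading through the driver differences.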
The other thing I need to point out is that the OP’s platform is a server with an Intel Xeon Scalable processor. I don’t know what other hardware might be getting in the way. The FOS kernel should at least try to boot; it might not boot completely, but it should at least start. We are not seeing that. By booting FOS from a USB drive we have eliminated any PXE and iPXE issues, which narrows it down to the FOS kernel, and swapping in the Ubuntu kernel points directly at the FOS kernel as the fault.
I hadn’t considered an Ubuntu kernel patch to be the solution here either. I used Linux 6.5.3 thinking that it should already include all of the mainline patches.
-
@george1421 Thank you all for your continued support in trying to figure out why this is not working. I appreciate all the work and time you have put into this for me and I have learned a lot so far about how this all works.
-
@sgilbe Do you still have access to this server?
-
@george1421 Yes I do
-
@sgilbe Well, my first attempt at rebuilding the kernel gave me the same result you got. That’s not what I expected, so I need to work on it a bit more. If I can get something that boots in the next day or so, would you be willing to test whether it resolves your booting issue?
-
@george1421 I am more than willing to test any kernel you have for me to try. If you need to build several to test different configurations, I can work with that as well. It does not take much time to switch between kernels on the USB drive. I have CentOS on the system and can map my share, so I can just grab and add the kernels as needed.
-
@george1421 Any update on a new kernel I can try out? Just checking in.
-
@george1421 What are the required FOS settings? I have been trying to build a FOS bzImage but have had no success getting it to boot yet. Would I need a new init.xz if I move to a newer kernel?
-
@sgilbe I haven’t yet found the right combination to start from a clean kernel and just get it to run on a standard system. But I have to admit I haven’t had a lot of extra time lately to work on this.
As for needing a new init.xz: it’s not at that point yet. The kernel boots and initializes the hardware, then hands off to init.xz to start up Linux. At this point the issue is within the kernel itself. It may be, as Sebastian mentioned, that Ubuntu added a patch that makes the kernel boot. I’m not at the point of giving up; there has to be a solution here.
-
@sgilbe @george1421 I think I have an idea of what kernel modules need to be enabled for this type of CPU. I don’t have my dev laptop with me now, but I’ll work on it tonight/tomorrow so you can try it out.
-
@rodluz If you have an idea, I’m interested, since I can’t seem to get the FOS kernel to boot on this hardware, and without the hardware in hand it’s difficult to debug the issue.
The original FOS Linux kernel configuration to start from is here: https://github.com/FOGProject/fos/blob/master/configs/kernelx64.config
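When trying candidate configs, a quick sanity check can confirm that a few boot-related options are enabled before building. The following is a minimal Python sketch; the option list is an assumption for illustration (EFI stub support, the CONFIG_EFI_MIXED option mentioned earlier in the thread, and initrd/XZ support, assuming init.xz is handed to the kernel as an XZ-compressed initrd), not the official set of FOS requirements. The function names are hypothetical.

```python
import sys

# Assumed, illustrative list of boot-related options; NOT the official FOS requirements.
EXPECTED = {
    "CONFIG_EFI": "y",             # EFI runtime support
    "CONFIG_EFI_STUB": "y",        # lets the firmware load bzImage directly as an EFI application
    "CONFIG_EFI_MIXED": "y",       # 64-bit kernel on 32-bit EFI firmware
    "CONFIG_BLK_DEV_INITRD": "y",  # initrd/initramfs support
    "CONFIG_RD_XZ": "y",           # XZ-compressed initrd support (assumed needed for init.xz)
}


def read_options(text):
    """Map each CONFIG_ option in a .config to its value; disabled options map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def mismatched_options(options, expected=EXPECTED):
    """Return {option: actual_value} for expected options that differ (None = absent)."""
    return {
        name: options.get(name)
        for name, wanted in expected.items()
        if options.get(name) != wanted
    }


def main(argv):
    if len(argv) != 2:
        print(f"usage: {argv[0]} KERNEL_CONFIG", file=sys.stderr)
        return 2
    with open(argv[1]) as f:
        problems = mismatched_options(read_options(f.read()))
    for name, actual in problems.items():
        print(f"{name}: expected {EXPECTED[name]}, found {actual}")
    if not problems:
        print("all checked options are set as expected")
    return 1 if problems else 0


if __name__ == "__main__":
    main(sys.argv)
```

The same check can be pointed at a distribution kernel's config (these are usually installed as /boot/config-<version>) to compare what a kernel that does boot has enabled.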
-
@sgilbe Try this kernel out. https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n?usp=sharing
This is kernel 6.5.6 with some config changes specific to 3rd/4th Gen Xeon Scalable CPUs. Please let us know if it works so I can document the changes.
Something else to look at: I had an issue like this with another Linux system last week. It turned out to be a Mellanox 40G PCIe card not playing nicely with the kernel. Have you tried removing non-essential PCIe cards from the host to test?
-
@rodluz Thank you for your help. I tested the kernel by putting it on the USB drive I have set up for FOG, and it is still hanging.
As for PCIe cards, I will get a list from lshw and post it here when I get a chance, most likely later today.
-
@george1421 Would it be helpful if I could get you remote access to the system?
-
@rodluz Here are the devices in the system: lshw.txt, lshw-businfo.txt
Let me know if that helps.
-
@rodluz I can also try and get you remote access to this system as well if that would help in debugging this issue.
-
@sgilbe Have you tried removing the QSFP card to see if it is still giving you that issue? I doubt that it’s the problem, but it wouldn’t hurt to try.
I’ll keep looking at the kernel config options to see if I find something else that could be missing. It may be a lot of back-and-forth trying different kernel options, since I don’t have any system with those CPUs.
-
@sgilbe I made a few changes to the kernel. Can you try the new one here? https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n
-
@rodluz I will try removing the QSFP card, and I’m trying the new kernel now. I’ll let you know the outcome.