Client hangs at EFI stub:
-
@sgilbe Well, my first attempt to rebuild the kernel gave me the same results you got. That's not what I expected, so I need to work on it a bit more. If I can get something that boots in the next day or so, are you willing to test it to see if it resolves your booting issue?
-
@george1421 I am more than willing to test any kernel that you have for me to try. If you need to build several to test different configurations, I can work with that as well. It does not take much time to switch between kernels with the USB drive. I have CentOS installed on the system and can map my share to grab and add the kernels as needed.
-
@george1421 any update on a new kernel to be able to try out? Just checking in.
-
@george1421 What are the required FOS settings? I have been trying to build a FOS bzImage, but so far I have not been able to get it to boot. Would I need a new init.xz if I move to a newer kernel?
-
@sgilbe I haven’t found the right combination to start with a clean kernel and just to get it to run on a standard system. But I do have to admit I haven’t had a lot of extra time lately to work on this.
As for needing a new init.xz: it’s not at that point yet. The kernel boots and inits the hardware, then hands off to the init.xz to start up Linux. The issue is within the kernel at this point. It may be, as Sebastian mentioned, that Ubuntu added a patch to make the kernel boot. I’m not at the point of giving up; there has to be a solution here.
-
@sgilbe @george1421 I think I have an idea of what kernel modules need to be enabled for this type of CPU. I don’t have my dev laptop with me now, but I’ll work on it tonight/tomorrow so you can try it out.
-
@rodluz If you have an idea, I’m interested, since I can’t seem to get the FOS kernel to boot on this hardware, and without having the hardware in hand it’s difficult to debug the issue too.
The FOS Linux original kernel configuration to start with is here: https://github.com/FOGProject/fos/blob/master/configs/kernelx64.config
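For anyone comparing a rebuilt kernel against that baseline, here is a minimal Python sketch that lists the options whose values differ between two kernel `.config` files. The function names are made up for illustration, and treating an option that is absent from a file as `n` is an assumption of this sketch. The kernel source tree also ships `scripts/diffconfig`, which serves the same purpose.

```python
import re

# "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_kconfig(old_text, new_text):
    """Return sorted (option, old, new) tuples for options whose value differs.

    Assumption: an option missing from one file is treated as 'n'.
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    changes = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            changes.append((name, before, after))
    return changes
```

To use it, read the baseline `kernelx64.config` and the new kernel's `.config` into strings, pass them to `diff_kconfig`, and print each tuple, for example as `name: old -> new`.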
-
@sgilbe Try this kernel out. https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n?usp=sharing
This is kernel 6.5.6 with some config changes specific for gen 3/4 scalable Xeon CPUs. Please let us know if this works so I can document the changes.
Something else to look at… I had an issue like this with another Linux system last week. The issue turned out to be a Mellanox 40G PCIe card not playing nicely with the Kernel. Have you tried taking out non-essential PCIe cards from the host to test?
-
@rodluz Thank you for your help. I tested the kernel by putting it on the USB drive that I have set up for FOG, and it is still hanging.
As far as PCIe cards go, I will get a list from lshw and post it here when I get a chance, most likely later today.
-
@george1421 If possible would it be helpful if I could get you remote access to the system?
-
@rodluz Here are the devices in the system: lshw.txt, lshw-businfo.txt
Let me know if that helps.
-
@rodluz I can also try and get you remote access to this system as well if that would help in debugging this issue.
-
@sgilbe Have you tried removing the QSFP card to see if it is still giving you that issue? I doubt that it’s the problem, but it wouldn’t hurt to try.
I’ll keep looking at the kernel config options to see if I find something else that could be missing. It may be a lot of back-and-forth trying different kernel options, since I don’t have any system with those CPUs.
-
@sgilbe I made a few changes to the kernel. Can you try with the new one here? https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n
-
@rodluz I will try and remove the QSFP card and am trying the new kernel now. Will let you know of the outcome.
-
@rodluz It is still hanging. Removing card now.
-
@sgilbe After removing the card it is still hanging.
-
@rodluz Is it possible to get a new kernel built from version 6.6.0? I have been told it could fix this issue.
-
@george1421 I was able to get a debug BIOS for my system and capture more detail from a serial port. I know it has been a while since this thread was updated, but I thought this could shed more light on the issue. This is not the full file; it only covers the span from selecting the boot device (USB drive) to the hang.
Debug out:
Disabling CR4.SMXE…
-> Register 0x11: Pause Resume Complete = 0x01 0x01 0x00
CheckpointSend 0x08? Yes
-> Register 0x11: Pause Resume Complete = 0x01 0x01 0x01
CheckpointSend 0x09? Yes
-> Ready To Boot: Pause Resume Complete = 0x01 0x01 0x01
[HECI Transport-1 DXE] Send pkt: 80040007
00: FF 0C 00 00
[HECI Transport-1 DXE] Got pkt: 80080007
00: FF 8C 00 00 00 00 00 00
IioSecureOnOnReadyToBoot…
IOAT_INIT_READY_TO_BOOT_START
IOAT_INIT_READY_TO_BOOT_END
IOAT_INIT_READY_TO_BOOT_START
IOAT_INIT_READY_TO_BOOT_END
[TDX_LATE] TdxDxeCallbackOnReadyToBoot BEGIN
[TDX_LATE-GET_FROM_ESP] GetTdxSeamldrFromEsp BEGIN
[LoadFileFromEsp] BEGIN
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Error (Not Found)
[TDX_LATE-GET_FROM_ESP] GetTdxSeamldrFromEsp END (Not Found)
Error: Unable find TdxSeamldr in ESP
[TDX_LATE-GET_FROM_FV] GetTdxSeamldrFromFv BEGIN
[TDX_LATE-GET_FROM_FV] GetTdxSeamldrFromFv END (Success)
TdxSeamldrAddress (FV): 0x63392018
TdxSeamldrSize (FV): 0x35000
[TDX_LATE-GET_FROM_ESP] GetTdxSeamFromEsp BEGIN
[LoadFileFromEsp] BEGIN
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Error (Not Found)
[TDX_LATE-GET_FROM_ESP] GetTdxSeamFromEsp END (Not Found)
Error: Unable find TdxSeam in ESP, aborting!
Unable to find TDX binaries in ESP, falling back to FV!
[TDX_LATE-GET_FROM_FV] GetTdxSeamFromFv BEGIN
[TDX_LATE-GET_FROM_FV] GetTdxSeamFromFv END (Success)
TdxSeamAddress (FV): 0x63344018
TdxSeamSize (FV): 0x26000
[TDX_LATE-GET_FROM_FV] GetTdxSeamSigFromFv BEGIN
[TDX_LATE-GET_FROM_FV] GetTdxSeamSigFromFv END (Success)
TdxSeamSigAddress (FV): 0x64E12018
TdxSeamSigSize (FV): 0x800
[TDX_LATE-HANDLE_SEAMLDR] HandleTdxSeamldr BEGIN
[TDX_LATE-HANDLE_SEAMLDR] ValidateSeamldrBinary BEGIN
[TDX_LATE-HANDLE_SEAMLDR] ValidateSeamldrBinary END (Success)
[TDX_LATE-HANDLE_SEAMLDR] ProgramTdxSeamldrSeSvn BEGIN
Extracted SE_SVN = 0x04 from ACM_HEADER!
Extracted SE_SVN = 0x00 from NVRAM!
Seamldr SE_SVN = 0x04 selected from ACM_HEADER!
[TDX_LATE-HANDLE_SEAMLDR] ProgramTdxSeamldrSeSvn END (Success)
[TDX_LATE-HANDLE_SEAMLDR] LoadTdxSeamldr BEGIN
AsmLaunchTdxSeamldr BEGIN (0x6330F000)
AsmLaunchTdxSeamldr END (0x0)
[TDX_LATE-HANDLE_SEAMLDR] LoadTdxSeamldr END (Success)
[TDX_LATE-HANDLE_SEAMLDR] HandleTdxSeamldr END (Success)
[TDX_LATE-HANDLE_SEAM] HandleTdxSeam BEGIN
[TDX_LATE-VMX] BaseFruUcode4v0_SetCrsForVmx BEGIN
[TDX_LATE-VMX] BaseFruUcode4v0_SetCrsForVmx END (Success)
[TDX_LATE-VMX] VmxOnAndSeamcallThenVmxOffOnAllLps BEGIN
[TDX_LATE-VMX] VmxOnAndSeamcallThenVmxOffOnAllLps END (Success)
[TDX_LATE-HANDLE_SEAM] HandleTdxSeam END (Success)
[TDX_LATE] TdxDxeCallbackOnReadyToBoot END
SmmInstallProtocolInterface: 6E057ECF-FA99-4F39-95BC-59F9921D17E4 0
[SmbiosIFWI] Get Ifwi Version failed.
[HECI Transport-1 DXE] Send pkt: 80040007
00: FF 02 00 00
[HECI Transport-1 DXE] Got pkt: 80140007
00: FF 82 00 00 01 00 06 18 - 30 00 02 00 01 00 06 18
10: 30 00 02 00
TPM Location configured (expected values: dTPM = 0x5 = 0x5
Value at TPM Base Address (0xFED40000) = 0xA1
HierarchyChangeAuth: Response Code error! 0x000009A2
PROGRESS CODE: V03051001 I0
PROGRESS CODE: V03058000 I0
PROGRESS CODE: V03058001 I0
GNU GRUB version 2.06
 1. FOG Image Deploy/Capture
 2. Perform Full Host Registration and Inventory
 3. Quick Registration and Inventory
 4. Client System Information (Compatibility)
 5. Run Memtest86+
*6. FOG Debug Kernel
 7. FOG iPXE Jumpstart BIOS
 8. FOG iPXE Jumpstart EFI
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, `e' to edit the commands before booting or `c' for a command-line.
loading the kernel
loading the virtual hard drive
booting kernel…
IioSecureOnExitBootServices…
SmmInstallProtocolInterface: 296EB418-C4C8-4E05-AB59-39E8AF56F00A 0
EnablePatrolScrubatEndofPostCallback Exit
PROGRESS CODE: V03101019 I0
-
@sgilbe I’m not seeing anything that jumps out at me as being wrong. It surely seems to fail at starting the kernel.
Are you sure you don’t have secure boot enabled on this system?
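If it helps, one way to check from the CentOS install on that machine is to read the `SecureBoot` EFI variable through efivarfs. This is only a sketch: the function name is made up for illustration, and it assumes the system was booted in UEFI mode with efivarfs mounted at the usual `/sys/firmware/efi/efivars` path. Where `mokutil` is installed, `mokutil --sb-state` reports the same thing.

```python
from pathlib import Path

# GUID of the EFI global variable namespace, as defined by the UEFI specification.
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"


def secure_boot_state(efivars_dir="/sys/firmware/efi/efivars"):
    """Return True if Secure Boot is on, False if off, None if undetermined."""
    var = Path(efivars_dir) / f"SecureBoot-{EFI_GLOBAL_GUID}"
    try:
        raw = var.read_bytes()
    except OSError:
        # Legacy BIOS boot, efivarfs not mounted, or the variable is missing.
        return None
    if len(raw) < 5:
        return None
    # efivarfs prefixes the data with a 4-byte attributes field;
    # the SecureBoot payload itself is a single byte (1 = enabled).
    return raw[4] == 1
```

Calling `secure_boot_state()` with no arguments reads the live system; a result of `None` means the check could not be made, not that Secure Boot is off.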