Client hangs at EFI stub:
-
@rodluz If you have an idea, I’m interested, since I can’t seem to get the FOS kernel to boot on this hardware, and without having the hardware in hand it’s difficult to debug the issue.
The original FOS Linux kernel configuration to start from is here: https://github.com/FOGProject/fos/blob/master/configs/kernelx64.config
-
@sgilbe Try this kernel out. https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n?usp=sharing
This is kernel 6.5.6 with some config changes specific to 3rd/4th Gen Xeon Scalable CPUs. Please let us know if this works so I can document the changes.
Something else to look at… I had an issue like this with another Linux system last week. The issue turned out to be a Mellanox 40G PCIe card not playing nicely with the kernel. Have you tried taking non-essential PCIe cards out of the host to test?
-
@rodluz Thank you for your help. I tested the kernel by putting it on the USB drive that I have set up for FOG, and it is still hanging.
As far as PCIe cards go, I will get a list from lshw and post it here when I get a chance, most likely later today.
-
@george1421 Would it be helpful if I could get you remote access to the system?
-
@rodluz Here are the devices in the system: lshw.txt, lshw-businfo.txt
Let me know if that helps.
-
@rodluz I can also try to get you remote access to this system if that would help in debugging this issue.
-
@sgilbe Have you tried removing the QSFP card to see if it is still giving you that issue? I doubt that it’s the problem, but it wouldn’t hurt to try.
I’ll keep looking at the kernel config options to see if I find something else that could be missing. It may be a lot of back-and-forth trying different kernel options, since I don’t have any system with those CPUs.
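For anyone following the back-and-forth on kernel options, a small script can list exactly which options differ between two kernel config files, which makes it easier to document what changed once a build works. This is only a minimal sketch: the function names `parse_kconfig` and `diff_kconfig` are made up for illustration, and the sample options in the demo are placeholders, not the actual changes made to the FOS kernel.

```python
def parse_kconfig(text):
    """Return {option: value} for a kernel .config file.

    Lines of the form '# CONFIG_X is not set' are recorded as 'n'.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            name = line[2:-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(old_text, new_text):
    """Return a sorted list of (option, old_value, new_value) that differ.

    A value of None means the option is absent from that file.
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    changes = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


if __name__ == "__main__":
    # Illustrative sample data only, not the real FOS config changes.
    old_cfg = "CONFIG_EFI_STUB=y\n# CONFIG_X86_X2APIC is not set\nCONFIG_NR_CPUS=64\n"
    new_cfg = "CONFIG_EFI_STUB=y\nCONFIG_X86_X2APIC=y\nCONFIG_NR_CPUS=256\n"
    for name, before, after in diff_kconfig(old_cfg, new_cfg):
        print(f"{name}: {before} -> {after}")
```

In practice the two strings would be read from the stock kernelx64.config and from the config of the test build.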
-
@sgilbe I made a few changes to the kernel. Can you try with the new one here? https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n
-
@rodluz I will try removing the QSFP card, and I am trying the new kernel now. I will let you know the outcome.
-
@rodluz It is still hanging. Removing card now.
-
@sgilbe After removing the card it is still hanging.
-
@rodluz Is it possible to get a new kernel built from version 6.6.0? I have been told it could fix this issue.
-
@george1421 I was able to get a debug BIOS for my system and capture more detail from a serial port. I know it has been a while since this thread was updated, but I thought this could shed more light on the issue. This is not the full file; it only covers the span from selecting the boot device (USB drive) to the hang.
Debug out:
Disabling CR4.SMXE…
-> Register 0x11: Pause Resume Complete = 0x01 0x01 0x00
CheckpointSend 0x08? Yes
-> Register 0x11: Pause Resume Complete = 0x01 0x01 0x01
CheckpointSend 0x09? Yes
-> Ready To Boot: Pause Resume Complete = 0x01 0x01 0x01
[HECI Transport-1 DXE] Send pkt: 80040007
00: FF 0C 00 00
[HECI Transport-1 DXE] Got pkt: 80080007
00: FF 8C 00 00 00 00 00 00
IioSecureOnOnReadyToBoot…
IOAT_INIT_READY_TO_BOOT_START
IOAT_INIT_READY_TO_BOOT_END
IOAT_INIT_READY_TO_BOOT_START
IOAT_INIT_READY_TO_BOOT_END
[TDX_LATE] TdxDxeCallbackOnReadyToBoot BEGIN
[TDX_LATE-GET_FROM_ESP] GetTdxSeamldrFromEsp BEGIN
[LoadFileFromEsp] BEGIN
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Error (Not Found)
[TDX_LATE-GET_FROM_ESP] GetTdxSeamldrFromEsp END (Not Found)
Error: Unable find TdxSeamldr in ESP
[TDX_LATE-GET_FROM_FV] GetTdxSeamldrFromFv BEGIN
[TDX_LATE-GET_FROM_FV] GetTdxSeamldrFromFv END (Success)
TdxSeamldrAddress (FV): 0x63392018
TdxSeamldrSize (FV): 0x35000
[TDX_LATE-GET_FROM_ESP] GetTdxSeamFromEsp BEGIN
[LoadFileFromEsp] BEGIN
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Open (Not Found)
[TDX_LATE-GET_FROM_ESP] Error (Not Found)
[TDX_LATE-GET_FROM_ESP] GetTdxSeamFromEsp END (Not Found)
Error: Unable find TdxSeam in ESP, aborting!
Unable to find TDX binaries in ESP, falling back to FV!
[TDX_LATE-GET_FROM_FV] GetTdxSeamFromFv BEGIN
[TDX_LATE-GET_FROM_FV] GetTdxSeamFromFv END (Success)
TdxSeamAddress (FV): 0x63344018
TdxSeamSize (FV): 0x26000
[TDX_LATE-GET_FROM_FV] GetTdxSeamSigFromFv BEGIN
[TDX_LATE-GET_FROM_FV] GetTdxSeamSigFromFv END (Success)
TdxSeamSigAddress (FV): 0x64E12018
TdxSeamSigSize (FV): 0x800
[TDX_LATE-HANDLE_SEAMLDR] HandleTdxSeamldr BEGIN
[TDX_LATE-HANDLE_SEAMLDR] ValidateSeamldrBinary BEGIN
[TDX_LATE-HANDLE_SEAMLDR] ValidateSeamldrBinary END (Success)
[TDX_LATE-HANDLE_SEAMLDR] ProgramTdxSeamldrSeSvn BEGIN
Extracted SE_SVN = 0x04 from ACM_HEADER!
Extracted SE_SVN = 0x00 from NVRAM!
Seamldr SE_SVN = 0x04 selected from ACM_HEADER!
[TDX_LATE-HANDLE_SEAMLDR] ProgramTdxSeamldrSeSvn END (Success)
[TDX_LATE-HANDLE_SEAMLDR] LoadTdxSeamldr BEGIN
AsmLaunchTdxSeamldr BEGIN (0x6330F000)
AsmLaunchTdxSeamldr END (0x0)
[TDX_LATE-HANDLE_SEAMLDR] LoadTdxSeamldr END (Success)
[TDX_LATE-HANDLE_SEAMLDR] HandleTdxSeamldr END (Success)
[TDX_LATE-HANDLE_SEAM] HandleTdxSeam BEGIN
[TDX_LATE-VMX] BaseFruUcode4v0_SetCrsForVmx BEGIN
[TDX_LATE-VMX] BaseFruUcode4v0_SetCrsForVmx END (Success)
[TDX_LATE-VMX] VmxOnAndSeamcallThenVmxOffOnAllLps BEGIN
[TDX_LATE-VMX] VmxOnAndSeamcallThenVmxOffOnAllLps END (Success)
[TDX_LATE-HANDLE_SEAM] HandleTdxSeam END (Success)
[TDX_LATE] TdxDxeCallbackOnReadyToBoot END
SmmInstallProtocolInterface: 6E057ECF-FA99-4F39-95BC-59F9921D17E4 0
[SmbiosIFWI] Get Ifwi Version failed.
[HECI Transport-1 DXE] Send pkt: 80040007
00: FF 02 00 00
[HECI Transport-1 DXE] Got pkt: 80140007
00: FF 82 00 00 01 00 06 18 - 30 00 02 00 01 00 06 18
10: 30 00 02 00
TPM Location configured (expected values: dTPM = 0x5 = 0x5
Value at TPM Base Address (0xFED40000) = 0xA1
HierarchyChangeAuth: Response Code error! 0x000009A2
PROGRESS CODE: V03051001 I0
PROGRESS CODE: V03058000 I0
PROGRESS CODE: V03058001 I0
GNU GRUB version 2.06
 1. FOG Image Deploy/Capture
 2. Perform Full Host Registration and Inventory
 3. Quick Registration and Inventory
 4. Client System Information (Compatibility)
 5. Run Memtest86+
*6. FOG Debug Kernel
 7. FOG iPXE Jumpstart BIOS
 8. FOG iPXE Jumpstart EFI
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, `e' to edit the commands before booting or `c' for a command-line.
loading the kernel
loading the virtual hard drive
booting kernel…
IioSecureOnExitBootServices…
SmmInstallProtocolInterface: 296EB418-C4C8-4E05-AB59-39E8AF56F00A 0
EnablePatrolScrubatEndofPostCallback Exit
PROGRESS CODE: V03101019 I0
-
@sgilbe I’m not seeing anything that jumps out at me as being wrong. It surely seems to fail at starting the kernel.
Are you sure you don’t have secure boot enabled on this system?
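To see where a capture like the one above stops, it can help to pull out the PROGRESS CODE lines and split each value into its fields. The sketch below assumes the layout from the UEFI Platform Initialization status code definitions (class in the top byte, subclass in the next byte, operation in the low 16 bits) and assumes the instance after `I` is printed in hex; the function name `parse_progress_codes` and the short class-name table are illustrative only.

```python
import re

# Matches lines such as "PROGRESS CODE: V03101019 I0".
PROGRESS_RE = re.compile(r"PROGRESS CODE: V([0-9A-Fa-f]{8}) I([0-9A-Fa-f]+)")

# Top-level status code classes; subclass and operation names are left
# undecoded here because they vary by firmware vendor.
CLASS_NAMES = {
    0x00: "computing unit",
    0x01: "peripheral",
    0x02: "I/O bus",
    0x03: "software",
}


def parse_progress_codes(log_text):
    """Return one dict per PROGRESS CODE line, in the order they appear."""
    codes = []
    for match in PROGRESS_RE.finditer(log_text):
        value = int(match.group(1), 16)
        class_id = (value >> 24) & 0xFF
        codes.append({
            "value": value,
            "class": class_id,
            "class_name": CLASS_NAMES.get(class_id, "unknown"),
            "subclass": (value >> 16) & 0xFF,
            "operation": value & 0xFFFF,
            "instance": int(match.group(2), 16),
        })
    return codes


if __name__ == "__main__":
    sample = (
        "PROGRESS CODE: V03051001 I0\n"
        "booting kernel\n"
        "PROGRESS CODE: V03101019 I0\n"
    )
    last = parse_progress_codes(sample)[-1]
    print(f"last code: class={last['class_name']} "
          f"subclass=0x{last['subclass']:02X} operation=0x{last['operation']:04X}")
```

For the last code in the capture, V03101019, this gives class 0x03 (software), subclass 0x10 and operation 0x1019. If I am reading the PI definitions correctly, that is the boot services ExitBootServices progress code, which would fit the hang happening right as the kernel's EFI stub takes over from the firmware, but treat that reading as unconfirmed.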
-
@george1421 Yes secure boot is disabled. Here is a picture of the BIOS setting.
-
I have the same problem on an HP Z8 G5 with dual Xeon Silver 4410Y CPUs. Some kernels show just the first line, others both.
-
@SaturTP I am glad I am not the only one having this issue. Hopefully we can figure out what is causing it and get a fix implemented. Currently we are imaging the affected systems by hand, which is taking a lot more time. I see it on a Quanta and an SMC server as well.
-
@sgilbe Hi, sorry for the delay.
I have 3 new kernels to try. I hope one of these works for you: https://drive.google.com/drive/folders/1sP6dfRymYaFTCr8iRiK64hN2pp2X836n?usp=drive_link
If any one of these works, please let me know which one so I can document it.
-
@sgilbe Just a thought here, but is a firmware update possibly needed for these devices too?
I don’t think updating the kernels is a bad thing, but making sure the firmware is on the latest version for EFI stub support seems like the next logical step.
-