New Fog 1.5.10 deployment. Host registration and quick registration hang
-
This is a brand new, greenfield deployment. I have installed Ubuntu 22.04.3 and Fog 1.5.10. My PXE client (one of them) is a Dell PowerEdge 760 running in UEFI boot mode. I have a third-party DHCP server set up to point to the Fog server, and have set x86 boot to look for undionly.kpxe and EFI boot to use snp.efi. All of my testing is with EFI. The server is able to PXE boot and get to the iPXE boot menu. Regardless of what I select from this menu, it either hangs or gives a chainloading error. I will focus on host quick registration for this discussion.
On a side note, I am able to successfully register a VM PXE boot test host I created, so I know registration isn’t completely broken; it is only my Dell host that will not register.
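For readers reproducing this kind of DHCP setup, the BIOS/EFI split described above can be sketched as a lookup on the client architecture code the PXE client sends in DHCP option 93. This is only an illustration: the function name `fog_boot_file` is hypothetical, and the assumption is that code 0 means legacy BIOS and codes 7 and 9 mean 64-bit UEFI, which is how many DHCP servers are commonly configured. The actual syntax depends on the DHCP product in use.

```shell
#!/bin/sh
# Map a DHCP option 93 (client system architecture) code to a boot file name.
# Assumption: 0 = legacy BIOS, 7 and 9 = 64-bit UEFI. Other codes are left
# unmapped in this sketch and reported as an error.
fog_boot_file() {
    case "$1" in
        0)   echo "undionly.kpxe" ;;
        7|9) echo "snp.efi" ;;
        *)   echo "unmapped architecture code: $1" >&2; return 1 ;;
    esac
}
```

Calling `fog_boot_file 7` prints the file name a UEFI client would be handed under these assumptions.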
Also, I’m a complete beginner to Fog. I have scoured this forum and the user docs for the last three days and am completely stuck. If the answer is obvious, I haven’t been able to figure it out. Any guidance is greatly appreciated.
When I select Host Quick Registration and Inventory, I see…
bzImage… OK
init.xz… OK
and then a black screen with this message, where it hangs indefinitely.
Here is the contents of my default.ipxe file. Note, IP info intentionally changed.
Please let me know what else may help.
#!ipxe
cpuid --ext 29 && set arch x86_64 || set arch ${buildarch}
params
param mac0 ${net0/mac}
param arch ${arch}
param platform ${platform}
param product ${product}
param manufacturer ${product}
param ipxever ${version}
param filename ${filename}
param sysuuid ${uuid}
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain http://my_fog_server_ip/fog/service/ipxe/boot.php##params
Here is my boot.php##params…
#!ipxe
set fog-ip my_fog_ip
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
set storage-ip my_fog_ip
set keymap us
cpuid --ext 29 && set arch x86_64 || set arch i386
goto get_console
:console_set
colour --rgb 0x00567a 1 ||
colour --rgb 0x00567a 2 ||
colour --rgb 0x00567a 4 ||
cpair --foreground 7 --background 2 2 ||
goto MENU
:alt_console
cpair --background 0 1 ||
cpair --background 1 2 ||
goto MENU
:get_console
console --picture http://my_fog_ip/fog/service/ipxe/bg.png --left 100 --right 80 && goto console_set || goto alt_console
:MENU
menu
colour --rgb 0xff0000 0 ||
cpair --foreground 1 1 ||
cpair --foreground 0 3 ||
cpair --foreground 4 4 ||
item --gap Host is NOT registered!
item --gap -- -------------------------------------
item fog.local Boot from hard disk
item fog.memtest Run Memtest86+
item fog.reginput Perform Full Host Registration and Inventory
item fog.reg Quick Registration and Inventory
item fog.deployimage Deploy Image
item fog.multijoin Join Multicast Session
item fog.sysinfo Client System Information (Compatibility)
item fog.debug Debug Mode
choose --default fog.local --timeout 3000 target && goto ${target}
:fog.local
sanboot --no-describe --drive 0x80 || goto MENU
:fog.memtest
kernel memdisk initrd=memtest.bin iso raw
initrd memtest.bin
boot || goto MENU
:fog.reginput
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=275000 web=http://my_fog_ip/fog/ consoleblank=0 rootfstype=ext4 storage=20.20.63.254:/images/ storageip=my_fog_ip nvme_core.default_ps_max_latency_us=0 loglevel=4 mode=manreg
imgfetch init_32.xz
boot || goto MENU
:fog.reg
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=275000 web=http://my_fog_ip/fog/ consoleblank=0 rootfstype=ext4 storage=20.20.63.254:/images/ storageip=my_fog_ip nvme_core.default_ps_max_latency_us=0 loglevel=4 mode=autoreg
imgfetch init_32.xz
boot || goto MENU
:fog.deployimage
login
params
param mac0 ${net0/mac}
param arch ${arch}
param username ${username}
param password ${password}
param qihost 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
param sysuuid ${uuid}
:fog.multijoin
login
params
param mac0 ${net0/mac}
param arch ${arch}
param username ${username}
param password ${password}
param sessionJoin 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
param sysuuid ${uuid}
:fog.sysinfo
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=275000 web=http://my_fog_ip/fog/ consoleblank=0 rootfstype=ext4 storage=20.20.63.254:/images/ storageip=my_fog_ip nvme_core.default_ps_max_latency_us=0 loglevel=4 mode=sysinfo
imgfetch init_32.xz
boot || goto MENU
:fog.debug
login
params
param extraargs "mode=onlydebug"
param mac0 ${net0/mac}
param arch ${arch}
param username ${username}
param password ${password}
param debugAccess 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
param sysuuid ${uuid}
:bootme
chain -ar http://my_fog_ip/fog/service/ipxe/boot.php##params ||
goto MENU
autoboot
-
@srcauley To quickly explain where it’s failing.
If you can get to the FOG iPXE menu, then the undionly/snp drivers are doing what they should. After you pick a menu item from the FOG iPXE menu, FOS Linux is transferred to the target computer. This is in the form of the kernel (bzImage) and the virtual hard drive (init.xz). The issue is with the Linux kernel failing to start up.
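To see exactly which kernel and init files a given menu item hands to the target, the boot.php output can be filtered by label. The helper below is a minimal sketch, not part of FOG: the name `menu_entry` is hypothetical, and it assumes the script has the same `:label` layout as the boot.php listing posted earlier in this thread.

```shell
# Print the iPXE commands that belong to one menu label (for example fog.reg).
# Reads a boot.php-style script on stdin; the label name is the only argument.
menu_entry() {
    awk -v label=":$1" '
        $0 == label { on = 1; next }   # start after the requested label line
        /^:/        { on = 0 }         # stop at the next label
        on                             # print lines while inside the entry
    '
}
```

A possible use, with the placeholder address from this thread, is `curl -s "http://my_fog_ip/fog/service/ipxe/boot.php" | menu_entry fog.reg`, which should list the kernel, imgfetch and boot lines for quick registration as returned to that request.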
Issue #1 is that the FOS kernel is really targeted at workstation-class machines, not servers. Servers have “special” hardware that requires drivers that might not be in the FOS kernel. That said, I know others have imaged Dell servers with FOG.
Issue #2: The R760s are pretty new hardware. You have to remember that the Linux kernel lags behind bleeding-edge hardware. The current Linux kernel might not have the required brains to be able to init this new hardware.
So what should you do?
The first thing is to make sure your firmware is up to date (always the first step).
Second, make sure you are running the latest FOS Linux kernel from the UI: FOG Configuration -> Kernel Update. Run the latest version 6 release there is.
Third, go into FOG Configuration -> FOG Settings, hit the Expand All button, and then search for
log
Set the kernel logging level to 7; that will send out the most info during the kernel startup. The default is logging level 1, which masks all but the most critical of errors.
Now PXE boot the target computer and see if there is any other info we can glean from the kernel startup.
Also test a Linux live boot image, either Debian or Ubuntu. See if you can live boot the server into Linux.
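After changing the logging level, one way to confirm the new value actually reaches the kernel command line is to look for the loglevel= arguments in the boot.php output. This is a rough sketch under the assumption that the setting shows up as `loglevel=N` on the kernel lines, as it does in the listing posted above; the function name `show_loglevels` is made up for illustration.

```shell
# List the distinct loglevel= values found in an iPXE script read from stdin.
show_loglevels() {
    grep -o 'loglevel=[0-9][0-9]*' | sort -u
}
```

For example, `curl -s "http://my_fog_ip/fog/service/ipxe/boot.php" | show_loglevels` would be expected to print a single line with the level the server is currently handing out.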
I know there was some debugging in the fog forum with a server that had one of those new scale processors. I think they were able to get that running.
-
@srcauley Well, I found someone with the same issue as you from last summer, but it looks like the poster ghosted us: https://forums.fogproject.org/topic/16922/hand-off-to-fos-kernel-fails-on-certain-gen4-xeon-sapphire-rapids-based-systems-dell-r760-supermicro-x13-etc
This is the one I was thinking of: https://forums.fogproject.org/topic/16993/client-hangs-at-efi-stub. It did have a solution, with the same error message as yours.
-
@george1421 said in New Fog 1.5.10 deployment. Host registration and quick registration hang:
https://forums.fogproject.org/topic/16993/client-hangs-at-efi-stub
@george1421 Thank you so much for the reply! The 2nd link you referenced did help get past the initial sticking point.
I disabled “Virtualization Technology” under System BIOS Settings > Processor Settings and now I can actually run Quick reg on the Dell 760. It is failing, unfortunately, but at least I have something else to troubleshoot now that I am past the initial hang after selecting an option.
I will likely open a new topic if this error continues but this is what I am seeing now. Let the debugging begin…
-
@srcauley If you want to debug the next bit, I can help with that too. I suspect it’s a NIC driver, NIC firmware, or RAID/disk controller issue. If you set up a debug capture/deploy, we can then investigate the errors.
ip a s
lspci -nn | grep -i net
lsblk
grep -i firm /var/log/syslog
Will get you started
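If it is easier to paste everything in one go, the commands above can be wrapped in a small helper that labels each result. This is just a convenience sketch, not something FOS ships: `collect_debug_info` is a hypothetical name, and because the syslog location is an assumption here, the helper tries two candidate paths and skips whichever is missing.

```shell
# Run the checks above and label each result so it can be pasted in one piece.
# Assumption: the syslog location may differ, so two candidate paths are tried.
collect_debug_info() {
    for cmd in "ip a s" "lspci -nn | grep -i net" "lsblk"; do
        echo "== $cmd =="
        # Keep going even if a tool is missing or prints nothing.
        sh -c "$cmd" 2>&1 || echo "(command failed or printed nothing)"
    done
    echo "== firmware messages =="
    for log in /var/log/syslog /var/syslog; do
        if [ -f "$log" ]; then
            grep -i firm "$log" || echo "(no firmware lines in $log)"
        fi
    done
    return 0
}
```

Running `collect_debug_info` at the debug prompt prints all sections to the screen; redirecting it to a file makes it easier to copy off the machine.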