Intermittent no such file or directory boot.php
-
This is really odd: boot works consistently right after the FOG server has been rebooted, but only for the first operation (image registration or deploy). After that it goes back to being intermittent.
If I leave the client boot-looping, after 10 or so cycles it will continue through to the FOG registration and imaging phase, most of the time.
I have verified that I can access http://10.0.0.250/fog/service/ipxe/boot.php consistently from Linux/Windows/OS X without fail. The issue occurs with both physical and virtualized clients.
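A single fetch from a desktop can easily miss an intermittent failure, so it may help to repeat the request many times and count the failures. Below is a minimal sketch of that idea; `count_failures` is a hypothetical helper name (not part of FOG), and the run count of 50 in the usage comment is arbitrary.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Runs CMD N times and prints how many runs exited non-zero.
# Hypothetical helper for illustration; not part of FOG.
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Example usage (assumes curl is installed; -f makes HTTP errors exit non-zero):
#   count_failures 50 curl -fsS -o /dev/null http://10.0.0.250/fog/service/ipxe/boot.php
```

A result of 0 only shows that this particular host reached a web server at that address every time. It does not show that every client on the VLAN is talking to the same machine.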
The FOG server is running DHCP and is on an isolated VLAN with PortFast enabled. Connecting directly to a dummy switch does not resolve this issue.
Verified the Ubuntu firewall is disabled:
foggy@fog:~$ sudo ufw status verbose
Status: inactive
Fog Version: 1.5.7
Ubuntu Version: Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-65-generic x86_64)
Server specs:
8 CPU @ 2.53GHz (was an older DL360 G6 that was just sitting around)
24 GB of DDR3 2133 MHz ECC RAM
1 NIC attached (no errors on the switch port it's attached to; replaced the cable anyway and tried switching NICs)
Here is the boot.php content:
#!ipxe
set fog-ip 10.0.0.250
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
cpuid --ext 29 && set arch x86_64 || set arch i386
goto get_console
:console_set
colour --rgb 0x00567a 1 ||
colour --rgb 0x00567a 2 ||
colour --rgb 0x00567a 4 ||
cpair --foreground 7 --background 2 2 ||
goto MENU
:alt_console
cpair --background 0 1 ||
cpair --background 1 2 ||
goto MENU
:get_console
console --picture http://10.0.0.250/fog/service/ipxe/bg.png --left 100 --right 80 && goto console_set || goto alt_console
:MENU
menu
colour --rgb 0xff0000 0 ||
cpair --foreground 1 1 ||
cpair --foreground 0 3 ||
cpair --foreground 4 4 ||
item --gap Host is NOT registered!
item --gap -- -------------------------------------
item fog.local Boot from hard disk
item fog.memtest Run Memtest86+
item fog.reginput Perform Full Host Registration and Inventory
item fog.reg Quick Registration and Inventory
item fog.deployimage Deploy Image
item fog.multijoin Join Multicast Session
item fog.sysinfo Client System Information (Compatibility)
choose --default fog.local --timeout 3000 target && goto ${target}
:fog.local
sanboot --no-describe --drive 0x80 || goto MENU
:fog.memtest
kernel memdisk initrd=memtest.bin iso raw
initrd memtest.bin
boot || goto MENU
:fog.reginput
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=275000 web=http://10.0.0.250/fog/ consoleblank=0 rootfstype=ext4 storage=10.0.0.250:/images/ storageip=10.0.0.250 loglevel=4 mode=manreg
imgfetch init_32.xz
boot || goto MENU
:fog.reg
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=275000 web=http://10.0.0.250/fog/ consoleblank=0 rootfstype=ext4 storage=10.0.0.250:/images/ storageip=10.0.0.250 loglevel=4 mode=autoreg
imgfetch init_32.xz
boot || goto MENU
:fog.deployimage
login
params
param mac0 ${net0/mac}
param arch ${arch}
param username ${username}
param password ${password}
param qihost 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
param sysuuid ${uuid}
:fog.multijoin
login
params
param mac0 ${net0/mac}
param arch ${arch}
param username ${username}
param password ${password}
param sessionJoin 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
param sysuuid ${uuid}
:fog.sysinfo
kernel bzImage32 loglevel=4 initrd=init_32.xz root=/dev/ram0 rw ramdisk_size=275000 web=http://10.0.0.250/fog/ consoleblank=0 rootfstype=ext4 storage=10.0.0.250:/images/ storageip=10.0.0.250 loglevel=4 mode=sysinfo
imgfetch init_32.xz
boot || goto MENU
:bootme
chain -ar http://10.0.0.250/fog/service/ipxe/boot.php##params ||
goto MENU
autoboot
[Screenshot from ESXi PXE boot: exact same error on physical clients]
Any help or direction would be appreciated
-
@jashley Anything in the Apache error log when this happens? See my signature on where to find the log.
-
From a couple of tries, then a restart (error.log.1):
[Thu Oct 17 12:56:22.309094 2019] [mpm_prefork:notice] [pid 2036] AH00169: caught SIGTERM, shutting down
[Thu Oct 17 13:15:53.987103 2019] [mpm_prefork:notice] [pid 2158] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 configured -- resuming normal operations
[Thu Oct 17 13:15:54.098310 2019] [core:notice] [pid 2158] AH00094: Command line: '/usr/sbin/apache2'
[Thu Oct 17 13:04:26.059866 2019] [mpm_prefork:notice] [pid 2158] AH00169: caught SIGTERM, shutting down
[Thu Oct 17 13:24:01.222333 2019] [mpm_prefork:notice] [pid 2023] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 configured -- resuming normal operations
[Thu Oct 17 13:24:01.414886 2019] [core:notice] [pid 2023] AH00094: Command line: '/usr/sbin/apache2'
[Thu Oct 17 13:14:33.786132 2019] [mpm_prefork:notice] [pid 2023] AH00169: caught SIGTERM, shutting down
[Thu Oct 17 13:34:03.037187 2019] [mpm_prefork:notice] [pid 2074] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 configured -- resuming normal operations
[Thu Oct 17 13:34:03.227593 2019] [core:notice] [pid 2074] AH00094: Command line: '/usr/sbin/apache2'
[Fri Oct 18 00:06:40.122273 2019] [mpm_prefork:notice] [pid 2074] AH00171: Graceful restart requested, doing restart
Then fresh, after a successful host registration followed by the "no boot.php" error again:
foggy@fog:/var/log/apache2$ cat error.log
[Fri Oct 18 00:06:40.167102 2019] [mpm_prefork:notice] [pid 2074] AH00163: Apache/2.4.29 (Ubuntu) OpenSSL/1.1.1 configured -- resuming normal operations
[Fri Oct 18 00:06:40.167132 2019] [core:notice] [pid 2074] AH00094: Command line: '/usr/sbin/apache2'
-
Since the Apache crashes did not correlate with the "no such file or directory" errors, I double-checked everything attached to our workbench and…
Found out one of my engineers had a test box attached to our workbench that was set not to reply to ICMP, so I did not catch that it had also been given the static address 10.0.0.250. Flipped it over to DHCP and now PXE boot works consistently.
Sorry and thank you for your time!
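
For anyone who lands here with the same symptom: a host that ignores ICMP will not show up with ping, but it still has to answer ARP to use the address. A rough way to spot a conflict is to collect ARP replies for the IP from a third machine on the same VLAN and count the distinct MAC addresses. The sketch below is only an illustration under assumptions: `count_macs` is a hypothetical helper name, the interface name `eth0` is a guess, and the usage comment assumes the iputils version of `arping` (other `arping` packages use different flags).

```shell
#!/bin/sh
# count_macs: read text on stdin and print the number of distinct
# MAC addresses (colon-separated hex format) found in it.
# Hypothetical helper for illustration; not part of FOG.
count_macs() {
    # "|| true" keeps the pipeline going when grep finds no MAC at all.
    { grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' || true; } \
        | tr 'A-F' 'a-f' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Example usage (assumptions: iputils arping, interface eth0,
# run from a third host on the same VLAN as the FOG server):
#   sudo arping -c 5 -I eth0 10.0.0.250 | count_macs
# More than one distinct MAC suggests two hosts are claiming the address.
```

A count of 1 does not prove there is no conflict (the second host might be off at that moment), so it is worth repeating the check while the problem is actually occurring.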