FOG initial image boots but cannot subsequently locate DHCP server
-
I’ve got a newly configured FOG server. I’m running the latest binary (1.5.4), which I downloaded as a tarball from fogproject.org and installed by running ./installfog.sh. The server is Ubuntu Server 16.04.5.
On a Dell Optiplex I boot from the onboard ethernet controller [ETH0]. It boots into iPXE but fails to locate a DHCP server…
Configuring (net0 00:1a:a0:xx:xx:xx)...... No configuration methods succeeded
From there I drop into the iPXE shell and type DHCP, which succeeds. If I then exit iPXE it boots into the FOG menu. I then run Full Registration and get the following error:
Either DHCP failed or we were unable to access http://192.168.50.15/fog/index.php
It repeats the discovery/fail process three times and then leaves me with an ‘enter to continue’ prompt.
On one of the attempts after it failed I got to a menu where I could look at the network configuration, and it clearly didn’t have a configured IP. The other times it attempted to register, then timed out and left me with a ‘computer will reboot in 1 minute’ message.
I have also had a couple of instances where the FOG menu appeared only briefly before the machine booted straight into Windows. I don’t think this is entirely relevant, but I thought it’d be worth mentioning.
So I’m thinking that I do have a problem with my DHCP server. I’m running a pfSense appliance and DHCP is otherwise working as expected. Although I’ve added the two configuration items into ‘network boot’ on the pfSense DHCP server config neither would show up in iPXE with a SHOW command, so I’ve manually added options 66 & 67 which seem to at least partially work.
In the iPXE shell I get valid responses to DHCP (OK), ROUTE (shows a valid route), SHOW 66 (shows the FOG IP) and SHOW 67 (shows undionly.kpxe).
service tftpd-hpa status shows an active service
Any thoughts on how to move this forward?
-
Do you have an unmanaged (dumb) switch to place between the PXE booting computer and your building network switch?
Your condition sounds like a spanning tree issue, where you are running standard spanning tree and not one of the fast spanning tree protocols (RSTP, MSTP, fast-STP, etc). I specifically think this because time corrects the iPXE DHCP process: by the time you drop to the iPXE shell and type DHCP, the switch port has started forwarding again. Placing an unmanaged switch between the PXE booting computer and the building network switch keeps the building switch port from dropping during the PXE booting process, and is a quick way to confirm that traditional spanning tree is the cause.
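To put rough numbers on the timing argument, here is a minimal Python sketch. It assumes the 802.1D default forward delay of 15 seconds, with the port spending one forward delay in listening and one in learning. Your switch may be configured differently. The DHCP retry windows are made-up illustrative values, not measured iPXE timeouts, and the function and parameter names are just for this example.

```python
# Default IEEE 802.1D forward delay, in seconds. After link-up a classic STP
# port spends one forward delay in "listening" and another in "learning"
# before it starts forwarding frames.
FORWARD_DELAY_S = 15


def seconds_until_forwarding(forward_delay_s=FORWARD_DELAY_S,
                             port_already_forwarding=False):
    """Seconds after the client's link comes up before traffic passes."""
    if port_already_forwarding:
        # Dumb switch in line (or an edge/fast-start port): the building
        # switch port never saw the link drop, so there is nothing to wait for.
        return 0
    return 2 * forward_delay_s


def dhcp_succeeds(dhcp_window_s, forward_delay_s=FORWARD_DELAY_S,
                  port_already_forwarding=False):
    """True if the client is still retrying DHCP when the port opens."""
    wait = seconds_until_forwarding(forward_delay_s, port_already_forwarding)
    return dhcp_window_s > wait


if __name__ == "__main__":
    # ASSUMPTION: illustrative retry windows in seconds, not real iPXE values.
    for window in (10, 20, 45):
        print(window,
              dhcp_succeeds(window),
              dhcp_succeeds(window, port_already_forwarding=True))
```

With those defaults the port is closed for about 30 seconds after every link reset, so a client that gives up on DHCP sooner than that fails, while the same client behind a dumb switch gets an answer straight away.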
Side note: roll your FOS kernel back to 4.15.2 to have a better experience with UEFI / GPT disks. There is a regression bug in 4.17.0 that the developers haven’t been able to correct just yet.
-
Hi, just wanted to reply and say thanks for your help. You are correct that I’m on a managed switch and STP is configured for PVST.
Sadly I’m a bit hamstrung as the FOG server is on an ESXi host and so it’s not practical at this point to plug it via a dumb switch to test, and I’m a bit leery of reconfiguring the switch for rapid-pvst at the moment. I’ll give it a go next time I’m in when the office is shut.
Edit: Actually I just re-read your post, I’ll try sticking a dumb switch in line today and see how it goes.
Again, thanks for your reply.
-
@mrpatrick Just so we are clear, the issue is on the target computer end of the connection, not the FOG server end. It’s related to the transition of kernels from PXE ROM to iPXE and then iPXE to FOS.
-
Hi, yeah I’d understood what you were getting at. I’ve tested with a dumb switch in line at the client end and it worked first time.
Many thanks for your help.