Unsolved Multiple TFTP Servers
I'm trying to set up the configuration shown in the following wiki page so that a storage node can also work as a TFTP server in FOG version 1.5.10:
@GorkaAP I hate to begin with this, but that referenced document deals with a 10 year out of date version of FOG.
Let's see if we can work out a solution using a current release. Please state the problem you are trying to resolve.
@george1421 Currently, I am engaged in scientific research across several buildings utilizing the university’s network infrastructure. The university manages this network, granting us limited control for adjustments while restricting access to configurations and firewalls. Within our campus, four buildings fall under our management, including the switches. Our FOG server operates effectively within this cluster. An additional building, located 4 kilometers away, shares the same network subnet. Both our campus and the university utilize a common DHCP server for this subnet, while we employ DNSMASQ to enable FOG functionality.
While we can successfully ping and access services within the network, FOG booting remains unsuccessful. We asked the university whether they are blocking any services, and they told us that they aren’t (they are willing to help us with this problem). Traffic sent from the distant building to the FOG server’s UDP port 69 does arrive, as we confirmed by capturing the UDP packets with tcpdump. But when it comes to PXE booting, it simply doesn’t work.
Considering this, we contemplated employing a dedicated storage node within the distant building to serve as a TFTP server, aiming to address the issue. Despite attempts to configure a storage node and disable DNSMASQ on the primary FOG server, PXE booting only advances to the login screen before encountering a chainload error upon accessing “boot.php” from the main server.
We have tried many things, and there are surely a couple of things we checked that I don't remember. Any method for finding out why this is failing, or any alternative solution, would be greatly appreciated.
@GorkaAP OK, your explanation of the problem is very clear.
I have a couple of ideas here:
"An additional building, located 4 kilometers away, shares the same network subnet."
I have seen remote locations connected via a WAN have issues loading the iPXE boot loaders via TFTP. In those cases the computers would error out because they could not download the NBP boot file. The issue was related to the TFTP block size being larger than the MTU packet size on the WAN. If the remote building is directly connected by fiber, this is probably not your issue.
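To make the block size point concrete, here is a minimal sketch of the arithmetic, assuming IPv4 with no IP options. A TFTP DATA packet carries a 4-byte TFTP header, an 8-byte UDP header and a 20-byte IPv4 header on top of the block itself. The function names are made up for illustration, and the 1400-byte MTU in the usage note is a hypothetical WAN value, not something measured on this network.

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8         # bytes
TFTP_DATA_HEADER = 4   # 2-byte opcode + 2-byte block number


def max_tftp_blksize(mtu: int) -> int:
    """Largest TFTP block size whose DATA packet fits in one IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def fits_without_fragmentation(blksize: int, mtu: int) -> bool:
    """True if a DATA packet carrying blksize bytes fits within the MTU."""
    return blksize <= max_tftp_blksize(mtu)


if __name__ == "__main__":
    for mtu in (1500, 1400):  # 1400 is a hypothetical reduced WAN MTU
        print(mtu, max_tftp_blksize(mtu))
```

With a standard 1500-byte Ethernet MTU the largest safe block size is 1468 bytes. A client that negotiates 1468 over a path with a smaller MTU would depend on IP fragmentation, which is where this kind of download failure tends to show up.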
Having both locations on one subnet makes things a bit harder since dhcp works off broadcast domains and your local and remote locations have the same broadcast domain since they are on the same subnet.
The FOG booting process is as follows:
1. The PXE ROM (target computer) boots and queries DHCP to find DHCP options 66 and 67.
2. The PXE ROM downloads the boot loader pointed to by options 66 and 67.
3. The iPXE boot loader boots and again queries DHCP for DHCP option 66 to locate the FOG server.
4. The iPXE boot loader then chain loads default.ipxe from the FOG server over TFTP.
5. default.ipxe chain loads boot.php.
If both sites are on the same subnet and booting works at the main campus but not at the remote campus, note that the boot.php step is the first time iPXE chain loads over HTTP instead of TFTP. From the remote campus, can you get to the FOG server’s web UI?
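One way to answer the HTTP question from a machine at the remote site is a small probe like the sketch below. It assumes FOG is installed under the default "/fog" web root, so that boot.php lives at /fog/service/ipxe/boot.php; adjust the path if your install differs. The address 192.0.2.10 is a documentation placeholder, not the real server, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def boot_php_url(server: str) -> str:
    """Build the boot.php URL, assuming the default "/fog" web root."""
    return f"http://{server}/fog/service/ipxe/boot.php"


def http_reachable(url: str, timeout: float = 5.0) -> bool:
    """True if the server answers over HTTP at all, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an HTTP error code, so the path is open.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or name resolution failed.
        return False


if __name__ == "__main__":
    url = boot_php_url("192.0.2.10")  # placeholder address
    print(url, "reachable" if http_reachable(url) else "NOT reachable")
```

If this reports the URL as reachable from the remote building, the HTTP leg of the chain load is probably fine and the problem is more likely in the DHCP or TFTP steps.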
It might help to debug if you can snap a clear picture of the error message on the target computer as you get the chain load error.
One additional thing I can think of: if you have more than one DHCP server within this broadcast domain (such as a primary/slave pair), make sure both have the proper DHCP option settings. I have seen two DHCP servers, one with the settings configured and the other without, cause random issues. The client would use whichever DHCP server responded first, so it sometimes got the proper boot settings and sometimes did not.
Bonus additional thing: You are using dnsmasq to provide PXE boot information. Could something be filtering out the DHCP DISCOVER packet from the client at the remote site? I can see how dnsmasq would work at the main campus but not at the remote campus if the DISCOVER packet is getting lost on its way to the dnsmasq service. You can test this on the dnsmasq server by using tcpdump with the filter
port 67 or port 68
Now power on a computer at the remote building. Do you see the DISCOVER packet arrive at the dnsmasq server? The DISCOVER packet starts the process in dnsmasq that sends the PXE boot information to the target computer.
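If you end up inspecting captured payloads rather than reading tcpdump's decoded output, the sketch below shows how a DISCOVER can be recognized. It walks the DHCP options that follow the 236-byte BOOTP header and the magic cookie, and reads option 53 (message type), where the value 1 means DISCOVER. It is a simplified parser: it takes the UDP payload only and ignores option overload (option 52). The function names are illustrative.

```python
from typing import Optional

BOOTP_FIXED_LEN = 236                    # fixed BOOTP header before the options
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of DHCP options
OPT_PAD, OPT_END, OPT_MESSAGE_TYPE = 0, 255, 53
DHCPDISCOVER = 1


def dhcp_message_type(payload: bytes) -> Optional[int]:
    """Return the value of DHCP option 53, or None if absent or malformed."""
    options_start = BOOTP_FIXED_LEN + len(DHCP_MAGIC_COOKIE)
    if len(payload) < options_start:
        return None
    if payload[BOOTP_FIXED_LEN:options_start] != DHCP_MAGIC_COOKIE:
        return None
    i = options_start
    while i < len(payload):
        code = payload[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:  # single-byte padding, no length field
            i += 1
            continue
        if i + 1 >= len(payload):
            return None  # truncated before the length byte
        length = payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if len(value) < length:
            return None  # truncated option value
        if code == OPT_MESSAGE_TYPE and length == 1:
            return value[0]
        i += 2 + length
    return None


def is_discover(payload: bytes) -> bool:
    """True if the UDP payload is a DHCP DISCOVER message."""
    return dhcp_message_type(payload) == DHCPDISCOVER
```

Running this over payloads captured on the dnsmasq host while a remote client powers on would show whether any DISCOVER from that client arrives at all.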
Bonus++ thing. If your link speed to the remote location is less than 1GbE you can install a fog storage node at that location and deploy your computers using the storage node. (this assumes you solve the pxe booting issue). You will install the location plugin into fog then assign computers to the remote location as well as the storage node. It will still boot using the main campus dhcp and tftp server, but actual image deployment will happen via the storage node not the WAN link.
@george1421 Using “tcpdump” we figured out that the DISCOVER packet was getting lost on its way to the DNSMASQ service. As you stated, we could reach HTTP/HTTPS between the remote location and here, so that wasn’t the problem.
We got it working by installing a storage node and installing DNSMASQ on a server connected to the remote location’s switch. As my original post said, we had problems with this approach at first, but after copying the “/tftpboot” folder with all the PXE binaries from our FOG server to the second server, it worked: the client now successfully reaches the boot.php file and the chainload error no longer appears.
Your response has been incredibly useful, thank you very much for your support.
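For anyone repeating the "/tftpboot" copy described above, a quick check that the second server's copy matches the original can save some debugging. The sketch below compares two directory trees by SHA-256 digest. It assumes both trees are readable from one machine (for example, the remote copy mounted or pulled back locally), and the two paths in the usage section are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(source, copy):
    """Return (missing, changed): files absent from copy, and files that differ."""
    src, dst = tree_digests(source), tree_digests(copy)
    missing = sorted(set(src) - set(dst))
    changed = sorted(name for name in src.keys() & dst.keys()
                     if src[name] != dst[name])
    return missing, changed


if __name__ == "__main__":
    # Hypothetical paths: the original tree and a local view of the copy.
    missing, changed = compare_trees("/tftpboot", "/mnt/remote/tftpboot")
    print("missing:", missing)
    print("changed:", changed)
```

Two empty lists mean every file in the source tree exists in the copy with identical contents.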