FOG DHCP Server on Multiple VLAN Network
Run as a VM in ESXi 6.7 free
FOG uses a single virtual NIC (typically)
FOG is the only DHCP server that I know of on any of the VLANs in question. I use two other servers for DNS; otherwise the FOG server is self-sufficient. Everything works as expected for hosts on the same VLAN/subnet as the FOG server.
I need FOG to PXE boot hosts on multiple VLANs/subnets and to image/capture from them. It's not an option for me to put everything on the same VLAN.
I have a physical network that’s structured as follows:
- Palo Alto Firewall
- Juniper Switch
- Arista Switches
I have the following VLANs/subnets:
- 100/10.0.0.x - FOG is 10.0.0.2 on this VLAN and works when physical and virtual hosts are on the same VLAN.
- 1600/172.16.0.x - Many hosts on this VLAN.
- 1605/172.16.5.x - Many virtual hosts (VMs) on this VLAN.
The 100 VLAN can communicate with all other VLANs; this is set up in the Palo Alto settings (rules, zones, etc.), i.e. it's possible to ping, use SMB, etc. across VLANs.
I am currently not concerned with making any other VLANs work with FOG but it would be nice to be able to expand to include others in the future.
Things I Tried
I started by creating a DHCP relay agent on the Palo Alto and editing the dhcpd.conf file to include another subnet section. This alone got me the closest to working. Hosts on 1600, for example, would PXE boot and get the proper IP address from the proper subnet, but would then give a PXE error, I think E11 regarding ARP (PXE-E11: ARP Timeout). At this stage they would just give the option to press a key to reboot.
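One common cause of PXE-E11 in a relay setup like this is that the new subnet section hands out an address and a TFTP server (next-server) on another subnet, but no usable default gateway, so the client has nothing on its own segment to ARP for. The Python sketch below only illustrates that reasoning. The /24 prefix and the gateway 172.16.0.1 are assumptions, not values from this post, and `arp_target` is a hypothetical helper name.

```python
import ipaddress

def arp_target(client_ip, prefix_len, next_server, router=None):
    """Return the address a PXE client must resolve via ARP to reach
    next_server, or None if it has no on-link next hop at all."""
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    server = ipaddress.ip_address(next_server)
    if server in network:
        # TFTP server is on the client's own subnet: ARP for it directly.
        return str(server)
    if router is None:
        # Off-subnet server and no gateway offered: nothing to ARP for.
        return None
    gateway = ipaddress.ip_address(router)
    # A gateway outside the client's subnet is just as unreachable.
    return str(gateway) if gateway in network else None

# Assumed values: /24 masks and a gateway at 172.16.0.1 on VLAN 1600.
print(arp_target("172.16.0.50", 24, "10.0.0.2"))                # no router offered
print(arp_target("172.16.0.50", 24, "10.0.0.2", "172.16.0.1"))  # router offered
```

If the first case matches what the client receives, checking that the added subnet section in dhcpd.conf carries an `option routers` line for that VLAN's gateway would be a reasonable first step.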
I then tried configuring the Juniper/Arista switches to also handle DHCP relay, with no luck and zero change in the outcome.
I also tried adding another virtual NIC to FOG and placing it on another VLAN/subnet, and ran into odd issues, which I think eventually come down to TFTP being assigned a single static IP address in the FOG GUI/settings. The first problem was that the original NIC on 10.0.0.x seemingly stopped working for everything except PXE booting (ping, network, internet all stopped working) while the new NIC on 172.16.0.x worked as expected. Disabling the new NIC (in CentOS) would make the first NIC instantly start working completely. Machines PXE booted on the second VLAN would get a little further than before but were unable to load the boot files, as TFTP still appears to try loading them from the original IP/VLAN address on 10.0.0.x and fails.
So I am not sure what's easier to correct: the ARP timeout when using DHCP relays, or the issue of TFTP not using the IP/VLAN of the FOG NIC that matches the host's VLAN/subnet.
I am open to any solution that can get this working, or at least an explanation of the limitations of FOG in a setup like this, as I imagine it's possible I am just trying to do something that shouldn't be done. For what it's worth, I have been through our wiki, spent many hours researching on Google, and tried to iron this out with the help of multiple co-workers, and we are stumped.
I have tried to provide as many relevant details as I can and will be happy to provide more to the extent that I am allowed.
Thank you for any help or guidance you may be able to provide.
@Zer0Cool From what you describe, you have already come a long way. DHCP relay is definitely what you need to make PXE booting across subnets possible. Would you mind taking pictures, or maybe even a video, of a client booting that ends in the timeout? That could shed some light on what is missing.
@george1421 Hey, thanks for the reply. To answer your question, we do not use any other DHCP server currently. All IPs on other VLANs are set statically. The only DHCP server is the FOG server.
I know it sounds dumb, but in my environment, it makes sense for many reasons.
As you have explained it, am I to understand that I need another DHCP server aside from the FOG server to make this work? If not, does it make it any easier?
I am open to any suggestions that would make it possible (preferably easy) to have FOG (or a “helper” DHCP server) be able to work across VLANs.
Let's take a step back here. With such a complex network, why are you using the FOG DHCP server instead of your network infrastructure's DHCP servers?
FOG does work across subnets. What you need is for the DHCP server responsible for those other subnets to set DHCP option 66 to the FOG server IP and DHCP option 67 to the boot kernel. The issue you will run into is that if you have both BIOS and UEFI systems on your network, your DHCP server will need to be smart enough to send the right boot kernel name based on the target hardware. Windows DHCP on 2012 and newer will do this correctly if configured right. A Linux DHCP server, like the one FOG uses, does this automatically.
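To make the BIOS/UEFI point concrete, here is a rough Python sketch of the decision the DHCP server has to make when filling in option 67. PXE clients announce their architecture in the vendor class identifier (option 60) as `PXEClient:Arch:NNNNN`, with codes defined in RFC 4578. The boot file names in the table are assumptions based on files commonly shipped with FOG, and `boot_file_for` is a hypothetical helper, not part of FOG or of any DHCP server.

```python
import re

# Architecture code -> boot file. File names are assumptions; check the
# contents of the TFTP root on your own FOG server before relying on them.
BOOT_FILES = {
    0: "undionly.kpxe",      # legacy BIOS (x86 PC)
    6: "i386-efi/ipxe.efi",  # 32-bit UEFI
    7: "ipxe.efi",           # EFI byte code, reported by most 64-bit UEFI firmware
    9: "ipxe.efi",           # 64-bit UEFI (x86-64)
}
DEFAULT_BOOT_FILE = "undionly.kpxe"  # assumed fallback for unknown clients

def boot_file_for(vendor_class):
    """Pick the option 67 value from a client's option 60 string."""
    match = re.match(r"PXEClient:Arch:(\d{5})", vendor_class)
    if match is None:
        return DEFAULT_BOOT_FILE
    return BOOT_FILES.get(int(match.group(1)), DEFAULT_BOOT_FILE)

print(boot_file_for("PXEClient:Arch:00000:UNDI:002001"))  # BIOS client
print(boot_file_for("PXEClient:Arch:00007:UNDI:003016"))  # UEFI client
```

A DHCP server does this matching in its own configuration language rather than in Python; the sketch only shows why a single static option 67 value cannot serve both kinds of hardware.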
If you have inter-VLAN routing working correctly, you should be able to image with FOG. There are some PXE ROMs that are not very smart and may not like PXE booting across subnets.
Instead of messing with all of these DHCP helpers, you should take a step back and answer this: what device provides your DHCP services for these other subnets?
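Since a firewall sits between these VLANs, being able to ping does not prove that the services FOG needs are allowed through. Below is a minimal Python sketch that could be run from a machine on one of the other VLANs to check the TCP side. The port list (HTTP 80, rpcbind 111, NFS 2049) is an assumption about a default FOG install, and the function names are hypothetical. TFTP runs over UDP port 69 and cannot be verified with a TCP connect, so it is not covered here.

```python
import socket

# TCP services a client typically needs from the FOG server (assumed list).
FOG_TCP_PORTS = {"http": 80, "rpcbind": 111, "nfs": 2049}

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, unreachable, or timed out.
        return False

def check_fog_server(host, ports=FOG_TCP_PORTS):
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}

if __name__ == "__main__":
    # 10.0.0.2 is the FOG server address given earlier in this thread.
    for name, is_open in check_fog_server("10.0.0.2").items():
        print(f"{name}: {'open' if is_open else 'blocked or closed'}")
```

A port reported as blocked from VLAN 1600 but open from VLAN 100 would point at the firewall rules rather than at FOG itself.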