Network/Fog issue - Machines won't boot from fog
We have used FOG for a while where I work (a university) and it has worked well so far. However, we have run into a weird issue where machines on a new network setup just won’t fog boot.
We have 3 Cisco 3750 switches stacked/fibred that serve 3 vlans (3 rooms). Our Linux box, acting as a router, connects to each of the vlans through a separate interface, each assigned only to the vlan it is serving. The router then connects to the internet/rest of the university network.
On the same physical switch as the router (but not the rooms), is connected our fog server. This switch is part of the University network which we can’t configure, however this is highly unlikely to be anything to do with the issue we have.
The switch configurations are very basic. I configured three DHCP pools (network, DNS, default gateway) that serve three subnets. Hosts on each of the three room vlans obtain their IP from the DHCP pool for their subnet. They can all then boot into Windows, connect to the internet through the router just fine and even ping the FOG server (and connect to the web interface, SFTP, etc).
However, the machines still do not PXE boot, receiving the message that no DHCP or proxyDHCP offers were received. From within Windows, running “tftp <fog IP> get pxelinux.0” times out.
Go across the corridor to another room and the machines PXE boot into fog absolutely fine, and can fetch the file over tftp from within Windows. The machines are of the same hardware spec, so PXE isn’t the issue (they even imaged from the previous infrastructure).
What on earth could the problem be?
Some additional points, relating to the potential issues:
In each of the vlan interfaces, the “ip helper-address <fog server>” is configured.
The vlans are configured correctly, hosts can all ping fog, get internet connectivity etc.
Running wireshark on the router interface connected to the switch serving the hosts shows that tftp requests are being sent out from hosts and replies from fog are also being sent back, although it appears no acknowledgement is being sent back for each of the blocks.
Running wireshark on the hosts shows that the tftp replies from fog are being received but, as before, are not being acknowledged. Acknowledgements are sent back on hosts where fog is “working”.
The switches have no access control lists or any filtering set.
The switches do not have options 66/67 set (for next-server and filename).
The machines did all, at one time or other, previously boot into fog fine when on a previous network infrastructure, which indicates that the new setup has some issue…
…however, it seems like it’s a host issue if no tftp acknowledgements are being sent back, even though the hosts themselves are not the issue.
dnsmasq.d does serve the subnets to which these hosts belong (and every IP possibility in-between)
I really have no idea what the issue could be. In many senses, it is a very simple setup - even if it was just one room, one switch, a router out to fog, the setup would likely be the same. The same issue is present on subnets in the rest of the University where the ip helper hasn’t been set on the core switches, but on our switches it definitely is.
Any suggestions or pointers would be a seriously big help!
Glad you got it going! Good luck!
Right! It works!
A few days ago I had tried defining the next-server and the bootfile in the dhcp pool, which didn’t work.
Portfast on its own also hadn’t worked.
I have now tried portfast again AND defining the next-server and bootfile… and it works!
I wouldn’t have thought I’d need to define these, as I already use the fog server as the helper address… but perhaps the switch - acting as a DHCP server itself - just ignored it? I really have no idea. I am still not 100% happy, but it works and works well…
Thanks for the portfast tip!
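For anyone checking the same thing: the next-server and bootfile settings mentioned above should show up in the DHCP offer the client receives, either in the fixed BOOTP header (the `siaddr` and `file` fields) or as options 66/67. Below is a minimal Python sketch for pulling those values out of a raw DHCP payload, for example one exported from a Wireshark capture. The function name `parse_boot_info` is made up for illustration, and it assumes a plain offer without option overload (option 52).

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options area


def parse_boot_info(packet: bytes) -> dict:
    """Extract PXE boot information from a raw DHCP/BOOTP payload (UDP data only)."""
    if len(packet) < 240 or packet[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP packet")

    info = {
        # siaddr: the "next server" field in the fixed header (bytes 20-23)
        "next_server": socket.inet_ntoa(packet[20:24]),
        # file: NUL-padded boot file name in the fixed header (bytes 108-235)
        "bootfile": packet[108:236].split(b"\0", 1)[0].decode("ascii", "replace"),
        "option_66": None,  # TFTP server name
        "option_67": None,  # bootfile name
    }

    # Walk the code/length/value options that follow the magic cookie.
    i = 240
    while i < len(packet):
        code = packet[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(packet):  # truncated option, stop parsing
            break
        length = packet[i + 1]
        value = packet[i + 2:i + 2 + length]
        if code == 66:
            info["option_66"] = value.decode("ascii", "replace")
        elif code == 67:
            info["option_67"] = value.decode("ascii", "replace")
        i += 2 + length
    return info
```

An offer that carries no boot information at all would come back with `next_server` as `0.0.0.0`, an empty `bootfile` and both options as `None`, which would be consistent with a “No boot filename received” message on the client.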
Have you tried setting up DNSMasq? I would give it a shot and see if it can get the pxe file where it needs to go. Look up the settings for using fog with an unmodified dhcp server. I was able to get my pxe client working on some units but not others, and setting up proxyDHCP fixed the issue.
Just enabled Portfast and nothing - although thanks for reminding me about that, it probably should be enabled on all host ports!
As for Windows Firewall - it wouldn’t be that, as the PC does it fine when connected to another switch.
I just can’t work out what this could be. The error I get is usually “No boot filename received” on one hardware set, and on another it’s “No DHCP or proxyDHCP offers received”. Different errors but identical room setups (different NICs though).
Two things come to mind:
Is PortFast enabled on the switch ports feeding those computers?
PortFast being disabled can cause issues when PXE booting because, after the link comes up, the port takes too long to start forwarding traffic (roughly 30 seconds with default spanning-tree timers), which shows up as PXE/TFTP timing out.
Is the Windows Firewall allowing the TFTP application?
TFTP connections are established over UDP port 69, but the actual file transfer happens over a random UDP port. So just opening UDP port 69 probably isn’t going to work and it’s generally easier to just allow the TFTP application through the firewall.
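To see the port behaviour described above for yourself, here is a rough Python sketch of a TFTP probe. It sends a read request to UDP port 69, waits for the first reply, reports which source port the server answered from, and acknowledges the first data block. The function names and the default file name are only illustrative, and it deliberately stops after one block rather than downloading the whole file.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5  # TFTP opcodes


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: opcode, file name, NUL, transfer mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def build_ack(block: int) -> bytes:
    """Acknowledgement for one data block."""
    return struct.pack("!HH", OP_ACK, block)


def parse_reply(packet: bytes):
    """Split a DATA or ERROR packet into (opcode, block or error code, payload)."""
    opcode, number = struct.unpack("!HH", packet[:4])
    return opcode, number, packet[4:]


def probe_tftp(server: str, filename: str = "pxelinux.0", timeout: float = 5.0):
    """Request a file and return (opcode, block, reply source port, payload size).

    Raises socket.timeout if nothing comes back, which is the symptom
    seen when the replies are being dropped somewhere on the path.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        packet, (addr, port) = sock.recvfrom(516)  # 4-byte header + 512-byte block
        opcode, number, payload = parse_reply(packet)
        if opcode == OP_DATA:
            # The ACK goes to the port the server replied from, not to port 69.
            sock.sendto(build_ack(number), (addr, port))
        return opcode, number, port, len(payload)
```

Calling `probe_tftp("<fog IP>")` from a host on a working subnet should return opcode 3 (DATA) for block 1, with a reply port other than 69. A firewall rule that only allows UDP 69 would let the request out but block that reply.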