Clients will not consistently boot into PXE environment - dnsmasq
-
@jhumpf said in Clients will not consistently boot into PXE environment - dnsmasq:
Yes. Plus one line for my second subnet
You should not need any additional lines for subnets since the dnsmasq service only provides pxe boot information.
So is there any logic to when and where these computers fail to pxe boot?
Does it fail more often on one subnet than the other?
It would be ideal from the capture standpoint if you could capture a pxe boot failure on a computer that is on the same subnet as the FOG server, because we will use the fog server and tcpdump to capture the entire pxe boot process (wireshark not needed). I have the instructions here: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
What is important is the capture filter, so we only get pxe booting information and not all of the other stuff flying down your network. The ports of interest are 67, 68, 69 and 4011. If you use a witness computer with wireshark you will only see traffic for ports 67 and 68, because the protocols on ports 69 and 4011 are point to point (unicast between the client and the server). That is why it’s best to use the FOG server to capture when you can.
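A minimal sketch of that capture filter, built from the four ports named above. The interface name `eth0` and the output file `/tmp/pxe-boot.pcap` are assumptions for illustration only; the linked tutorial has the exact procedure.

```shell
#!/bin/sh
# Build a tcpdump/pcap filter from the ports of interest:
# 67 and 68 (DHCP), 69 (TFTP), 4011 (ProxyDHCP).
FILTER=""
for port in 67 68 69 4011; do
  # Join the terms with "or"; the first term gets no prefix.
  FILTER="${FILTER:+$FILTER or }port $port"
done
echo "capture filter: $FILTER"

# Run as root on the FOG server, start a failing client, then stop with Ctrl-C.
# eth0 and the output path are assumptions; adjust them to your server:
#   tcpdump -i eth0 -w /tmp/pxe-boot.pcap $FILTER
```

The tcpdump line is left as a comment so the script can be run safely to check the filter string before capturing anything.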
-
@george1421 now it’s working just fine for some reason…I didn’t make any changes.
-
@jhumpf said in Clients will not consistently boot into PXE environment - dnsmasq:
now it’s working just fine for some reason
OK great, now don’t change anything and you should be fine…
I’m glad you have it sorted out. Not sure if it’s really fixed or just kind of working, but we’ll take a win whenever we can get one.
-
@george1421 so now, sometimes it will say “please enter tftp server”
So far it only seems to do it if 20+ PCs reboot at once. And at that point many of them do it. Always random though.
-
@jhumpf I’m almost of a mind that you have more than one dhcp server or the target computer isn’t seeing the dnsmasq response.
-
@george1421 I removed the second line for the other subnet and will try again when I get to work Monday to see if that fixed it.
The line was a second dhcp-range
-
@george1421 I removed the second dhcp range line and now clients on the second subnet won’t boot at all
-
@jhumpf I can say from past experience that you only need the dhcp pool statement for the subnet local to where dnsmasq is running. The only other thing you need to do is update the dhcp-relay/helper service on your (vlan) router to include the dnsmasq server as the last helper in the list. This way the dnsmasq server is informed when a remote client requests a dhcp address, and it can answer that request with the ProxyDHCP response.
edit: Wait a minute, let’s make sure we are talking about the same setting. If you used my dnsmasq configuration exactly, this should be your last line.
dhcp-range=<fog_server_ip>,proxy
You only need to add in the fog server’s IP, a comma and “proxy”. Don’t define any other values than what I have in my configuration. You don’t want dnsmasq turning into a full dhcp server here.
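A quick way to confirm the file matches that description is to count the active dhcp-range lines. This is only a sketch: the helper name `check_dhcp_range` is hypothetical, and the config path in the usage comment is an assumption, so point it at whatever file holds your dnsmasq settings.

```shell
#!/bin/sh
# Hypothetical helper: report "ok" only when the file has exactly one active
# dhcp-range line and that line has the plain "<ip>,proxy" form.
check_dhcp_range() {
  conf="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  total=$(grep -c '^[[:space:]]*dhcp-range=' "$conf" || true)
  proxy=$(grep -c '^[[:space:]]*dhcp-range=[^,]*,proxy[[:space:]]*$' "$conf" || true)
  if [ "$total" -eq 1 ] && [ "$proxy" -eq 1 ]; then
    echo "ok"
  else
    echo "unexpected: $total dhcp-range line(s), $proxy in plain proxy form"
  fi
}

# Usage (the path is an assumption, not taken from this thread):
#   check_dhcp_range /etc/dnsmasq.d/ltsp.conf
```

A second dhcp-range line, or extra values after “proxy” such as a netmask, would both show up as "unexpected" here.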
-
@george1421 so I now have the line you specified end with “,255.255.0.0” because without it, the second subnet won’t pxe boot.
When I restart our lab of 26 machines all at once, all but 3 stop at “enter tftp server”
In our other lab I attempted to restart everything at the same time and it did not seem to have an issue, but the PCs were slightly more staggered on the restart. I still believe that this shouldn’t have been a problem in either lab.
-
@jhumpf All I can say is that different fog admins have been using that configuration for two to three years, and we haven’t needed to deviate from it. Since dnsmasq is not giving out dhcp addresses, the mask “should be” irrelevant. The remote subnets should just work as long as dnsmasq is being informed of them requesting a dhcp address.
If you want to debug deeper into the matter, I would suggest that you reset the configuration back to what I have in the article, then follow this procedure to capture a bad remote proxydhcp request and a good remote proxydhcp request so we can compare them. There may also be some value in taking a computer on the remote “bad” subnet and loading wireshark on it. Use the capture filter “port 67 or port 68”. That would capture what the dhcp server and dnsmasq are sending to the target computer, from the target computer’s point of view. https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
Upload those 3 captures to a file sharing site and share them as public. Either DM me directly or post the links here and I will review them.
My gut feeling is that you have something not configured right in your infrastructure not related to FOG or dnsmasq.