Clients will not consistently boot into PXE environment - dnsmasq

jhumpf

I need some urgent help.
I am using dnsmasq so we do not have to alter our DHCP server. It is available to both subnets.
I am unsure whether this inconsistency affects both subnets, because one of them does not have enough clients for accurate testing.

Sometimes the clients boot right into the PXE environment, but much of the time they give the following errors:
PXE-E52: proxyDHCP offers were received. No DHCP offers were received.
PXE-E55 ProxyDHCP did not reply to request on port 4011

This does work sometimes, but it is REALLY inconsistent. Yesterday it took dozens of reboots and many failed attempts before all of the clients came up for multicast and unicast, which I ran separately.

My dnsmasq config is exactly what the FOG wiki install page for CentOS 7 says to use.

george1421

Did you use my dnsmasq configuration file to set up dnsmasq on your FOG server, or some other configuration file?

https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server

Did you add the dnsmasq server’s IP address to your dhcp-relay/helper service running on your subnet router?

george1421

@jhumpf said in Clients will not consistently boot into PXE environment - dnsmasq:

Sometimes the clients boot right into the PXE environment, but much of the time they give the following errors:
PXE-E52: proxyDHCP offers were received. No DHCP offers were received.
PXE-E55 ProxyDHCP did not reply to request on port 4011

This would indicate that dnsmasq DID respond, because the client knows there is a proxyDHCP server. It would only know that because it was told there was one.

Does this issue happen on one subnet more than another?

Do you have more than one DHCP server on your network?

Finally, can you reliably predict when this error will happen, so that if we set up Wireshark we could capture it failing?

jhumpf

I am not positive about the multiple subnets. I believe so, but I can’t be certain.
I can ask about multiple DHCP servers. And I’m not sure about predictions. We can probably capture it with Wireshark.
I tried capturing from the server and didn’t see anything noteworthy, but maybe I’m missing something.

george1421

@jhumpf Did you use my config file referenced in the first link to set up your dnsmasq server?

jhumpf

Yes. Plus one line for my second subnet

george1421

@jhumpf said in Clients will not consistently boot into PXE environment - dnsmasq:

Yes. Plus one line for my second subnet

You should not need any additional lines for subnets since the dnsmasq service only provides pxe boot information.

So is there any logic to when and where these computers fail to pxe boot?

Does it fail more often on one subnet than the other?

It would be ideal from the capture standpoint if you could capture a pxe boot failure on a computer that is on the same subnet as the FOG server, because we will use the fog server and tcpdump to capture the entire pxe boot process (wireshark not needed). I have the instructions here: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue

What is important is the capture filter, so we only get PXE booting information and not all of the other stuff flying down your network. The ports of interest are 67, 68, 69, and 4011. If you use a witness computer with Wireshark, you will only see traffic for ports 67 and 68, because the traffic on ports 69 and 4011 is point to point. That is why it’s best to use the FOG server to capture when you can.
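
As a rough sketch of that capture (the linked instructions remain the authoritative version), the filter for those four ports can be kept in one variable and handed to tcpdump. The function name `capture_pxe` and the default file name `output.pcap` are placeholders chosen for illustration, and this assumes tcpdump is installed on the FOG server and run as root.

```shell
#!/bin/sh
# Capture filter for the four ports named above:
# 67/68 (DHCP), 69 (TFTP), 4011 (proxyDHCP).
PXE_FILTER="port 67 or port 68 or port 69 or port 4011"

# Hypothetical helper: write matching packets to a pcap file.
# $1 = output file (defaults to output.pcap, an assumed name).
# Start it, PXE boot the failing client, then press Ctrl-C
# once the error shows up on the client's screen.
capture_pxe() {
    tcpdump -w "${1:-output.pcap}" "$PXE_FILTER"
}

# Example (not run automatically):
#   capture_pxe bad_boot.pcap
```

If the server has more than one network interface, tcpdump's `-i` option may be needed to pick the one facing the clients.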

jhumpf

@george1421 Now it’s working just fine for some reason… I didn’t make any changes.

george1421

@jhumpf said in Clients will not consistently boot into PXE environment - dnsmasq:

now it’s working just fine for some reason

OK great, now don’t change anything and you should be fine…

I’m glad you have it sorted out. I’m not sure if it’s really fixed or just kind of working, but we’ll take a win whenever we can get one.

jhumpf

@george1421 So now, sometimes it will say “please enter tftp server”.
So far it only seems to do it if 20+ PCs reboot at once, and at that point many of them do it. Which ones is always random, though.

george1421

@jhumpf I’m almost of a mind that you have more than one dhcp server or the target computer isn’t seeing the dnsmasq response.

jhumpf

@george1421 I removed the second line for the other subnet and will try again when I get to work Monday to see if that fixed it.
The line was a second dhcp-range.

jhumpf

@george1421 I removed the second dhcp range line and now clients on the second subnet won’t boot at all

george1421

@jhumpf I can say from past experience that you only need the DHCP pool statement for the local subnet where dnsmasq is running. The only other thing you need to do is update the dhcp-relay/helper service on your (VLAN) router to include the dnsmasq server as the last helper in the list. This way the dnsmasq server is informed when a remote client requests a DHCP address, which allows it to answer that request with the ProxyDHCP response.

Edit: Wait a minute, let’s make sure we are talking about the same setting. If you used my dnsmasq configuration exactly, this should be your last line.

dhcp-range=<fog_server_ip>,proxy

You only need to add in the FOG server’s IP, a comma, and “proxy”. Don’t define any values other than what I have in my configuration. You don’t want dnsmasq turning into a full DHCP server here.
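
One quick way to confirm that the file contains only that single active dhcp-range line is to list the uncommented ones. This is just a sketch: the function name `list_dhcp_ranges` is made up for illustration, and the path `/etc/dnsmasq.d/ltsp.conf` in the example comment is an assumption, so substitute whatever file your dnsmasq actually reads.

```shell
#!/bin/sh
# Hypothetical helper: print every active (uncommented) dhcp-range
# line in the dnsmasq config file given as $1.
list_dhcp_ranges() {
    grep -E '^[[:space:]]*dhcp-range=' "$1"
}

# Example (not run automatically; the path is an assumption):
#   list_dhcp_ranges /etc/dnsmasq.d/ltsp.conf
#   dnsmasq --test    # syntax-checks the configuration without starting the service
```

With the configuration discussed above, the expectation is a single line of output ending in `,proxy`.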

jhumpf

@george1421 So the line you specified now ends with “,255.255.0.0”, because without it the second subnet won’t PXE boot.
When I restart our lab of 26 machines all at once, all but 3 stop at “enter tftp server”.
In our other lab I also attempted to restart everything at the same time and it did not seem to have an issue, but the PCs there were slightly more staggered on the restart. I still believe this shouldn’t have been a problem in either lab.

george1421

@jhumpf All I can say is that, based on two to three years of using that configuration for different FOG admins, we haven’t needed to deviate from it. Since dnsmasq is not giving out DHCP addresses, the mask “should be” irrelevant. The remote subnets should just work as long as dnsmasq is being informed when they request a DHCP address.

If you want to debug deeper into the matter, I would suggest that you reset the configuration back to what I have in the article, then follow this procedure to see if we can capture a bad remote proxyDHCP request and then a good remote proxyDHCP request so we can compare: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue There also may be some value in taking a computer on the remote “bad” subnet and loading Wireshark on it. Start the capture with a filter of “port 67 or port 68”. That would capture what the DHCP server and dnsmasq are sending to the target computer, from the target computer’s point of view.

Upload those 3 captures to a file sharing site and share them as public. Either DM me directly or post the links here and I will review them.

My gut feeling is that you have something not configured right in your infrastructure not related to FOG or dnsmasq.
