FOG DHCP problems with possible printer interference?
-
Server
- FOG Version: 1.3.0-RC-37
- SVN Revision: 6049
- OS: Ubuntu 14.04
- DHCP is running on: Windows Server 2008 R2 Standard
Hi FOG Community! Happy Holidays!
I’ve scoured the forums in hopes of finding an answer, but I’m coming up short.
My FOG server has no problem with some computers on our domain, but I keep getting this error on most of the computers in a different part of the building:
I looked up that error on the ipxe.org website, and it told me to run a couple of commands in the iPXE shell:
ifconf
and
ifconf -c dhcp net0
After I ran these commands, I got these positive responses:
So when I type ‘exit’ in the iPXE shell, it exits and immediately boots to the FOG boot menu! Amazing!!
I may have seen one other post on the forums where someone had a similar DHCP problem and found that one of their Xerox printers was the cause. We do have multiple Xerox printers in our building, including one that is close to the computer in the pictures I provided.
Any idea what is going on here?
-
What is DHCP option 067 set to on your DHCP server? If it’s not
undionly.kkpxe
(note the two Ks), then set it to that and give it a whirl.
-
Hey Wayne! Thanks for the reply.
I originally had option 067 set to undionly.kpxe, but I just tried your suggestion and it did work on that computer.
BUT when I tried booting a computer next to the cubicle where that one worked, I got a different error this time: No DHCP or proxyDHCP offers were received.
Looks like we’re getting much closer. Any idea what the next step would be?
-
@afriedman That would lead me to one of two things: first, a rogue DHCP server; second, something separating the network between those two systems.
-
This appears to be a spanning tree issue to me. Initially the workstation gets a DHCP address when it PXE boots, and the iPXE kernel makes it to the target, but that’s when things go sideways: iPXE cannot pick up a DHCP address, and it fails. BUT if you issue a few commands in the iPXE shell, you get the FOG menu.
What is happening here is that PXE boots, and then when the iPXE kernel starts up, it winks the network link (momentarily turns it off and on), which causes the switch to restart the spanning tree timer. The port stays in the listening and learning states for 27 seconds before it starts forwarding data. To lightning-fast FOG, 27 seconds is an eternity: iPXE has already given up and gone to sleep by the time STP starts forwarding data. This is a function of the switch, not the PC or any of FOG’s sub-components.
A quick check for spanning tree issues is to put a dumb (unmanaged) switch between the building switch and the PXE booting computer. If the target computer then boots to the FOG menu, you have found the issue.
To fix the issue, turn on one of the fast STP options (fast STP, PortFast, RSTP) on the switch; that eliminates the delay while keeping the benefits of spanning tree.
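To make the timing argument concrete, here is a minimal Python model of it. It assumes the port forwards nothing until a fixed delay after the link comes back up (the 27 seconds quoted above for classic STP, roughly zero for a PortFast/RSTP edge port), and it uses a hypothetical schedule of DHCP attempts. Those retry times are placeholders, not iPXE's real timeouts, and a reply is assumed to arrive whenever the request itself gets through.

```python
from typing import Optional, Sequence

# Seconds between link-up and the switch port forwarding traffic.
# 27 s is the figure quoted in this thread for classic STP; a PortFast/RSTP
# edge port is modeled as forwarding immediately.
CLASSIC_STP_DELAY = 27.0
PORTFAST_DELAY = 0.0

# Hypothetical DHCP attempt times (seconds after the link winks back up).
# Placeholders for illustration only, NOT iPXE's actual retry timers.
HYPOTHETICAL_ATTEMPTS = (1.0, 3.0, 7.0, 15.0)


def first_successful_attempt(forward_delay: float,
                             attempts: Sequence[float]) -> Optional[float]:
    """Return the time of the first attempt sent once the port forwards,
    or None if every attempt falls inside the blocked window."""
    for t in attempts:
        if t >= forward_delay:
            return t
    return None


if __name__ == "__main__":
    for label, delay in (("classic STP", CLASSIC_STP_DELAY),
                         ("PortFast/RSTP edge", PORTFAST_DELAY)):
        result = first_successful_attempt(delay, HYPOTHETICAL_ATTEMPTS)
        if result is None:
            print(f"{label}: every DHCP attempt is dropped, iPXE gives up")
        else:
            print(f"{label}: DHCP succeeds on the attempt at t={result:.0f}s")
```

With these placeholder numbers every attempt falls inside the 27-second window, which matches the symptom in this thread. It also suggests why typing `ifconf -c dhcp net0` by hand works: by the time you type it, the port has long since started forwarding.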
-
Hey Tom.
“A rogue DHCP server”. I don’t see a second DHCP server anywhere on our network, that I’m aware of. Is wireshark one of the only ways to see if there’s a rogue/secret DHCP server hiding somewhere?
I think there’s a switch in between, but if I even find that switch, would I have to do any type of forwarding with that?
-
@afriedman said in FOG DHCP problems with possible printer interference?:
Hey Tom.
“A rogue DHCP server”. I don’t see a second DHCP server anywhere on our network, that I’m aware of. Is wireshark one of the only ways to see if there’s a rogue/secret DHCP server hiding somewhere?
Yes, Wireshark will tell you this. Since DHCP is broadcast traffic, you just need to attach your Wireshark computer to the subnet where your target computer is and then set your filter to
port 67 and port 68
Then PXE boot your target computer. You will see “DHCP Offer” packets from all of the DHCP servers that can hear the client’s initial DHCP request. But in this case I still think it’s a spanning tree issue.
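Once the capture shows the offers, each one can be decoded to see who sent it and what it offers. Below is a minimal Python sketch (not part of Wireshark or FOG) that parses a raw DHCP/BOOTP message, i.e. the UDP payload of a port 67/68 packet, and pulls out the message type (option 53), server identifier (option 54), offered address (`yiaddr`), next server (`siaddr`), vendor class (option 60), and boot file (option 67, falling back to the fixed `file` field). It assumes a well-formed message and ignores option overloading.

```python
import ipaddress

MAGIC_COOKIE = b"\x63\x82\x53\x63"
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def parse_options(data: bytes) -> dict:
    """Parse DHCP options (code, length, value) up to END (255)."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 0:  # PAD is a single byte with no length field
            i += 1
            continue
        if code == 255 or i + 1 >= len(data):  # END, or truncated option
            break
        length = data[i + 1]
        options[code] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options


def summarize_dhcp(payload: bytes) -> dict:
    """Summarize a DHCP message (the UDP payload of a port 67/68 packet)."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP message")
    opts = parse_options(payload[240:])
    msg = opts.get(53, b"")
    server_id = opts.get(54)
    bootfile = opts.get(67)
    if bootfile is None:
        # Fall back to the fixed 128-byte BOOTP "file" field.
        bootfile = payload[108:236].split(b"\x00", 1)[0]
    return {
        "type": MESSAGE_TYPES.get(msg[0], str(msg[0])) if msg else "?",
        "server_id": str(ipaddress.IPv4Address(server_id)) if server_id else None,
        "yiaddr": str(ipaddress.IPv4Address(payload[16:20])),       # offered address
        "next_server": str(ipaddress.IPv4Address(payload[20:24])),  # siaddr
        "vendor_class": opts.get(60, b"").decode("ascii", "replace"),
        "bootfile": bootfile.rstrip(b"\x00").decode("ascii", "replace"),
    }
```

A proxyDHCP offer, such as the second one dnsmasq sends, typically shows `yiaddr` 0.0.0.0 and vendor class `PXEClient`, because it carries boot information rather than an address. More than one distinct `server_id` handing out addresses would be the sign of a second DHCP server.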
-
@george1421 Spanning tree issue? Please explain.
-
@afriedman said in FOG DHCP problems with possible printer interference?:
@george1421 Spanning tree issue? Please explain.
I thought I did in my first post??
If you are not using a fast spanning tree protocol, the switch port won’t start forwarding data until 27 seconds after the link comes up.
-
@Joe-Schmitt I think Sebastian was working on one too, using Node.js. I’m not sure if it’s the same one you were working on or not.
-
@george1421 Sorry, I didn’t see your original post; I’ll look at that.
@Joe-Schmitt sounds good. I’ll be waiting for your answer.
-
@Joe-Schmitt Thank you very much for this program.
When I run this program, should I then turn on a different machine in the same area and look at the results on the computer where I’m running your program?
-
@afriedman Nope, the program will simulate a computer booting up requesting PXE information and capture who responds and with what.
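For a rough idea of what "simulating a computer booting up" involves at the packet level, here is a minimal Python sketch that builds the DHCPDISCOVER a BIOS PXE client broadcasts. It is not Joe's program, which isn't shown in this thread. The MAC address, transaction ID, requested-option list, and the vendor class string `PXEClient:Arch:00000:UNDI:002001` (the usual form for a legacy BIOS PXE ROM) are illustrative assumptions. The optional `send_discover` helper broadcasts the packet to UDP port 67; it binds client port 68, which normally requires root.

```python
import os
import socket
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_pxe_discover(mac: bytes, xid: int) -> bytes:
    """Build a DHCPDISCOVER resembling a BIOS PXE client's (illustrative)."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,        # op: BOOTREQUEST
        1,        # htype: Ethernet
        6,        # hlen: MAC address length
        0,        # hops
        xid,      # transaction ID
        0,        # secs
        0x8000,   # flags: ask servers for a broadcast reply
        bytes(4), bytes(4), bytes(4), bytes(4),  # ciaddr, yiaddr, siaddr, giaddr
        mac,      # chaddr (struct pads it to 16 bytes)
        b"",      # sname (zero-filled)
        b"",      # file (zero-filled)
    )
    vendor_class = b"PXEClient:Arch:00000:UNDI:002001"  # assumed BIOS PXE value
    options = (
        bytes([53, 1, 1])                                # message type: DISCOVER
        + bytes([55, 7, 1, 3, 6, 43, 60, 66, 67])        # parameter request list
        + bytes([60, len(vendor_class)]) + vendor_class  # vendor class identifier
        + bytes([93, 2, 0, 0])                           # client arch: x86 BIOS
        + bytes([255])                                   # END
    )
    return header + MAGIC_COOKIE + options


def send_discover(packet: bytes) -> None:
    """Broadcast the packet to UDP port 67 (binding port 68 needs privileges)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", 68))
        sock.sendto(packet, ("255.255.255.255", 67))


if __name__ == "__main__":
    # Hypothetical locally administered MAC; a random xid per request.
    xid = int.from_bytes(os.urandom(4), "big")
    pkt = build_pxe_discover(bytes.fromhex("020000000001"), xid)
    print(f"built {len(pkt)}-byte DHCPDISCOVER with xid {xid:#010x}")
```

Any DHCP server or proxyDHCP service on the segment that hears this request should answer; capturing those replies on ports 67 and 68, as described earlier in the thread, shows who responded and with what.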
-
@Joe-Schmitt Ahhhhhh okay. I’ll let you know the results very soon!
-
@Joe-Schmitt Side note: it will need to capture at least two, if not more, offers from DHCP servers. If dnsmasq is running, you will get two offers right away: one from the DHCP server and one from the proxyDHCP server.
-
The bottom image is the first half, and the top image is the second half.
-
@Joe-Schmitt You weren’t expecting that outcome? Lol interesting.
-
@Joe-Schmitt Oh alright. Well it’s a pretty neat program.
@george1421 I’m going to try to talk to Cisco Technical Support either today or tomorrow about having them remote into our switch and turn on one of the fast STP protocols. I’ll let you know the results, unless you’d prefer I do something else before talking to Cisco.
-
@afriedman As I said, place the dumbest switch you can find (that’s still functional) between your Cisco switch and the target computer, then PXE boot the target computer. If you can get to the FOG menu where you couldn’t without the dumb switch, then it’s most likely a spanning tree issue.
I can say that typically they would turn on one of the fast STP protocols by default (for just this reason). There have been documented cases of target computers not getting DHCP addresses because of this.
-
@afriedman We were just talking with Joe through chat. What he’s seeing and what I thought I saw were two different things.
It would be helpful if you could capture a pcap of the PXE booting process.
Please do the following (assuming your FOG server, DHCP server, and PXE booting client are on the same subnet).
- Install tcpdump on your FOG server
- Launch the tcpdump program with this command
tcpdump -w output.pcap port 67 or port 68 or port 69 or port 4011
- PXE boot the target computer until you get the error
- Press ctrl-c to exit out of the tcpdump program
- Upload the pcap file here for review.
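Before uploading, the capture can also be given a first look locally. The Python sketch below reads the classic libpcap format that `tcpdump -w` writes by default (pcapng files are rejected), assumes Ethernet frames without VLAN tags, and lists each DHCP-formatted packet on ports 67, 68, or 4011 with its source, destination, and message type from option 53. The script name `dhcp_summary.py` is hypothetical. Several different sources sending OFFERs, or DISCOVERs with no OFFER at all, are the two patterns discussed earlier in this thread.

```python
import struct
import sys
from typing import Iterable, Iterator, Tuple

DHCP_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
              5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}
DHCP_PORTS = {67, 68, 4011}  # 4011: PXE boot server, also DHCP-formatted


def read_pcap(data: bytes) -> Iterator[bytes]:
    """Yield raw frames from a classic libpcap file (not pcapng)."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file, usec or nsec timestamps
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError(f"link type {linktype} is not Ethernet")
    offset = 24
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_frac, captured length, original length
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def dhcp_type(payload: bytes) -> str:
    """Return the option-53 message type name, or '?' if not found."""
    if len(payload) < 240 or payload[236:240] != b"\x63\x82\x53\x63":
        return "?"
    i = 240
    while i + 1 < len(payload):
        code = payload[i]
        if code == 0:  # PAD
            i += 1
            continue
        if code == 255:  # END
            break
        length = payload[i + 1]
        if code == 53 and length >= 1 and i + 2 < len(payload):
            value = payload[i + 2]
            return DHCP_TYPES.get(value, str(value))
        i += 2 + length
    return "?"


def dhcp_packets(frames: Iterable[bytes]) -> Iterator[Tuple[str, str, str]]:
    """Yield (src_ip, dst_ip, dhcp_type) for untagged IPv4/UDP DHCP frames."""
    for frame in frames:
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # IPv4, no VLAN tag
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[9] != 17 or len(ip) < ihl + 8:  # UDP only
            continue
        sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
        if not {sport, dport} & DHCP_PORTS:
            continue
        src = ".".join(str(b) for b in ip[12:16])
        dst = ".".join(str(b) for b in ip[16:20])
        yield src, dst, dhcp_type(ip[ihl + 8:])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 dhcp_summary.py output.pcap
    with open(sys.argv[1], "rb") as f:
        for src, dst, kind in dhcp_packets(read_pcap(f.read())):
            print(f"{kind:9} {src} -> {dst}")
```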