Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP
-
Hello dear forum members!
After extensive trial and error, googling and searching through the wiki and forum I seem to be stuck with the following problem:
I am not able to boot from a fogproject bootserver via pxe over a network (containing switches, DHCP-Servers and DNS-Servers I do not control) while manually grabbing the undionly.kpxe via tftp is working fine.
The circumstances:
Since I do not have control over the DHCP and DNS servers or the switches of the underlying infrastructure, I went with the proxyDHCP solution for propagating the server address and filename. I followed george1421's tutorial to set up dnsmasq; dnsmasq starts with the system and is running. The fogserver runs in a Debian LXC on top of a Proxmox host and was installed without problems using the options provided in this wiki post. The web interface works as intended. It has its own static IP address in the same Class C subnet. Additionally, AppArmor has been modified as described in jburleson’s post.
I am concerned the switch settings might not allow me to send those requests anywhere other than the DHCP server, so that the fogserver never receives the request for the server and filename. Could this be the case or am I totally wrong there? Before I build my own isolated, separate network to test this suspicion I wanted to hear some opinions. Also, when capturing the network traffic, what should I look out for?
I analysed tcpdump captures when:
- Trying to boot over PXE -> No TFTP packets coming in from the MAC address of the client
- Grabbing undionly.kpxe via a Windows client with disabled firewall -> Successful transfer of the file, and the TFTP negotiation is visible in the tcpdump
Thanks in advance for anyone trying to help me, it’s greatly appreciated! If you need any further info, please ask!
Version Infos:
- Proxmox VE: 5.2-1 (on Linux pve 4.15.17-1-pve #1 SMP PVE 4.15.17-9 x86_64 GNU/Linux)
- Fogserver: Version 1.5.6
- Installed in an LXC with Debian 9 Stretch
- dnsmasq: 2.76-5+deb9u2
-
@DarKFeeliN said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
I am concerned the switch settings might not allow me to send those requests to anywhere else than the DHCP-Server
If the dhcp server, dnsmasq service and pxe booting client are all on the same subnet then it should just work with my configuration. If the dhcp server / dnsmasq and the pxe booting clients are on different subnets then you will need to modify the dhcp-helper service on your subnet router. You need to add the dnsmasq (fog) server as the last dhcp server in the dhcp-helper/dhcp-relay service list.
So let's start with this: is everyone on the same subnet?
-
Hi george1421, thanks for your answer.
The Gateway, the fogserver and the clients are on the same subnet, the main DHCP-Server and DNS-Server are not.
It looks like this:
- Fogserver with proxyDHCP/dnsmasq: x.y.42.27 (static IP)
- Gateway: x.y.42.254
- DHCP-Server: x.y.200.56
- DNS-Server: x.y.200.1
Does that pose a problem? It can also be assumed that we most likely won’t be allowed to make changes to the DHCP server or the gateway switch.
To add to that: the IP addresses are all public IPs, not private ones (they bought a whole Class B network here many years ago). I don’t think this is related to the problem, but I don’t want to leave out any information. Everything network-related works the same way as it does in a private network. Special filtering only happens towards the “outside world” of the Class B network, and the DNS server resides within it.
Edit, 22 minutes after answering: I double-checked and found out that the DHCP server is actually not on the same subnet. I edited the information above.
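For reference, the subnet membership described above can be double-checked with a few lines of Python. This is only a sketch: `10.20` is a placeholder standing in for the redacted `x.y` prefix, the /24 prefix length is an assumption based on the "Class C" remark, and the names `HOSTS` and `group_by_subnet` are made up for the illustration.

```python
import ipaddress

# Placeholder addresses: "10.20" stands in for the redacted "x.y" prefix,
# and the /24 prefix length is an assumption, not a confirmed setting.
PREFIX_LEN = 24
HOSTS = {
    "fogserver": "10.20.42.27",
    "gateway": "10.20.42.254",
    "dhcp-server": "10.20.200.56",
    "dns-server": "10.20.200.1",
}


def group_by_subnet(hosts, prefix_len):
    """Map each network (as a string) to the sorted host names inside it."""
    groups = {}
    for name, addr in hosts.items():
        # strict=False accepts a host address and masks it down to its network
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups.setdefault(str(net), []).append(name)
    return {net: sorted(names) for net, names in groups.items()}


for net, names in group_by_subnet(HOSTS, PREFIX_LEN).items():
    print(net, "->", ", ".join(names))
```

With these placeholder values the fogserver and gateway end up in one network and the DHCP and DNS servers in another, so the main DHCP server can only be reached from the client subnet through a relay on the gateway.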
-
@DarKFeeliN OK, if the fog server, dnsmasq server, and pxe booting clients are on the same subnet then dnsmasq will (should) work without modifying your networking infrastructure, because dhcp works through broadcast messages, which every host on the subnet receives. If the dnsmasq server were instead on the same subnet as your dhcp server (i.e. a different subnet from the clients), then you would have to update the networking infrastructure.
Now to understand why it's not working, I want you to follow this tutorial. Since you have public IP addresses, upload the pcap to a file sharing site (like google drive) and then IM me the link and I’ll take a look at it. The pcap will only contain pxe boot information if you follow my capture filter. https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
If you want to look at the pcap in windows you can use wireshark. You should see the client send a dhcp discover packet and then both the main dhcp server as well as dnsmasq sending a dhcp offer packet. As long as both offers are received by the target computer then it should work. There are a few other things to look out for so I need to see the pcap file.
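To make "discover" and "offer" a bit more concrete: the message type is carried in DHCP option 53, and PXE clients (as well as proxyDHCP answers) normally carry a vendor class in option 60 that starts with `PXEClient`. The sketch below decodes just those two options from the raw UDP payload of a DHCP packet. It is an illustration, not part of the tutorial, and the function names `parse_dhcp_options` and `describe` are made up here.

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def parse_dhcp_options(payload: bytes) -> dict:
    """Return {option_code: raw_value} for a DHCP message (UDP payload only)."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    options = {}
    i = 240  # 236-byte BOOTP header + 4-byte magic cookie
    while i < len(payload):
        code = payload[i]
        if code == 255:      # end option
            break
        if code == 0:        # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break            # truncated option, stop parsing
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options


def describe(payload: bytes) -> tuple:
    """Return (message type name, True if the vendor class starts with PXEClient)."""
    opts = parse_dhcp_options(payload)
    type_byte = opts.get(53, b"")
    msg_type = MESSAGE_TYPES.get(type_byte[0], "UNKNOWN") if type_byte else "BOOTP"
    is_pxe = opts.get(60, b"").startswith(b"PXEClient")
    return msg_type, is_pxe
```

Wireshark shows the same fields under "Option: (53) DHCP Message Type" and "Option: (60) Vendor class identifier", so the sketch is mainly useful for scripting a check over many packets.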
-
@george1421 Thanks, this gives me a little hope.
I followed your suggestion, but unfortunately I haven’t been able to record a single packet on port 67, 68, 69 or 4011 while a client was trying to boot from PXE, neither on the fogserver nor on the Proxmox host. I then used the same capture filter on my own client PC with Wireshark, and there is absolute silence on those ports.
I suspect the gateway switch might suppress those broadcasts. I will build a small local network with a switch I control, redo the tcpdumps and report if the results are different.
If you suspect a different problem, I am open to every suggestion. Thanks for your ongoing help.
-
@DarKFeeliN said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
I haven’t been able to record a single packet from port 67, 68, 69 or 4011 for the duration of the time I had a client try to boot from PXE for both the fogserver and the proxmox host.
I suspect the gateway switch might suppress those broadcasts. I will build a small local network with a switch I control, redo the tcpdumps and report if the results are different.
This might sound bad, but it is a good bit of information. For dhcp to work, broadcasts MUST be supported on your network. Even if you are not picking up the other parts of the pxe booting process, you MUST see the discover, offer, request, and ack/nak messages (and dhcp inform messages too), or there is no chance of any target computer getting a dhcp address.
So your fog server is running as a vm under Proxmox (I don’t know this hypervisor so I can only speak in general terms).
- If you create a VM (or have one) on this same vm host server does it pick up a dhcp address in the x.y.42.X range?
- Do you have the fog server bridged to the business network (not natted)?
- Do you need to set the vm host server interface into promiscuous mode?
- If you take a second computer as a witness computer with wireshark installed connected to the same network switch as the FOG server’s vm host do you see any dhcp traffic?
Once we see dhcp traffic then we can focus on if dnsmasq is working like it should.
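If no spare PXE client is at hand, a broadcast DISCOVER can also be generated by hand from a test machine, just to see whether it shows up on the witness computer. The following is a rough sketch and not something from my tutorial: `build_pxe_discover` and `send_discover` are made-up names, the vendor class string is simply a typical BIOS PXE value, and the MAC address you pass in is up to you.

```python
import random
import socket
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"


def build_pxe_discover(mac: str, xid: int) -> bytes:
    """Build a broadcast DHCPDISCOVER that identifies itself as a BIOS PXE client."""
    chaddr = bytes.fromhex(mac.replace(":", ""))
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                     # op: BOOTREQUEST
        1,                     # htype: Ethernet
        6,                     # hlen: length of a MAC address
        0,                     # hops
        xid,                   # transaction id
        0,                     # secs
        0x8000,                # flags: ask servers to answer via broadcast
        b"", b"", b"", b"",    # ciaddr, yiaddr, siaddr, giaddr (all zero)
        chaddr,                # client hardware address, zero-padded to 16 bytes
        b"",                   # sname (unused)
        b"",                   # file (unused)
    )
    vendor_class = b"PXEClient:Arch:00000:UNDI:002001"
    options = (
        bytes([53, 1, 1])                                 # message type: DISCOVER
        + bytes([60, len(vendor_class)]) + vendor_class   # vendor class identifier
        + bytes([93, 2, 0, 0])                            # client architecture: x86 BIOS
        + bytes([55, 4, 1, 3, 66, 67])                    # ask for mask, router, TFTP server, boot file
        + bytes([255])                                    # end of options
    )
    # Pad to the 300-byte BOOTP minimum that some servers and relays expect.
    return (header + MAGIC_COOKIE + options).ljust(300, b"\x00")


def send_discover(mac: str) -> None:
    """Broadcast one DISCOVER from UDP port 68 (needs root on most systems)."""
    packet = build_pxe_discover(mac, random.getrandbits(32))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.bind(("", 68))
        sock.sendto(packet, ("255.255.255.255", 67))
```

Calling `send_discover("02:00:00:00:00:01")` (a made-up, locally administered MAC) while the witness computer is capturing should make at least the DISCOVER visible there if broadcasts pass through the switch. Whether anything answers is a separate question.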
-
@george1421 said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
- If you create a VM (or have one) on this same vm host server does it pick up a dhcp address in the x.y.42.X range?
It has a static IP in the x.y.42.X range, but yes, it could also be dynamically assigned by the DHCP for the x.y.42.X range.
- Do you have the fog server bridged to the business network (not natted)?
It is bridged and has its own (public) address. I never had an issue with that on a bridged network, the webserver works flawlessly.
- Do you need to set the vm host server interface into promiscuous mode?
I just checked: Yes, the main NIC goes into promiscuous mode according to /var/log/kern.log
- If you take a second computer as a witness computer with wireshark installed connected to the same network switch as the FOG server’s vm host do you see any dhcp traffic?
As stated in my last response, I sadly already did that and did not see a single packet. I will also try a few different clients with different PXE versions, but I think it is safe to assume that no broadcasting is allowed for those ports.
Just to make extra sure, I will test this exact VM in an isolated network with a switch I control, where I know how the firewall and routes are set up, and see whether the tcpdump then contains packets on those ports. If it does, I will get in touch with the networking team (IT here is hierarchical to some degree; the network infrastructure is not in our own hands) and ask whether they can make an exception for the broadcasts on those ports to the specific IP of my fogserver. But I have to state my request very specifically, so I first want to make sure that I do not ask for anything I do not need and that there is no error on my side of the configuration.
Once we see dhcp traffic then we can focus on if dnsmasq is working like it should.
As soon as this is the case (or I make a new discovery) I will get back to you! Thank you for all your answers so far, they really lead me in the right direction.
-
@DarKFeeliN said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
It has a static IP in the x.y.42.X range, but yes, it could also be dynamically assigned by the DHCP for the x.y.42.X range.
Can you confirm that another VM on this same vm host as the FOG server does receive a dhcp address?
You could even just create a test vm that does a pxe boot in bios mode. That would tell us if it gets a dhcp address. Do this while running tcpdump on the FOG server. You “should” see the dhcp communication (hopefully).
-
@george1421 I just spun up a Windows VM and changed it to get its IP address from DHCP, and it worked. I should add that a MAC address has to be whitelisted first if it is supposed to get a DHCP address, but every MAC address that is involved with the fogserver is already whitelisted.
I will send the tcpdump to you (you can see 5 Discover packets, a Request packet and an Inform packet, like you would expect).
-
@DarKFeeliN said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
I should add that the MAC-Address has to be whitelisted first
Are you running some kind of NAC/NAP in your environment? Or do you have a high security network (i.e. DoD)?
-
@george1421 said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
Are you running some kind of NAC/NAP in your environment? Or do you have a high security network (i.e. DoD)?
I would not call it a high security network as the network is partly public (university) but the whitelisting is in place so no student can just plug in their devices. There also is filtering to the outside and the inside of the network but this is clearly communicated and does not (or maybe I should say “should not”) interfere within the “local” network.
-
@DarKFeeliN Well I think attempting to capture the dhcp process from a witness computer (i.e. laptop plugged into same switch as your vm host server) will tell us where to look next. That witness computer running wireshark should pick up both sides of the conversation. Remember the complete dhcp process is Discover->Offer->Request->ACK and then some time later Inform. If we see the Discover through ACK then it was successful.
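As a small illustration of what "Discover through ACK" means when reading a capture: the sketch below takes the message types seen for one client, in capture order, and reports how far the handshake got. The names `EXPECTED` and `handshake_progress` are made up for this example, and Inform is deliberately ignored because it arrives later and is not part of the handshake itself.

```python
EXPECTED = ["DISCOVER", "OFFER", "REQUEST", "ACK"]


def handshake_progress(observed):
    """Return (steps completed in order, first missing step or None if complete)."""
    step = 0
    for msg in observed:
        # Only advance when the next expected step shows up; repeats such as
        # retried Discovers or a second Offer from a proxyDHCP are skipped.
        if step < len(EXPECTED) and msg == EXPECTED[step]:
            step += 1
    missing = EXPECTED[step] if step < len(EXPECTED) else None
    return EXPECTED[:step], missing


# Example: a capture that only shows retried Discovers, a Request and an Inform
done, missing = handshake_progress(["DISCOVER"] * 5 + ["REQUEST", "INFORM"])
print("completed:", done, "missing:", missing)
```

If the first missing step is the Offer, the witness computer is either not seeing the server side of the conversation or the server is not answering, which is exactly the distinction the capture is meant to settle.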
-
@george1421 said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
@DarKFeeliN Well I think attempting to capture the dhcp process from a witness computer (i.e. laptop plugged into same switch as your vm host server) will tell us where to look next. That witness computer running wireshark should pick up both sides of the conversation. Remember the complete dhcp process is Discover->Offer->Request->ACK and then some time later Inform. If we see the Discover through ACK then it was successful.
I already did that twice but just to make sure I ran it again for over an hour to witness every packet I can get regarding port 67 or port 68 or port 69 or port 4011.
I have only been able to see DHCP requests from the client that was running Wireshark and ACK packets from two other PCs (in reality there must have been far more traffic, since around 200 PCs receive IP addresses here, including some that I deliberately had request an IP address over DHCP and that were not picked up at all).
-
@DarKFeeliN Did you set Wireshark to capture in promiscuous mode?
-
@Sebastian-Roth said in Not able to PXE boot from FOGServer on Proxmox LXC with proxyDHCP:
@DarKFeeliN Did you set Wireshark to capture in promiscuous mode?
I just double-checked: yes, I’m capturing in promiscuous mode (it is the default in the Windows version).
-
@george1421
New revelations have been made. As I said, I have now replicated the whole setup on a local isolated network (with the exception of the AppArmor modifications, which are not necessary on a bare-metal installation). I have my witness computer connected to both networks with 2 separate NICs; only one is active at a time. In both setups I first started Wireshark on the witness computer and then booted a notebook into PXE.
- Main network: No packets or barely some DHCP-ACKs (that were not from the booting laptop).
- Isolated local network: Discover -> Offer -> Request -> ACK. Laptop got an IP from the router and loaded the appropriate bootrom from the fogserver and booted from it.
Conclusion: Since whether the DHCP packets can be witnessed has nothing to do with fog itself, it is safe to say that there is some sort of broadcast filtering on the DHCP-relevant ports. The (almost) exact same installation worked in an isolated network but not on the main network.
Thank you so much, I finally know the exact cause of the problem and am able to proceed. I now have to write up a request asking whether the networking team would be so kind as to allow the broadcasts on those ports to reach a single static IP that I own (I really hope they’ll allow that). Now that I think of it, the filtering makes total sense: if people were able to plug their own devices into the network, and those devices behaved like a DHCP server and handed out IPs before the main DHCP server does, they would be in a golden MITM position and could intercept the network packets as they please. But as far as fog goes, the problem is not on its side.
So I think I can say this is solved as far as “troubleshooting” goes, even if my journey is not exactly at its end yet. Thanks again!