FOG 1.5.2 TFTP OpenTimeout
-
@jvenus The pcap is broken. On your side you can open it with wireshark and see right away that it's borked.
A good pcap will have a minimum of 4 DHCP packets and then 2 tftp packets.
-
@jvenus I find the syslog interesting because of 2 errors.
- No route to host. That would imply a network routing issue. Are the pxe booting computer and the FOG server on the same subnet?
- Bind failed, port already in use. This means that there is already a tftp server running, and the one started by xinetd is failing to bind to port 69 because something is already using that port. Did you manually start the tftp server on debian during testing?
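
One quick way to confirm the second error is to try binding UDP port 69 yourself. This is only a minimal sketch: the function name `udp_port_in_use` is made up for illustration, and it only tells you that something holds the port, not which process it is.

```python
import errno
import socket


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a UDP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True  # another process (e.g. a second tftp daemon) owns the port
            raise  # e.g. EACCES when binding a port below 1024 without root
        return False
```

On the FOG server you would call `udp_port_in_use(69)` as root with the xinetd tftp service stopped. If it still returns True, some other tftp server is holding the port.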
-
-
Yes. Each machine is on a 192.168.190.0/24. Fogserver is 190.100 (static); CentOS is 190.182 (dhcp); Win10 host is 190.42 (dhcp)
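
For anyone who wants to double check the same-subnet question, here is a small sketch using Python's standard `ipaddress` module. It assumes the shorthand addresses above expand to 192.168.190.x; the host labels are just for illustration.

```python
import ipaddress

# Subnet and addresses as described in this post (shorthand expanded, assumed).
subnet = ipaddress.ip_network("192.168.190.0/24")
hosts = {
    "fogserver": "192.168.190.100",
    "centos": "192.168.190.182",
    "win10": "192.168.190.42",
}

# True for every host whose address falls inside the subnet.
same_subnet = {name: ipaddress.ip_address(ip) in subnet for name, ip in hosts.items()}
print(same_subnet)
```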
-
No manual starting other than what the documentation in the first link prescribed (https://wiki.fogproject.org/wiki/index.php/Tftp_timeout) - which is where I think that error is coming from due to its timestamp but could definitely be wrong.
-
I’ve run the tcpdump multiple times, all saying the same thing as far as being cut off in the middle of a packet. Virtualbox promiscuous mode is set to ‘allow all’ on each vm’s adapter. Here’s the most recent output, but again, still broken - output.pcap
-
-
@jvenus Is your fog server also running under VirtualBox? It's rare that tcpdump would mess up the pcap like this.
You can use wireshark on a witness computer, with a little loss of clarity on what is going on. With wireshark you want to use the capture filter of
port 67 or port 68
We won’t see any tftp transfers, but at least we can see what the dhcp server is telling the target computer.
-
Yes. Fogserver is Debian 10 VM with the same Virtualbox that runs CentOS.
-
I used Wireshark on the Win10 host and this is all that was captured - win10_pcap.pcap
-
-
@jvenus OK I see the problem and have a solution for you.
The issue is that PXE booting involves 2 locations where the pxe boot info needs to exist. Many routers acting as dhcp servers don’t get this right. PXE booting sometimes involves 2 protocols, bootp and dhcp. In the pcap you provided the bootp fields are not filled out, but dhcp options 66 and 67 are. The bootp fields are in the bootp header of the dhcp packet (not in the dhcp options) as {next-server} and {boot-file}. It's kind of a toss up which fields a PXE rom will look at, and most dhcp servers just automatically fill out both. So look at your dhcp server settings, see if there is something that mentions bootp, and turn it on. If there isn’t an option, then let's install dnsmasq on your FOG server to supplement the missing pxe booting information.
I have a tutorial on installing dnsmasq here: https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server
TBH I have not tried to install dnsmasq under debian but the flow and configuration should be the same.
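
To make the "2 locations" point concrete, here is a rough Python sketch that pulls both sets of values out of a raw BOOTP/DHCP payload (the UDP data). In the standard BOOTP layout the next-server value is the `siaddr` field and the boot file is the `file` field, while options 66 and 67 live in the options area after the magic cookie. The function name `pxe_fields` is made up for illustration, and the sketch ignores option 52 (overload), which lets a server reuse the `file` and `sname` areas for extra options.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options area


def pxe_fields(payload: bytes) -> dict:
    """Extract PXE boot info from both the BOOTP header and DHCP options 66/67."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    info = {
        # Fixed BOOTP header fields: siaddr at bytes 20-23, file at bytes 108-235.
        "siaddr": socket.inet_ntoa(payload[20:24]),
        "file": payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "option66": None,
        "option67": None,
    }
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option, a single byte with no length
            i += 1
            continue
        if i + 1 >= len(payload):  # truncated option, stop parsing
            break
        length = payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if code in (66, 67):
            info[f"option{code}"] = value.rstrip(b"\x00").decode("ascii", "replace")
        i += 2 + length
    return info
```

For an offer like the one in the pcap, you would expect `siaddr` to come back as 0.0.0.0 and `file` as an empty string while `option66` and `option67` are populated. A dhcp server that fills out both places returns values in all four.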
-
The ‘boot-file’ DHCP option in the Sophos UTM 9 is now set. The ‘next-server’ is not wanting to play nice. I’ll give that a kick and try to get that set and then run another capture. I’ll install the dnsmasq if that doesn’t work. It may be a few minutes before I know.
Thanks
-
Ok. Moderate success with the ‘next-server’. I set it to 190.100.
-
PXE boot doesn’t time out, so yay! But now it gives this. (I hit “s” to enter the PXE shell.) I set the ‘boot-file’ option to the same file name as option 67 (bootfile-name), which is ‘undionly.kpxe’. Those should be the same, no?
-
Wireshark capture - no_confi_method.pcap
-
-
@jvenus OK give me a minute to digest the pcap. But at this point, where we typically see this fail (not in your case for vb) is that spanning tree is enabled and the port is not forwarding data yet.
So the dhcp process is working, because ipxe makes it to the target computer and then starts up. It's seeing the network adapter, because we see the mac address, but it's not receiving the dhcp packets in ipxe. At the command prompt, key in
dhcp net0
and it should query again.
Running
dhcp net0
gives the same result as in the screengrab. I’ll wait for more info when you get the time. Thanks so much.
@jvenus I’m still looking into this, but looking at the pcap I can see what it's doing.
The PXE part is 100% good and the tftp is OK. iPXE starts up just fine, then issues a dhcp. Now if you look at the pcap you see this cycle of discover, offer, discover, offer and so on. What is going on here is that the discover asks for certain fields from the dhcp server, and the fields returned from dhcp are not sufficient, so iPXE queries again and the cycle repeats. What’s missing I don’t know yet. I’m getting a side by side setup.
-
@jvenus I’m not finding a smoking gun here, but something is missing from the dhcp request that iPXE feels it needs.
-
@george1421 Whatever information I can give, just let me know.
-
@jvenus Well, I don’t know the answer. The ONLY thing I can see different between one that works and yours is that you are sending dhcp option 28 (broadcast address) to the target computer, where normally it's not sent.
-
@george1421 I’m going to try to get it working on Ubuntu 20.04 LTS this weekend. I’ll keep an eye on this post as well as let you know if there is any difference in this attempt.
-
@jvenus A different host OS isn’t the issue, it's the dhcp server that seems to be the root of the issue. I just can’t find the difference.
The discrepancy is between ipxe and your dhcp server. BUT what I might do is try a physical machine instead of VB. We have seen some strangeness in the way VB does things, especially in regards to pxe booting. BUT I’m still conflicted: at this point iPXE should be in control of the network, and VB should not be blocking things.
-
@george1421 I have an old machine I can throw linux on. I’ll do that this weekend instead of the Ubuntu VM.
-
@george1421 Interesting find…
I decided to attempt a pxe boot from that spare physical machine to the fogserver VM before installing / configuring FOG server on the physical machine and continuing testing to it instead of the current VM. Attached is the Wireshark output from the Win10 host while booting from a physical machine to the Debian ‘fogserver’ VM. Again, all three are in the same subnet. It seems like it passes all of the information just fine. The physical machine will boot into the FOG menu.
-
@jvenus See, this is what I thought: sometimes VB can be flaky when PXE booting. This is why I asked to test with a physical machine. From the pcap it appears the pxe booting process is normal. You see one set of dhcp requests for the PXE rom and then a second set for iPXE. If you were to boot into FOS Linux (by picking one of the registration functions) you would have seen a third dhcp sequence.
I remembered there was something we had to do to get around VB’s strangeness. A quick google-fu search of the forums suggests that you use ipxe.kpxe instead of undionly.kpxe with VirtualBox VMs. undionly.kpxe is a shim driver that uses the network card’s built-in undi driver, so it's very small and fast. If the built-in undi driver in the network card is not behaving well, FOG has the ipxe boot loaders that have all of the common network drivers built in: ipxe.kpxe for bios mode and the (recommended for general use) ipxe.efi for uefi systems. For bios mode you can typically use undionly.kpxe, but if that driver doesn’t work then the larger ipxe.kpxe boot loader will work.
https://forums.fogproject.org/topic/10160/virtualbox-pxe-boot-no-configuration-methods-succeeded
-
@george1421 I can’t thank you enough for all of your help. I’ll switch over to the ipxe.kpxe bootfile instead for general use from now on. Should I have any problems, I’ll definitely let you know. But for now, many many thanks.