FOG 1.5.2 TFTP OpenTimeout
-
Hey all,
I’ve read through many of the forum posts and wiki and am still stumped. I’ll do my best to give as much background as possible, but I’m sure more will be requested and I’ll supply that.
Scenario -
- Windows10 Enterprise host (192.168.190.40) running VirtualBox 6.1.
- Debian 10 Buster VM (“fogserver”, 192.168.190.100; minimal install from CD ISO with updates from debian.org during install: ssh server, system utilities, etc.) with FOG 1.5.8 default install
- CentOS 7 VM (192.168.190.182) set up just as a test for pxe boot
- sophos utm 9 firewall/router w/ DHCP options 66/67 set to 192.168.190.100 / undionly.kpxe respectively
- both win10 host and centos7 VM can ssh into fogserver
- neither win10 host nor centos7 VM can tftp to fogserver; win10 times out (connect request failed); centos7vm says it connects, but downloads a 0 length file
- I’ve gone through https://wiki.fogproject.org/wiki/index.php/Tftp_timeout… and everything matches from what I can see (config files, iptables, etc… no selinux installed)
- I’ve run tcpdump per https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue and will attach output.pcap. This was taken from attempting to pxe boot the centOS7 vm (192.168.190.182) and hit ctrl+C after 20 seconds once “pxe-e32: tftp open timeout” error appeared on centos7vm. The output definitely doesn’t seem right, so I did it a second time and got the exact same thing.
Any screenshot or config or dump file you want to see, please let me know. Each VM is using bridged adapter so it has access directly to the Sophos. Any help is appreciated. Thanks in advance.
-
Try capturing the pcap again because it is broken/damaged.
Second, for the windows tftp test, make sure you disable the windows firewall during testing, since tftp, much like ftp, creates 2 communication channels.
Make sure that if there is a debian firewall enabled, it’s turned off on your fog server, and that selinux is set to permissive or disabled.
Test to see if the fog server can tftp to itself using the defined network interface.
I do have it in my queue to spin up an ubuntu 20.04 fog server tonight because another FOG admin is having a similar issue to yours, but on ubuntu 20.04. I’m not saying it is the same issue, only that it looks similar.
See if there is anything in the /var/log directory that will help debug.
grep -r tftp /var/log
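If no tftp client is handy for the tests above, a read request can also be sent by hand. The following is only a minimal sketch in Python, not part of FOG: the function names are made up for illustration, and it assumes the server listens on the standard UDP port 69. It also prints which port the reply came from, since a tftp server answers from a new ephemeral port rather than from 69, and that second channel is what firewalls tend to block.

```python
import socket
import struct


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def describe_reply(packet):
    """Summarise the first packet a TFTP server sends back."""
    opcode = struct.unpack("!H", packet[:2])[0]
    if opcode == 3:  # DATA
        block = struct.unpack("!H", packet[2:4])[0]
        return "DATA block %d, %d bytes" % (block, len(packet) - 4)
    if opcode == 5:  # ERROR
        code = struct.unpack("!H", packet[2:4])[0]
        message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return "ERROR %d: %s" % (code, message)
    return "unexpected opcode %d" % opcode


def probe_tftp(server, filename, timeout=5.0):
    """Send one read request and report what, if anything, comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        try:
            packet, (addr, port) = sock.recvfrom(2048)
        except socket.timeout:
            return "timeout: no reply from %s" % server
        # The reply arrives from an ephemeral port, not from port 69.
        return "reply from %s:%d: %s" % (addr, port, describe_reply(packet))
```

For example, `print(probe_tftp("192.168.190.100", "undionly.kpxe"))` run from the CentOS VM or from the fog server itself. A timeout from one machine but not the other would point at something between them rather than at the tftp service.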
-
Thanks for the quick reply
- re-ran tcpdump (tcpdump -w output2.pcap port 67 or port 68 or port 69 or port 4011) – same issue, will attach again for posterity (output2.pcap )
- TFTP tcp/udp inbound rules already enabled, but disabled firewall for testing – same result (“connect request failed”);
- fogserver can tftp to itself
- syslog as requested -syslog_tftp.txt
I’m going to try on ubuntu 20.04 tonight as well. I’m not pinned down to Debian for this. Again, thanks.
-
@jvenus The pcap is broken. On your side you can open it with wireshark and see right away it’s borked.
A good pcap will have a minimum of 4 DHCP packets and then 2 tftp packets.
-
@jvenus I find the syslog interesting because of 2 errors.
- No route to host. that would imply a network routing issue. Is the pxe booting computer and FOG server on the same subnet?
- Bind failed, port already in use. This means that there is already a tftp server running, and the one being started by xinetd is failing to bind to port 69 because something is already using that port. Did you manually start the tftp server on debian during testing?
-
-
Yes. Each machine is on a 192.168.190.0/24. Fogserver is 190.100 (static); CentOS is 190.182 (dhcp); Win10 host is 190.42 (dhcp)
-
No manual starting other than what the documentation in the first link prescribed (https://wiki.fogproject.org/wiki/index.php/Tftp_timeout) - which is where I think that error is coming from due to its timestamp but could definitely be wrong.
-
I’ve run the tcpdump multiple times, all saying the same thing as far as being cut off in the middle of a packet. Virtualbox promiscuous mode is set to ‘allow all’ on each vm’s adapter. Here’s the most recent output, but again, still broken - output.pcap
-
-
@jvenus Is your fog server also running under virtual box? It’s rare that tcpdump would mess up the pcap like this.
You can use wireshark on a witness computer with a little loss of clarity on what is going on. With wireshark you want to use the capture filter of
port 67 or port 68
We won’t see any tftp transfers, but at least we can see what the dhcp server is telling the target computer.
-
Yes. Fogserver is Debian 10 VM with the same Virtualbox that runs CentOS.
-
I used Wireshark on the Win10 host and this is all that was captured - win10_pcap.pcap
-
-
@jvenus OK I see the problem and have a solution for you.
The issue is that PXE booting involves 2 locations where the pxe boot info needs to exist, and many routers acting as dhcp servers don’t get this right. PXE booting can involve 2 protocols, bootp and dhcp. In the pcap you provided, the bootp fields are not filled out but dhcp options 66 and 67 are. The bootp fields are in the bootp header (not in the dhcp options) as {next-server} and {boot-file}. It’s kind of a toss-up which fields a PXE rom will look at, and most dhcp servers just automatically fill out both. So look at your dhcp server settings, see if there is something that mentions bootp, and turn it on. If there isn’t such an option, then let’s install dnsmasq on your FOG server to supplement the missing pxe booting information.
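To make the two locations concrete, here is a rough sketch in Python that pulls both sets of values out of a raw DHCP payload (the UDP data of an offer, for instance exported from wireshark). The function name and the returned keys are invented for this example. The offsets follow the standard bootp layout: next-server (siaddr) at bytes 20-23 and the boot file name at bytes 108-235, with the dhcp options starting after the magic cookie at byte 240.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the dhcp options


def parse_pxe_fields(payload):
    """Return the bootp header fields and dhcp options 66/67 from a payload."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")

    # Location 1: fixed bootp header fields.
    next_server = socket.inet_ntoa(payload[20:24])
    boot_file = payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace")

    # Location 2: dhcp options, encoded as code / length / value.
    options = {}
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option, a single byte with no length
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length

    def text(code):
        return options.get(code, b"").rstrip(b"\x00").decode("ascii", "replace")

    return {
        "next_server": next_server,  # bootp header, siaddr
        "boot_file": boot_file,      # bootp header, file
        "option_66": text(66),       # dhcp option, tftp server name
        "option_67": text(67),       # dhcp option, bootfile name
    }
```

An offer like the one described above would come back with `next_server` as `0.0.0.0` and an empty `boot_file`, while `option_66` and `option_67` are populated.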
I have a tutorial on installing dnsmasq here: https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server
TBH I have not tried to install dnsmasq under debian but the flow and configuration should be the same.
-
The ‘boot-file’ DHCP option in the Sophos UTM 9 is now set. The ‘next-server’ is not wanting to play nice. I’ll give that a kick and try to get that set and then run another capture. I’ll install the dnsmasq if that doesn’t work. It may be a few minutes before I know.
Thanks
-
Ok. Moderate success with the ‘next-server’. I set it to 192.100.
-
PXE boot doesn’t time out, so yay! But, now it gives this. (I hit “s” to enter PXE shell.) I put the ‘boot-file’ option the same file name as the option 67 (bootfile-name) which is ‘undionly.kpxe’. Those should be the same, no?
-
Wireshark capture - no_confi_method.pcap
-
-
@jvenus OK give me a minute to digest the pcap. But at this point where we typically see this fail (not in your case for vb) is that spanning tree is enabled and the port is not forwarding data yet.
So the dhcp process is working, because iPXE makes it to the target computer and then starts up. It’s seeing the network adapter, because we see the mac address, but iPXE is not receiving the dhcp packets. At the command prompt, key in
dhcp net0
and it should query again.
-
Running
dhcp net0
gives the same result as in the screengrab. I’ll wait for more info when you get the time. Thanks so much.
-
@jvenus I’m still looking into this, but looking at the pcap I can see what it’s doing.
The PXE part is 100% good and the tftp is OK. iPXE starts up just fine, then issues a dhcp request. Now if you look at the pcap, you see this cycle of discover, offer, discover, offer and so on. What is going on here is that the discover asks for certain fields from the dhcp server, the fields returned in the offer are not sufficient, so iPXE queries again and the cycle repeats. What’s missing I don’t know yet. I’m getting a side-by-side setup.
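The cycle can also be spotted without reading the capture by eye. This is just an illustrative Python sketch with made-up function names: it reads dhcp option 53 (message type) from each raw dhcp payload and flags a run that has repeated discovers and offers but never a request. It should be fed only the packets from the iPXE phase, because the earlier PXE rom exchange completes normally.

```python
DHCP_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
              5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def message_type(payload):
    """Return the name of the dhcp message type (option 53) in a payload."""
    i = 240  # options start after the 236-byte bootp header and magic cookie
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        if code == 53 and length == 1 and i + 2 < len(payload):
            return DHCP_TYPES.get(payload[i + 2], "UNKNOWN")
        i += 2 + length
    return "UNKNOWN"


def stuck_in_discover_offer(types):
    """True when offers arrive but the client keeps discovering and never requests."""
    return ("OFFER" in types and "REQUEST" not in types
            and types.count("DISCOVER") > 1)
```

A healthy exchange reads DISCOVER, OFFER, REQUEST, ACK; a client that rejects every offer produces DISCOVER, OFFER repeated with no REQUEST.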
-
@jvenus I’m not finding a smoking gun here, but something is missing from the dhcp request that iPXE feels it needs.
-
@george1421 Whatever information I can give, just let me know.
-
@jvenus Well I don’t know the answer. The ONLY thing I can see different between one that works and yours is that you are sending dhcp option 28 (broadcast address) to the target computer, where normally it’s not sent.
-
@george1421 I’m going to try to get it working on Ubuntu 20.04 LTS this weekend. I’ll keep an eye on this post as well as let you know if there is any difference in this attempt.
-
@jvenus A different host OS isn’t the issue; it’s the dhcp server that seems to be the root of the issue. I just can’t find the difference.
The discrepancy is between ipxe and your dhcp server. BUT what I might do is try a physical machine instead of VB. We have seen some strangeness in the way VB does things, especially in regards to pxe booting. BUT I’m still conflicted: at this point iPXE should be in control of the network and VB should not be blocking things.
-
@george1421 I have an old machine I can throw linux on. I’ll do that this weekend instead of the Ubuntu VM.