Some computer models adding garbage bytes to undionly.kpxe tftp filename, causing failure to PXE boot.
-
Hi all,
I’ve recently got a FOG server up and running and am having problems with some models of computer failing to PXE boot. (Specifically, Dell Vostro 220, which the wiki reports as working). I’ve had other models of computer go to the iPXE menu no problem.
On the affected PC the error is
PXE-T01: File not found
PXE-E3B: TFTP Error - File Not Found
PXE-M0F: Exiting PXE ROM
Doing some digging, I ran a pcap on the FOG server’s host, and discovered that for the affected computer model, the source file name in the TFTP read request has a bunch of garbage bytes appended to it, which are not there for the computers that behave normally. I have attached an excerpt of the packet capture, showing first the TFTP traffic for a “good” computer, trimmed after the first data block, and after that the traffic for the “bad” computer.
Our environment is DHCP provided by a Unifi Security Gateway, FOG running in an Ubuntu 18.04 LXC container on Proxmox.
I updated the affected PC to the latest available BIOS and that didn’t help.
How can I resolve this problem? The idea that comes to my mind is to make a copy of undionly.kpxe named to match what the computers are requesting, but that seems like a kludge and I’m wondering if there’s any better way.
(PS: It seems like every packet is quadruplicated in the capture. I think that’s just a quirk of running tcpdump on Proxmox, I probably used the wrong tcpdump options.)
-
@EBCF said in Some computer models adding garbage bytes to undionly.kpxe tftp filename, causing failure to PXE boot.:
Our environment is DHCP provided by a Unifi Security Gateway
Without looking at the pcap just yet, my bet is that the dhcp server is at fault. We have seen some non-mainstream dhcp servers that do not end the file name string with an ASCII null character, but instead rely on the byte count to signal the text length. Let me look at the pcap to see how close I am.
-
Well, I got shut down before I could get started. Your pcap only contains the tftp packets. I/we need to see the dhcp packets. If your fog server, dhcp server, and pxe booting computer are on the same subnet (or in reality if your fog server and pxe booting client are on the same subnet) please follow this tutorial and run it on the fog server. This will capture the dhcp/bootp as well as the tftp transfer: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
-
@EBCF From what I see in the PCAP the NIC firmware is at fault. It seems to send those characters that are not allowed in the specification. Have you tried to update the firmware on those machines yet?
-
Since the pcap I ran earlier captured everything, I’ve filtered it to cover ports 67, 68, 69, 4011, and uploaded accordingly. (I can send the entire capture privately if you need it; since I don’t know what information might be in it I don’t wish to post it publicly). It looks like you’re right that option 67 in the DHCPOffer and DHCPAck isn’t null-terminated though.
The Unifi Security Gateway runs EdgeOS which is a fork of Vyatta, and can run either ISC DHCPD or dnsmasq as the DHCP server. (There’s an option to select which but I can’t find it at the moment). If any of that helps.
For the PC, I updated to the latest BIOS. On Dell’s support site there’s no specific NIC firmware for this model that I can find. I guess I can see if Windows update turns up anything.
-
@EBCF I’m pretty sure it’s related to not having a null terminated string.
Some pxe boot firmware will take the byte count (in this case 0x0d) and others need a null terminated string and just ignore the byte count. It’s kind of a toss up.
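To illustrate the difference between the two behaviours, here is a minimal Python sketch. The option bytes are hypothetical: option 67 with length 0x0d and the name from this thread, followed by the 0xFF end option and a few made-up trailing bytes standing in for whatever happens to sit in the client's buffer. It is not taken from the actual capture, and the function names are only for illustration.

```python
# Hypothetical tail of a DHCP options field: option 67 (bootfile name),
# length 0x0d, the 13 name bytes, the end option 0xFF, then made-up
# trailing bytes (assumption: real firmware buffers will differ).
OPTIONS = bytes([67, 0x0D]) + b"undionly.kpxe" + bytes([0xFF, 0x8A, 0x13, 0x00])


def read_by_length(buf: bytes) -> bytes:
    """Honour the length byte that follows the option code."""
    length = buf[1]
    return buf[2:2 + length]


def read_until_nul(buf: bytes) -> bytes:
    """Ignore the length byte and scan forward for a NUL terminator."""
    start = 2
    end = buf.index(0, start)
    return buf[start:end]


good = read_by_length(OPTIONS)
bad = read_until_nul(OPTIONS)
print(good)
print(bad)
```

The second reader runs past the end of the name and picks up the 0xFF end marker plus the bytes after it, which is the kind of filename that then shows up in the TFTP read request.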
So what can you do? Well you can install dnsmasq on your fog server to supply the pxe boot information and then just ignore your dhcp server for pxe boot information. That is what I do at home with my soho isp router. It sends out itself as the bootp server for some stupid reason. I use dnsmasq on the fog server to override it. I have a tutorial on installing dnsmasq on the fog server if you need it.
-
@EBCF Another route you might be able to try if the dnsmasq gets problematic is to use a mapfile for the tftp server. Documentation for the format is here, and I recall that you edit the xinetd service specification at /etc/xinetd.d/tftp (at least on my distribution) to get the service to use it. There is a write-up on how in the workarounds for the Acer Iconia Tab w500.
-
@EBCF From what we see in the PCAP file it’s very likely the firmware is not handling the DHCP information properly. Here is the information handed out by the DHCP server:
See how the DHCP packet itself is terminated by 0xFF (DHCP option 255 to mark the end). Now when we look at the TFTP request we see that it requests the filename plus 0xFF and a couple more non ASCII characters.
My guess is that it’s not properly handling the length information (length: 13) given in the DHCP ACK response.
So what to do about it, as upgrading the firmware doesn’t seem to be available? One important thing you need to know is that the filename information is present in the DHCP answers twice:
First in the DHCP “header” (highlighted in blue) and second as DHCP option. I don’t know the DHCP spec well enough to tell you why this is the case. Though I know that some clients handle this perfectly fine and others just don’t.
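As a rough illustration of those two places, the Python sketch below builds a minimal, made-up BOOTP/DHCP reply and reads the boot file name from both. The offsets follow the fixed BOOTP layout from RFC 2131 (128-byte file field at offset 108, options starting after the magic cookie at offset 240). The helper names build_reply and boot_filenames are hypothetical, and the sketch assumes the file field is not reused for options (no option overload).

```python
FILE_OFFSET = 108      # start of the 128-byte 'file' field in the fixed header
FILE_LENGTH = 128
OPTIONS_OFFSET = 240   # 236-byte fixed header plus 4-byte magic cookie
MAGIC_COOKIE = bytes([99, 130, 83, 99])


def build_reply(header_file: bytes, option_file: bytes) -> bytes:
    """Build a bare-bones reply carrying the name in both places (illustration only)."""
    packet = bytearray(236)
    packet[0] = 2  # op = BOOTREPLY
    packet[FILE_OFFSET:FILE_OFFSET + len(header_file)] = header_file
    packet += MAGIC_COOKIE
    packet += bytes([67, len(option_file)]) + option_file  # option 67, no NUL
    packet += bytes([255])  # end option
    return bytes(packet)


def boot_filenames(packet: bytes):
    """Return (name from header 'file' field, name from option 67 or None)."""
    header_field = packet[FILE_OFFSET:FILE_OFFSET + FILE_LENGTH]
    header_name = header_field.split(b"\x00", 1)[0]  # header field is NUL padded
    option_name = None
    pos = OPTIONS_OFFSET
    while pos < len(packet):
        code = packet[pos]
        if code == 255:      # end option
            break
        if code == 0:        # pad option has no length byte
            pos += 1
            continue
        length = packet[pos + 1]
        if code == 67:
            option_name = packet[pos + 2:pos + 2 + length]
        pos += 2 + length
    return header_name, option_name


reply = build_reply(b"undionly.kpxe", b"undionly.kpxe")
print(boot_filenames(reply))
```

In this sketch the header field is NUL padded by construction, while the option value is bounded only by its length byte, which is where a client that expects a terminator can go wrong.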
So one thing you can try is adjusting the DHCP config file to remove the DHCP option if that’s possible with your Unifi Security Gateway. In ISC-DHCPD there are two different parameters for this: filename (DHCP header) and option bootfile-name (DHCP option) (reference).
If that doesn’t work out I’d think @Daniel-Miller’s suggestion of TFTP maps is the best route to go! Try out this map rule set:
# if the requested file contains non-ASCII characters
# send undionly.kpxe as default to fix Dell Vostro 220 issue
e ^[a-zA-Z0-9/.\-].*$
r .* undionly.kpxe
This regular expression should allow for all common filenames including the characters /, . and - (ref). If that doesn’t work, try e ^[[:ascii:]].*$ instead (ref).
-
Thanks everyone for the support.
@Daniel-Miller and @Sebastian-Roth I opted for the approach you suggested of making TFTPD ‘correct’ the filenames. The affected PC now goes to the FOG menu. I missed Sebastian’s map file and ended up writing my own which is:
# Workaround for PXE clients that misinterpret the DHCP options
# because they expect a null-terminated string
# match the extensions followed by any characters and replace it with
# just the extensions
r \.pxe.* \.pxe
r \.ipxe.* \.ipxe
r \.kpxe.* \.kpxe
r \.kkpxe.* \.kkpxe
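For anyone who wants to sanity-check rules like these before touching the TFTP server, here is a rough Python approximation. It assumes each r rule replaces the first match and that the rules run in order; the real tftpd rule engine and its regex dialect may differ in details, and the garbage bytes in the sample filename are made up.

```python
import re

# Python stand-ins for the four rewrite rules above (approximation only).
REWRITE_RULES = [
    (rb"\.pxe.*", b".pxe"),
    (rb"\.ipxe.*", b".ipxe"),
    (rb"\.kpxe.*", b".kpxe"),
    (rb"\.kkpxe.*", b".kkpxe"),
]


def apply_rules(filename: bytes) -> bytes:
    """Apply each rule in order, replacing only the first match."""
    for pattern, replacement in REWRITE_RULES:
        # DOTALL so '.' also matches a stray newline byte in the garbage
        filename = re.sub(pattern, replacement, filename, count=1, flags=re.DOTALL)
    return filename


requested = b"undionly.kpxe\xff\x8a\x13"  # hypothetical garbage suffix
print(apply_rules(requested))
print(apply_rules(b"undionly.kpxe"))
```

Note that names with other extensions pass through unchanged, so a garbage suffix on such a file would not be stripped by these four rules.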
@george1421 I wouldn’t even have guessed that proxy DHCP was a thing. I’ve decided not to go for it this time but I’ll keep it in mind for future.