Vlan, Ipxe, DHCPNAK
-
A new (faster) model of machine made an existing timing issue worse in my environment such that the fog ipxe.efi kernel would no longer boot.
The symptom: the machine tries to network boot and succeeds in loading iPXE from the server. iPXE tries to configure the network and shows progression dots (“…”) but fails to get an address, resets the NIC port and tries again, and fails again. On older machines the second attempt would usually succeed.
The triggers (as far as I can tell): in my network the very first packet sent by iPXE (when the “dhcp” command is issued to auto-configure the network) is a DHCPDISCOVER. That packet is placed in our “guest” VLAN because our switch hasn't yet learned which VLAN the packet should be in. The DHCP server in the guest network sends an answer, which the client sees. iPXE then sends a DHCPREQUEST for that address, but by now the switches have moved the packets into the correct VLAN, and the DHCP server in that network refuses the address request and answers with a DHCPNAK. iPXE doesn't process the DHCPNAK and eventually times out. For its second try it shuts off the NIC (observed behavior, unexplained; I can see the link light on the port go out). This loss of link causes our switches to throw away the VLAN info for the port, so the second attempt fails in exactly the same way.
There's a race condition here: slower machines or slower network ports (some are 100 Mbit, some 1000) may work. I believe that's because in some cases the switch processes the VLAN security info quickly enough.
I found a proposed patch:
https://lists.ipxe.org/pipermail/ipxe-devel/2017-October/005873.html
which would add the ability to ipxe to process dhcpnak packets by starting over with a new cycle of dhcpdiscover, etc. Using this guide:
https://forums.fogproject.org/topic/12121/compiling-ipxe-boot-kernels
I patched and recompiled iPXE. This seems to have worked. I added the lines marked with "->" to ./ipxe/src/net/udp/dhcp.c:
— file dhcp.c changes —
/* (next line number was/is 557) */
/* Filter out unacceptable responses */
if ( peer->sin_port != htons ( BOOTPS_PORT ) )
    return;
-> /* ADDED 1-2021 per online suggested commit */
-> /* Handle DHCPNAK */
-> if ( msgtype /* BOOTP */ && ( msgtype == DHCPNAK ) ) {
->     /* Go back to discover */
->     dhcp_set_state ( dhcp, &dhcp_state_discover );
->     return;
-> }
if ( msgtype /* BOOTP */ && ( msgtype != DHCPACK ) )
    return;
if ( server_id.s_addr != dhcp->server.s_addr )
    return;
if ( ip.s_addr != dhcp->offer.s_addr )
    return;
— end changes —
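To show why the change matters, here is a small stand-alone toy model of the exchange described above. It is only a sketch and is not iPXE code: every name in it (simulate, port_vlan, server_reply, the enums) is made up for illustration, and it assumes the switch puts only the very first frame into the guest VLAN and that each DHCP server only ACKs a lease it offered itself.

```c
#include <stdbool.h>

/* Toy model only. None of these names exist in iPXE. */
enum vlan  { VLAN_GUEST, VLAN_PROD };
enum msg   { MSG_OFFER, MSG_ACK, MSG_NAK };
enum state { ST_DISCOVER, ST_REQUEST, ST_BOUND, ST_TIMEOUT };

/* Assumption: only the first frame from the port lands in the guest VLAN. */
enum vlan port_vlan(int frames_seen)
{
    return (frames_seen == 0) ? VLAN_GUEST : VLAN_PROD;
}

/* Assumption: a server answers a discover with an offer, and a request
 * with an ACK only if the offer came from the same VLAN's server. */
enum msg server_reply(enum vlan v, enum state client, enum vlan offered_by)
{
    if (client == ST_DISCOVER)
        return MSG_OFFER;
    return (offered_by == v) ? MSG_ACK : MSG_NAK;
}

/* Send up to max_frames client frames; handle_nak selects patched behavior. */
enum state simulate(bool handle_nak, int max_frames)
{
    enum state s = ST_DISCOVER;
    enum vlan offered_by = VLAN_GUEST;

    for (int frame = 0; frame < max_frames; frame++) {
        enum vlan v = port_vlan(frame);
        enum msg reply = server_reply(v, s, offered_by);

        if (s == ST_DISCOVER && reply == MSG_OFFER) {
            offered_by = v;          /* remember whose offer we took */
            s = ST_REQUEST;
        } else if (s == ST_REQUEST && reply == MSG_ACK) {
            return ST_BOUND;
        } else if (s == ST_REQUEST && reply == MSG_NAK && handle_nak) {
            s = ST_DISCOVER;         /* patched: go back to discover */
        }
        /* otherwise the reply is ignored and the request is retransmitted */
    }
    return ST_TIMEOUT;
}
```

In this model, simulate(false, 8) never leaves the request state and ends in ST_TIMEOUT, while simulate(true, 8) restarts discovery after the NAK and reaches ST_BOUND on the fourth frame. The real timing on a switch is of course messier than a frame count.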
I also thought about getting iPXE to send some kind of packet 1-2 seconds before the DHCP discover process starts, to give the switches a moment to recognize the device properly, but couldn't figure out an easy way to do that. There's a ping command, but as far as I can tell it doesn't work before an IP is assigned to the interface, which the ifopen/dhcp command handles.
Newer VLAN-capable switches will apparently sometimes just drop the first packet, but the switches at my location do not.
I realize this is primarily an iPXE issue, and I will comment appropriately in those forums as well, but I wanted to document the issue here in case others are also seeing odd behavior in a VLAN switch environment.
-
@matthew73 This is a unique condition. I can understand what is going on because we use NAC and VLAN switching on my campus. I can say that I have not seen this issue (anywhere) on my campus.
I think I understand what needs to happen. Basically iPXE needs to say something and then wait XX seconds for your NAC system to identify the hardware and switch it to the right VLAN. The network link light winks twice during a normal PXE boot: first when PXE turns over control of the network adapter to iPXE, and again when iPXE turns over control of the network adapter to FOS Linux. We see a similar issue when the network switches are using standard spanning tree and not one of the fast protocols (RSTP, MSTP, port-fast).
The developers have created a specific group of iPXE boot loaders that have an embedded 10 second delay before iPXE tries to request an IP address. This gives STP and power-saver functions on the switch a chance to react before iPXE starts to talk. These files are in the 10secdelay folder. To use them, update the DHCP boot file option from
ipxe.efi
to
10secdelay/ipxe.efi
This will call in the 10 second delay boot loader. See if that makes things better or not.