ipxe dhcp timeout



  • I’m having a similar behavior as to what was discussed here:
    https://forums.fogproject.org/topic/4973/default-ipxe-connection-timeout-on-dell-only/7

    We have portfast enabled on our switch ports but it seems like ipxe just isn’t getting an IP address quick enough. We have a NAC appliance that delays the assignment of a vlan for a second or two and that seems to be the culprit, when I remove the dot1x configuration from the switch port the machine boots as expected. With the NAC(dot1x) configuration in place, if I press ‘s’ and then enter dhcp followed by the chain command at the ipxe shell I am able to see the boot menu and the machine boots as expected.

    The machine in question is a Dell OptiPlex 3010. Is there any way to extend the dhcp timeout?


  • Moderator

    @Wayne-Workman AND/OR post the image a google drive so the OP has total control of the image after the need is gone. Then just IM the link to the specific person(s) for review.


  • Moderator

    @networkguy All of our Moderators, Developers, and Senior Developer are trustworthy people. If you don’t want to upload the capture to the forums, I might suggest messaging them to get an email address you can send it to. If they are able and willing, they will look through it and then the conversation can continue here in this thread without anything sensitive being shared.



  • @Sebastian-Roth
    I picked up on the difference after posting the picture and changed my description slightly. Thank you for pointing that out.

    I appreciate you spending some time looking into this. One change I made which seems to work at least with this one computer, is changing the authentication order for the switch port. We aren’t really doing dot1x at the moment so it really doesn’t make sense to have the order as it was:

    Previous port config(i switched both to mab dot1x):
    authentication order dot1x mab
    authentication priority dot1x mab

    All is well at the moment, I will be changing the rest of the port configs and then follow up with changing our DHCP scopes again to see if any other problematic devices are reported.


  • Developer

    @networkguy I know why I keep asking people for posting a picture of what they see. Don’t want to sound arrogant but we usually see more than most users (especially as there are more eyes in the forums!)… The picture you posted is showing a different error than you initially posted. Timeout on default.ipxe is totally different than timeout on the preceding DHCP request.

    @george1421 said:

    Is there a way in the iPXE kernel script to either try X times then die or set a startup delay to give the NAC system a chance to reregister the device between each network wink?

    This reminds me of the fact that the iPXE developers added some kind of spanning tree detection (and wait) probably about two years ago. So I am wondering if this should be addressed within the iPXE source as well. A quick search for “ipxe 802.1x” on the web revealed this post. While I haven’t tested it to me this sounds like iPXE in fact should cope with basic EAPOL stuff. I will check the code when I have a bit more time.

    On page 5 of this presentation it says: “PXE Boot -> Open access”. From this document it seems to me that you need to configure your PXE booting ports as “Open access”. Sorry if you’ve already done this and it’s still not working. While I have done a fair amount of networking stuff I didn’t have a chance to look into that 802.1x stuff much yet. So this is just me flying “on sight” (means reading the manuals).

    I’m also slightly hesitant to upload a pcap from our domain controller.

    Perfectly fine. I do understand this. Less information simply means less professional help. Your choice.


  • Moderator

    @networkguy Yeah that’s not going to work (the standard way to get this info). If your fog server was on the target computer side you would capture the client broadcast messages, but not the dhcp server. Once the dhcp requests hits the dhcp-helper it turns the broadcast messages to unicast messages.



  • @george1421
    fog server and dhcp server are on the same subnet, the client is on another. We have the dhcp server added on our router using ip helper-address.


  • Moderator

    @Sebastian-Roth Is there a way in the iPXE kernel script to either try X times then die or set a startup delay to give the NAC system a chance to reregister the device between each network wink? I know his troubles because I’ve worked at a company that used NAC. It was a bit of a pita for network booting.


  • Moderator

    @networkguy If your fog server, target computer and dhcp server are in the same broadcast domain (subnet) then its ok since the dhcp traffic we care about is sent via broadcast messages.



  • @Sebastian-Roth
    I have a very blurry screenshot that shows what I am seeing. I apologize for the quality. I also removed MAC/IP information from it.

    I do not see the ok after configuring (net0 …)

    Pressing ‘s’ to get into the shell followed by dhcp and then chain http://myfogserver/fog/service/ipxe/boot.php does allow me to boot.

    Regarding running tcpdump on the FOG server, is that with the assumption that it is our DHCP server? If so then in my case I won’t be able to take that approach as our DHCP runs on our domain controller. As much as I really appreciate this assistance, I’m also slightly hesitant to upload a pcap from our domain controller.

    http://pasteboard.co/191dt1Ib.png


  • Developer

    @networkguy Do you see the Configuring (net0 aa:bb:cc:dd:ee:ff) ... ok (especially ok) before the timeout?

    Can you please install tcpdump package on your FOG server and run sudo tcpdump -w timeout.pcap udp, then boot one of the clients till you see the timeout and stop tcpdump (ctrl+c). Upload the timeout.pcap file to the forum.



  • @george1421
    Great suggestion George, thank you. I will do that in the morning. We will also be attempting to improve upon the way the switches are connected. The computers in question are 6 switches down a stack which are daisy chained together…


  • Moderator

    @networkguy Just a comment, if you use dhcp reservations you can define on a per client basis dhcp options. So while you are testing with this single client you can point to the new fog server and boot file. You can do this without breaking your current deployment environment.



  • @george1421
    Sorry George. It would be helpful to know that we have 2 fog servers up and running. One at .29 that has been in use for many years and 1 at git version 7659. I updated the DHCP settings in our labs scope this morning to use the new server/undionly.kpxe and this issue cropped up. Due to not being able to troubleshoot further today I put the settings back to our old server’s IP/pxelinux.0.


  • Moderator

    @networkguy pelinux.0, Oh wait I guess I missed asking you what version of FOG are you running. pelinux.0 is not used with 1.1.0 or newer. If you have one of those and you are using that you have other issues than dhcp.



  • @george1421
    I’m not going to be able to get back to test this today. For now I updated the DHCP settings to use pxelinux.0. I will test with an unmanaged switch tomorrow and report my findings.

    The device is currently plugged into a Cisco Catalyst.
    cisco WS-C3560-48PS

    Thanks!


  • Moderator

    said in ipxe dhcp timeout:

    Dell OptiPlex 3010

    Just for reference the OptiPlex uses a Realtek nic. Is it safe to assume your building switch is an advanced switch like a catalyst? (this seems to be a common thread of late)


  • Moderator

    @networkguy said in ipxe dhcp timeout:

    Our nac is using mab (mac-auth-bypass) so no supplicant is needed to pxe boot.

    Fair enough.

    If you place a dumb (unmanaged) switch between your building switch and the booting device does it mask the issue?


  • Moderator

    @networkguy The other thing that happens during a FOS (Fog client OS) is that the network interface “winks” several times as the PXE ROM transitions to iPXE and then from iPXE to the FOS kernel this plays havoc with NAC systems.



  • I’m running git version: 7659

    Our nac is using mab (mac-auth-bypass) so no supplicant is needed to pxe boot.


Log in to reply
 

Looks like your connection to FOG Project was lost, please wait while we try to reconnect.