Dell Precision 3930 Rack - EFI PXE Issue


  • Hi guys,

    I am deploying Windows 10 Pro x64 2004 to a bunch of Dell 3930 Rack units. It is a very simple image, which works fine when deployed via PXE with Legacy boot using undionly.pxe.

    I am attempting to use the Aquantia AQtion 10Gb controller that is onboard these servers for faster capture and deployment. As mentioned above, it works fine with PXE boot when the BIOS is set to Legacy mode. However, if I change the BIOS back to UEFI, enable the UEFI Network Stack and disable Secure Boot, I cannot boot to the FOG bootloader at all.

    What I have tried:

    • Switching the bootfile and Option 67 on DHCP to each of ipxe.efi, snponly.efi and snp.efi, to no avail.

    • Switching to the onboard Intel I210 1Gb NIC. With ipxe.efi it does get to the bootloader, which tells me it is the Aquantia NIC having the issues.

    I select the UEFI NIC PXE boot option on boot, I get the downloaded NBP message, then it gets to the iPXE screen, fetches the MAC address, I know the NIC has picked up an address over DHCP as I can see it on the server.

    The message I get is “No Configuration Methods Succeeded” then it times out and reboots. For now I am making do with changing the BIOS settings to Legacy to image, then back to UEFI when imaging completes, but it’s a pain in the proverbial.

    Do I need a different boot file? How do I build one that works with this Aquantia NIC chipset?

    Fog Version: 1.5.7
    Fog Server OS: Ubuntu 18.04.4 LTS
    Clients: Dell Precision 3930 Rack with Aquantia AQtion 10Gb NIC
    Switch & DHCP: Netgear M4300-12x12F

    Any idea would be greatly appreciated!

    Thanks,
    Jordan

  • Moderator

    @redbull007 Understood, not FOG. But where in the process is it failing?

    PXE ROM requests a DHCP address
    PXE ROM DORA sequence, where it gets the boot server and boot file
    PXE ROM downloads the boot file from the boot server
    iPXE starts up
    iPXE requests a DHCP address

    If it’s in iPXE, then I thought of a test we can do.

    A tcpdump from the FOG server’s point of view will tell us exactly what the target computer is asking for. That is the reasoning for running it there. The test gives the best detail if the target computer and FOG server are on the same subnet when grabbing the packet capture.
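For anyone following along, here is a minimal sketch of how such a capture command could be put together. The port list (DHCP 67/68, TFTP 69, ProxyDHCP 4011) is an assumption based on the usual PXE traffic, and the interface name and output file are placeholders, not values from this thread. The script only builds and prints the command; actually running tcpdump needs root on the FOG server.

```python
import shlex
from typing import List

# Ports of interest: DHCP server/client (67, 68), TFTP (69), ProxyDHCP (4011).
# This list is an assumption about what is worth capturing for PXE debugging.
PXE_PORTS = (67, 68, 69, 4011)


def build_capture_command(interface: str, outfile: str) -> List[str]:
    """Return a tcpdump argument list that records only PXE-related traffic."""
    capture_filter = " or ".join(f"port {p}" for p in PXE_PORTS)
    # -i selects the capture interface, -w writes raw packets to a pcap file.
    return ["tcpdump", "-i", interface, "-w", outfile, capture_filter]


if __name__ == "__main__":
    # "eth0" and "output.pcap" are hypothetical placeholders.
    print(shlex.join(build_capture_command("eth0", "output.pcap")))
```

Start the capture, PXE boot the target until the error appears, then stop the capture and open the pcap in Wireshark.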


  • @george1421
    It isn’t FOG. I just want to understand, because the problem you describe seems the same as mine.

  • Moderator

    @redbull007 I’m probably stating the obvious, but that packet capture is from an SMS net boot and not FOG.

    BUT, I have seen a TFTP transfer abort over a WAN connection when the MTU of the link is smaller than the TFTP packet at the negotiated block size, because the packet is marked do-not-fragment.
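To make the MTU point concrete, here is a small worked calculation. It assumes IPv4 without options (20 bytes), a UDP header (8 bytes) and the 4-byte TFTP DATA header. The block sizes and MTU values used below are illustrative, not figures measured in this thread.

```python
IP_HEADER = 20    # IPv4 header without options, in bytes
UDP_HEADER = 8    # UDP header, in bytes
TFTP_HEADER = 4   # TFTP DATA header: 2-byte opcode + 2-byte block number


def tftp_packet_size(blksize: int) -> int:
    """Size of the IP packet carrying one full TFTP data block."""
    return IP_HEADER + UDP_HEADER + TFTP_HEADER + blksize


def fits_without_fragmenting(blksize: int, link_mtu: int) -> bool:
    """True if a full data block passes the link without being fragmented."""
    return tftp_packet_size(blksize) <= link_mtu


def largest_safe_blksize(link_mtu: int) -> int:
    """Largest TFTP block size that still fits in a single packet."""
    return link_mtu - IP_HEADER - UDP_HEADER - TFTP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1400):
        print(mtu, fits_without_fragmenting(1468, mtu), largest_safe_blksize(mtu))
```

A block size of 1468 exactly fills a 1500-byte MTU, so any smaller MTU along the path would drop those packets when fragmentation is not allowed.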

    With your 10Gb link, is the target computer on the same subnet as the FOG server and DHCP server? If so, I’d like to see a complete packet capture from the FOG server’s point of view. I have a tutorial here on exactly what I’m looking for: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue

    Upload the pcap to a file share site and either IM me the link or post it here. Once I have the pcap you can take it down. I want to see what it’s specifically doing here.


  • @george1421

    It would take too long to describe every test, but to summarize: there are two network cards, one 1Gb and one 10Gb.
    There is no issue with the 1Gb card; all packets pass (checked in Wireshark).
    When I move the cable from the 1Gb card to the 10Gb card and capture with Wireshark, I get this message in the capture: “user aborted the transfert”. Here the “user” is the 10Gb card.

    [Two screenshots attached: 2021-01-22 12_04_06-BNPP CIB - Point SCCM _ Microsoft Teams.png, 2021-01-22 12_04_00-BNPP CIB - Point SCCM _ Microsoft Teams.png]

  • Moderator

    @redbull007 I’m still on the fence as to whether your issue needs to be broken out. If necessary I can fork this thread, but for now let’s continue here.

    Except for this hardware, the only time we’ve seen this is when the building switch has standard spanning tree enabled and not one of the fast spanning tree protocols (RSTP, MSTP, port-fast, etc). A quick check to see if it is a spanning tree issue is to place a dumb (read cheap) unmanaged switch between the pxe booting computer and the network switch. That cheap switch will typically not support spanning tree so it will keep the main network switch’s port from winking while iPXE starts up. So try that route first to see if it masks the problem.
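A rough sketch of why standard spanning tree hurts here: when the link drops and comes back as iPXE takes over the NIC, a classic 802.1D port sits in the listening and then the learning state, each for one forward-delay interval (15 seconds by default), before it forwards traffic. The client timeout used below is a hypothetical number for illustration only, not a documented iPXE value.

```python
def stp_forwarding_delay(forward_delay_s: int = 15) -> int:
    """Seconds a classic 802.1D port waits before it forwards traffic."""
    # Listening state plus learning state, one forward-delay interval each.
    return 2 * forward_delay_s


def link_ready_before_timeout(client_timeout_s: int, forward_delay_s: int = 15) -> bool:
    """True if the port starts forwarding before the DHCP client gives up."""
    return stp_forwarding_delay(forward_delay_s) < client_timeout_s


if __name__ == "__main__":
    # 20 seconds is a made-up client timeout used only for this example.
    print(stp_forwarding_delay(), link_ready_before_timeout(20))
```

With the fast variants (or an unmanaged switch in between) the port forwards almost immediately, so the DHCP request is not lost while the port is still blocked.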

    Having a clear screen shot of the error taken with a mobile phone would also help set the context of the error.

    If, as you say, all TFTP packets are being rejected, then iPXE is not getting to the target computer at all. I don’t think that is the case, but a picture would show the source of the error message.


  • @george1421

    Yes, the same hardware, except the switch is different.
    All TFTP packets in PXE are rejected by the NIC.

  • Moderator

    @redbull007 said in Dell Precision 3930 Rack - EFI PXE Issue:

    I have the same issue , could you please tell me if you found a solution ?

    You have the same error with the exact same hardware as the OP in this thread? If not, let’s start a new thread so we can keep different issues isolated.


  • @jordynorm
    Hi,
    I have the same issue. Could you please tell me if you found a solution?

    Thanks for your answer

  • Senior Developer

    @jordynorm I am still keen to see the full DHCP DORA in Wireshark to be sure! Best if you can capture directly on the DHCP server (filter for the machine’s MAC address) or using a mirror port on the switch directly connected to the PXE booting machine.

    There is no need to alter the default.ipxe file.

  • Moderator

    @jordynorm said in Dell Precision 3930 Rack - EFI PXE Issue:

    default.ipxe

    No worries on this one, because it’s just a text file that redirects the iPXE boot loader so it can find boot.php. You can look at that file with a text editor or cat.

    OK, that hardware ID translates to [ 1D6A:07B1 ] in Linux, and all Linux kernels 4.16.x and later have this driver available. I need to see if FOG’s FOS Linux has this driver enabled.
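As a small illustration of that translation, the sketch below pulls the vendor and device fields out of a Windows hardware ID string and prints them in the vendor:device form that Linux tools such as lspci -nn use. The function name is made up for this example.

```python
import re
from typing import Optional

# VEN_xxxx and DEV_xxxx each carry a 4-digit hexadecimal PCI ID.
_HWID = re.compile(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})")


def windows_hwid_to_pci_id(hwid: str) -> Optional[str]:
    """Convert a Windows PCI hardware ID to 'vendor:device', or None if no match."""
    match = _HWID.search(hwid)
    if match is None:
        return None
    vendor, device = match.groups()
    # Linux tools print PCI IDs in lowercase.
    return f"{vendor}:{device}".lower()


if __name__ == "__main__":
    print(windows_hwid_to_pci_id(r"PCI\VEN_1D6A&DEV_07B1&SUBSYS_08731028&REV_02"))
```

The resulting ID can then be checked against the kernel's list of supported devices for the relevant driver.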

    I realize that you need to turn these servers over today, but we can get you a working solution to image these with FOG in about an hour if you need it. I just need to see if the default FOS kernel has this driver enabled.

    [edit] Yes, the default FOG kernel for FOG 1.5.9 has this network adapter enabled, as does the 5.6.18 version of the kernel. So we can build a USB boot for FOS easily.


  • @george1421

    I unfortunately have to relinquish these servers today so cannot continue testing, but I did try the latest iPXE binaries. default.ipxe appeared to be missing when I grabbed the latest repo from git, so I used my previous one. In any case the Aquantia controller behaved the same with every variant of the latest TFTP boot files.

    Here’s the hardware ID of the controller:
    PCI\VEN_1D6A&DEV_07B1&SUBSYS_08731028&REV_02
    PCI\VEN_1D6A&DEV_07B1&SUBSYS_08731028
    PCI\VEN_1D6A&DEV_07B1&CC_020000
    PCI\VEN_1D6A&DEV_07B1&CC_0200

    Driver: aqnic650.sys v2.1.12.0

  • Moderator

    @jordynorm You can use the FOG server for packet capture if the FOG server and PXE booting computers are on the same subnet, using this tutorial: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue

    If the PXE booting computer is on a different subnet, then you can use Wireshark running on a witness (second) computer with a capture filter of port 67 or port 68. You will get the best quality capture if you can do it from the FOG server, but the Wireshark route will also tell us what the DHCP server is saying.

    Lastly, if you can’t get iPXE to work, we have a way to USB boot into FOS Linux. You give up some features of iPXE, but you can do basic imaging with the USB method. In regards to FOS Linux, can you get the hardware ID of this network adapter from Device Manager in Windows? I need the vendor and device ID to check against the Linux kernel’s list of drivers.


  • Hi Sebastian,

    Thanks for your fast response!

    I will try the latest iPXE binaries first, as that may resolve it more easily than trying to packet capture from the Netgear switch (which is running the DHCP service).

    I’m not 100% sure it can be a DHCP issue though, as the same Aquantia NIC completes the full DHCP DORA when in Legacy mode, and even in EFI mode I can see in the switch’s DHCP lease table that it has acquired an IP…

  • Senior Developer

    @jordynorm said:

    I select the UEFI NIC PXE boot option on boot, I get the downloaded NBP message, then it gets to the iPXE screen, fetches the MAC address, I know the NIC has picked up an address over DHCP as I can see it on the server.

    The message I get is “No Configuration Methods Succeeded” then it times out and reboots.

    At first it sounded as if iPXE wouldn’t communicate over the Aquantia 10Gb NIC at all, but these sentences make me think you get to a point where it at least sends out packets. It might be that it’s not able to receive the DHCP answer.

    Probably best if you can do a tcpdump/Wireshark capture on the DHCP server to see how much of the full DHCP DORA (discover, offer, request, ack) is happening.
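To show what "how much of the DORA" means in practice, here is a minimal sketch that reads the DHCP message type (option 53) from a raw DHCP payload and reports how far the exchange got. It assumes the payloads have already been extracted from a capture as UDP payload bytes; the helper names are made up for this example and nothing here comes from an actual capture in this thread.

```python
from typing import Iterable, List, Optional

MAGIC_COOKIE = bytes([0x63, 0x82, 0x53, 0x63])  # marks the start of DHCP options
BOOTP_FIXED_LEN = 236                            # fixed BOOTP fields, in bytes
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 5: "ACK", 6: "NAK"}
DORA = ["DISCOVER", "OFFER", "REQUEST", "ACK"]


def dhcp_message_type(payload: bytes) -> Optional[str]:
    """Return the DHCP message type (option 53) of a UDP payload, if present."""
    opts_start = BOOTP_FIXED_LEN + len(MAGIC_COOKIE)
    if payload[BOOTP_FIXED_LEN:opts_start] != MAGIC_COOKIE:
        return None
    i = opts_start
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option is a single byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option, stop parsing
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and length == 1 and value:
            return MESSAGE_TYPES.get(value[0])
        i += 2 + length
    return None


def dora_progress(seen: Iterable[str]) -> List[str]:
    """Return the DORA steps completed, in order, up to the first missing one."""
    seen_set = set(seen)
    done = []
    for step in DORA:
        if step not in seen_set:
            break
        done.append(step)
    return done
```

If the server side shows DISCOVER and OFFER but never a REQUEST, for example, that would suggest the client is not receiving (or not accepting) the offer.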

    As you are still on FOG 1.5.7 you can manually download the latest iPXE binaries and see if it works using those.
