Configuring net0... slowly
-
We’ve just recently installed a FOG server. It seems to be working as far as we’ve tested, but after chainloading into iPXE it seems to hang at the “configuring net0(MAC address)” step. I feel like it shouldn’t hang for 4-5 minutes at this step before continuing on to the menu. I’ve tried different kernels, different hosts, and different .pxe, .kpxe, and .kkpxe files. What am I missing? What exactly is happening at this stage? Is the driver for the NIC loading?
-
Let's start with the basics.
- What version of FOG are you running?
- What target hardware are you trying to PXE boot?
- Are you using a built-in network adapter or a USB Ethernet adapter?
- Is the device in UEFI or BIOS (legacy) mode?
At this point FOG kernels don’t come into play. What is running is iPXE, in whichever variant you referred to (.pxe, .kpxe, etc.).
Have you placed an unmanaged switch between your target computer and the building switch, just as a test? I know it sounds off-point, but humor me (though this should only fix a 27-second delay, not a 4-5 minute one).
-
Thanks for the quick response.
FOG version: 1.4.4
Target host: Dell Optiplex 990 (Tried a 755 and an HP Elitebook as well with similar results.)
Built in network adapter on the host: Intel 82579LM Gigabit Network Connection
Using legacy mode. I believe we used an unmanaged switch at one point. We struggled with DHCP for a bit, but then we moved it to a managed switch on our network. We’re now using our network DHCP, after disabling DHCP in FOG, of course. I could still give it a shot.
-
@fradlo OK, so you have older tech here (not a slam, it just allows us to rule out bleeding-edge stuff).
Can you define what “network dhcp” is now? Either way, iPXE is loading, and then iPXE has an issue configuring the NIC. I’m still inclined to think it is a spanning tree issue, hence my suggestion to put an unmanaged switch between the building switch and the target computer. Unmanaged switches typically don’t support spanning tree, and the unmanaged switch also keeps the building switch port from “winking” as the PXE ROM hands network adapter control to iPXE. Understand that this is only a quick test to see whether it could be a spanning tree issue, where your building switches have spanning tree enabled (a good thing) but are not using one of the fast spanning tree protocols (fast-STP, RSTP, MSTP, etc.).
-
We’ve got a Windows DHCP server setup. FOG is not handling DHCP. Can’t really go into much further detail as I’ve not really been around this place that long.
I don’t mind trying it; however, will I need to enable DHCP on the FOG server when I do so? We had difficulties getting that to work, so we just let our network’s DHCP handle it with the network admin’s assistance.
-
@fradlo No, that is fine; a Windows DHCP server works perfectly. There is no need for DHCP running on FOG. Since we are coming into this problem cold, we want to make sure your environment isn’t doing something strange. There is a configuration you can apply on Windows Server 2012 and newer DHCP servers to make PXE booting both UEFI and BIOS systems work more smoothly, but you are not at that point yet.
I would still recommend putting the dumbest (unmanaged) switch you have between the target computer and the building switch, then PXE booting the target computer. Let's see where we stand after that.
-
Alright, was able to get the time to do this. I hooked an unmanaged switch up to our lab switch. I hooked the FOG server and the target host up to the unmanaged switch.
Attempt 1: The same thing happens. It passes through the Intel Boot Agent PXE, grabs DHCP quickly, and then chainloads into iPXE 1.0.0 just fine. Then it hangs at configuring net0(MAC) for a while before successfully continuing forward.
Attempt 2: I disconnected the unmanaged switch from the lab switch. Host was unable to reach DHCP and had no way of knowing how to reach the FOG server.
Attempt 3: I reconnected the unmanaged switch to the lab switch. Then I waited for the first DHCP exchange. When iPXE appeared, I disconnected the unmanaged switch from the lab switch. It actually went fast at the configuring net0 step, but only to tell me that it was unable to reach DHCP.
-
We did run Wireshark. Once the process hits the configuring (net0 MAC) step, no packets are sent or received. When that step ends, there is a malformed DHCP packet.
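One way to look closer at a packet Wireshark marks as malformed is to save its UDP payload (for example with Wireshark's Export Packet Bytes) and check the basic DHCP framing. The sketch below is an illustration only, assuming the RFC 2131/2132 layout: a 236-byte fixed BOOTP header, the magic cookie 99.130.83.99, then type-length-value options ending in option 255. Wireshark may flag a packet as malformed for reasons this does not check, and `check_dhcp_payload` is a hypothetical helper, not part of any FOG or iPXE tool.

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # RFC 2131 magic cookie: 99.130.83.99
BOOTP_HEADER_LEN = 236  # fixed BOOTP fields that precede the cookie and options


def check_dhcp_payload(payload: bytes) -> list[str]:
    """Return the framing problems found in a DHCP UDP payload (empty list = none found)."""
    if len(payload) < BOOTP_HEADER_LEN + len(MAGIC_COOKIE):
        return ["payload shorter than BOOTP header plus magic cookie"]
    if payload[BOOTP_HEADER_LEN:BOOTP_HEADER_LEN + 4] != MAGIC_COOKIE:
        return ["missing DHCP magic cookie"]
    i = BOOTP_HEADER_LEN + 4
    while i < len(payload):
        code = payload[i]
        if code == 0:  # Pad option: one byte, no length field
            i += 1
            continue
        if code == 255:  # End option: the options area is complete
            return []
        if i + 1 >= len(payload):
            return [f"option {code} has no length byte"]
        length = payload[i + 1]
        if i + 2 + length > len(payload):
            return [f"option {code} claims {length} bytes but is truncated"]
        i += 2 + length  # skip code, length and value
    return ["no End option (255)"]


if __name__ == "__main__":
    header = bytes(236)  # all-zero BOOTP header, for illustration only
    print(check_dhcp_payload(header + MAGIC_COOKIE + bytes([53, 1, 1, 255])))
    print(check_dhcp_payload(header + MAGIC_COOKIE + bytes([53, 5, 1])))
```

The second call shows the kind of problem (an option whose length runs past the end of the packet) that commonly earns a "malformed" label.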
-
@fradlo said in Configuring net0... slowly:
Attempt 1: The same thing happens. It passes through the Intel Boot Agent PXE, grabs DHCP quickly, and then chainloads into iPXE 1.0.0 just fine. Then it hangs at configuring net0(MAC) for a while before successfully continuing forward.
This issue has me a bit baffled. The description above behaves precisely as I would expect if spanning tree were enabled and only standard (non-fast) mode were used. The unmanaged switch should have masked this, unless the unmanaged switch supported spanning tree for some reason.
Let me describe what I think is going on. When the computer boots, the PXE ROM has control of the network adapter, and it loads iPXE. When iPXE takes over the network adapter, it momentarily drops (a.k.a. winks) the network link while it initializes the adapter. When the switch sees the link drop, spanning tree takes the port out of forwarding and into the listening and learning states, watching for BPDU (Bridge Protocol Data Unit) packets. It will listen for about 27 seconds, and if no BPDU is received it moves the port back to forwarding. Then, when you transfer control to FOS by scheduling an imaging task or a registration, FOS winks the network link again as it takes control of the network adapter, restarting the STP timer. That description matches your 3 tests pretty closely. When you remove the network cable, the kernels see the link state as down and just skip trying to get a DHCP address. That also matches what you are seeing in Wireshark: there is no activity while iPXE is configuring the network adapter.
I guess we need to confirm with your network admin whether your building switch is using one of the fast spanning tree protocols.
Random other thought: do you by chance have some kind of NAC/NAP system that might delay a port coming up?
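The theory above can be made concrete with a small timing model. This is a simplified sketch, not switch firmware behavior: it assumes classic 802.1D, where a port spends one forward delay in listening and another in learning (15 s each by default, about 30 s total; the post quotes roughly 27 s), and it treats edge/fast ports as forwarding immediately. The link-up times in the example are made up purely for illustration.

```python
from dataclasses import dataclass

# Assumption: classic 802.1D default forward delay of 15 s, spent once in
# LISTENING and once in LEARNING before the port forwards traffic again.
DEFAULT_FORWARD_DELAY_S = 15.0


@dataclass
class LinkEvent:
    time_s: float  # seconds after power-on when the link came back up
    owner: str     # software that just took over the NIC (PXE ROM, iPXE, FOS)


def blocked_windows(events, forward_delay_s=DEFAULT_FORWARD_DELAY_S, fast_stp=False):
    """Return (owner, link_up, forwarding_at) for each link wink.

    Simplification: an edge/fast port (fast_stp=True) forwards at once;
    otherwise every wink costs listening + learning.
    """
    wait = 0.0 if fast_stp else 2 * forward_delay_s
    return [(e.owner, e.time_s, e.time_s + wait) for e in events]


if __name__ == "__main__":
    # Hypothetical boot timeline: each NIC handoff winks the link once.
    boot = [LinkEvent(0.0, "PXE ROM"), LinkEvent(10.0, "iPXE"), LinkEvent(90.0, "FOS")]
    for owner, up, fwd in blocked_windows(boot):
        print(f"{owner:8} link up at {up:5.1f}s, forwarding at {fwd:5.1f}s")
```

Under this model each handoff costs a fixed block of silence, which is why a single wink accounts for roughly half a minute, not several minutes, and why setting the ports as edge ports should remove the delay if STP is the cause.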
-
Just in case the unmanaged switch had STP on it somehow, I had the network admin remove STP (set them as edge ports) from the interfaces being used by the FOG server and the host on the lab switch. Same issue occurs. I doubt there is any kind of NAC/NAP going on.
It takes longer than 27 seconds to move forward from the configuring step: at least a minute to a minute and a half. According to the Wireshark capture, it’s 65 seconds. It’s not world-ending, but I think something is keeping it from going faster. I just can’t figure out what.
I see some text fly by when it gets past the configuration step. Is there any way to capture that text?
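To pin down a silent period like this without scrolling through the capture, one option is to export the packet list from Wireshark as CSV (File > Export Packet Dissections > As CSV) and find the longest gap between consecutive packets. The sketch below assumes the export keeps Wireshark's default "No." and "Time" columns, with Time in seconds since the start of the capture; the sample rows are invented for illustration and are not from this capture.

```python
import csv
import io


def largest_gap(csv_text: str):
    """Return (gap_seconds, row_before, row_after) for the longest silence, or None."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return None
    best = None
    for prev, cur in zip(rows, rows[1:]):
        gap = float(cur["Time"]) - float(prev["Time"])
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best


if __name__ == "__main__":
    # Made-up rows in the assumed default CSV column layout.
    sample = (
        '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
        '"1","0.000000","0.0.0.0","255.255.255.255","DHCP","590","DHCP Discover"\n'
        '"2","0.012000","10.0.0.1","255.255.255.255","DHCP","342","DHCP Offer"\n'
        '"3","42.512000","0.0.0.0","255.255.255.255","DHCP","590","DHCP Discover"\n'
    )
    gap, before, after = largest_gap(sample)
    print(f"{gap:.1f} s of silence between packet {before['No.']} and {after['No.']}")
```

Knowing which packets bracket the gap tells you what the client was waiting on just before it went quiet.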
-
@fradlo OK, well, that rules out STP then.
I guess the next round of testing should be this:
-
Let's get a pcap (packet capture) of the PXE booting process. Since the FOG server and the target computer are still connected to the same switch, I want you to follow this process (I know you are Wireshark-capable, but let's follow this route): https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue Let's add port 80 to the filter string too, so we can capture HTTP requests to the FOG server. Run the pcap all the way through registration. Upload the pcap to Google Drive or Dropbox. You can either post the link here or DM me the link so I can take a look at it.
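As a sketch of what "add port 80 to the filter" means, the snippet below builds a tcpdump command line whose capture filter covers the usual PXE ports (DHCP 67/68, TFTP 69, PXE/proxyDHCP 4011) plus HTTP on 80. It assumes the linked tutorial uses a filter along these lines; check the tutorial for its exact command. The output file name is hypothetical, and you may need to add `-i <interface>` on a server with more than one NIC.

```python
import shlex

# Ports commonly seen while PXE booting (assumed to match the tutorial's filter).
PXE_PORTS = {
    67: "DHCP server",
    68: "DHCP client",
    69: "TFTP",
    4011: "PXE / proxyDHCP",
}


def capture_command(outfile="output.pcap", extra_ports=(80,)):
    """Build a tcpdump argv list that writes matching packets to outfile."""
    ports = list(PXE_PORTS) + [p for p in extra_ports if p not in PXE_PORTS]
    bpf_filter = " or ".join(f"port {p}" for p in ports)
    return ["tcpdump", "-w", outfile, bpf_filter]


if __name__ == "__main__":
    # Print a shell-safe command to run as root on the FOG server.
    print(shlex.join(capture_command()))
```

Keeping the filter in one quoted argument avoids the shell splitting it, and adding port 80 is enough to see iPXE's HTTP requests to the FOG server alongside the DHCP and TFTP traffic.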
-
We need to rule out anything in your environment that could be causing this issue. It may be worth setting up isc-dhcp-server on your FOG server for a test. Keep the unmanaged switch between your FOG server and your target computer, unplug the unmanaged switch from your business switch, and then, with isc-dhcp-server running, see if PXE booting behaves any differently. I have 790s in our office and they PXE boot just fine. So unless the 990s are still on their original BIOS release, PXE booting should just work.
-