Failed to obtain lease on eth0 after upgrading to Fog 1.3.0
-
@asbenavides According to this thread:
https://forums.fogproject.org/topic/5886/what-was-default-kernel-in-fog-1-2-0/2
the default kernel for 1.2.0 was 3.15.6. Can you try that? If that doesn’t work, I’ll have to defer to either George, Sebastian, or Tom.
-
@Wayne-Workman I’m going to reformat and reinstall 1.2.0 all over again; maybe something got corrupted. I’ll try your recommendations once it’s up and running.
-
@asbenavides As a side note, if you are intending to re-install 1.3.0 RC, you don’t need to start with 1.2.0. You can just go straight to 1.3
-
This issue reminds me of a spanning tree issue where the target computer gets an IP address for iPXE but then can’t pick one up in the FOS engine. This is because spanning tree takes 27 seconds to start forwarding data, and the FOS engine is so fast that it has already given up by the time the network switch starts forwarding. Wayne’s recommendation to use a mini (dumb) switch will typically improve your chances of getting an IP address, since the building switch port never sees the network link wink (momentarily turn off and back on) as the FOS kernel takes over management of the NIC. Most enterprise switches have a fast start mode for spanning tree (called portfast, fast-stp, rstp, and a few other names) that is typically turned on for access switch ports.
The other possibility here is that the network adapter is not supported by the FOS engine; in this case the NIC will never initialize. I quickly looked through the thread but I didn’t see any mention of the target computer you are trying to boot. What device is this? Is it a physical (on-board) NIC or a USB network adapter?
-
@george1421 It’s a Dell Optiplex 745 small form factor with an integrated NIC.
-
@asbenavides
If it might be a NIC issue, then I would try using a USB NIC. You might still be able to PXE boot from it and rule that part out.
-
@Wayne-Workman I’ve already reformatted and reinstalled the computer and I’m getting the same issue… Are there any special configs that need to be done on the switch? It worked all right with 1.2.0, but after upgrading to 1.3.0 it has the same issue… Is there any setting in FOG that can be changed so it keeps requesting an IP address longer? It just takes a little bit longer.
-
@asbenavides On the single port where you have the device connected, can you ensure that spanning tree is turned off? I don’t usually recommend this, because if forgotten it can turn into a nightmare if someone creates a loop with it. But we need to identify if this is a spanning tree issue.
That o745 has a very common nic so that should not be a problem. When FOG 1.3.0 is installed you also get the latest kernels and inits for the FOS engine (the software that captures and deploys images to the target computer). The FOS engine is failing to detect an IP address.
First see if you can disable spanning tree protocol on that specific switch port. Also ensure that any green ethernet functions (802.3az) are disabled on that switch port too. Then PXE boot the FOS engine again.
If that doesn’t work, manually register the host in the FOG console, then schedule a debug capture. Don’t worry, we are not going to upload anything. What this will do is drop you to a command prompt on the target computer. From there we can run some debugging commands to try to understand why this system is going sideways.
-
@george1421 Yeah, all the ports here at the school district have spanning tree enabled… That is going to be the issue: it’s not giving it enough time to receive an IP address. Is there a way to tell FOG to wait a little bit longer?
-
@asbenavides What’s really at issue is that you MUST enable one of the fast spanning tree protocols for the access ports. BUT right now we need to determine if spanning tree is the issue.
FWIW, when a port transitions from no link to link up with spanning tree on (and no fast STP enabled), the switch will listen on the port for 27 seconds for other switch announcements. After 27 seconds the switch will start forwarding data. If you watch the FOS engine boot, it’s checking for a network connection in about 4 seconds.
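To put rough numbers on that, here is a small Python sketch of the timing mismatch. The 27 second and 4 second figures are the ones quoted above; the function names are made up for illustration, and this is not how the FOS engine itself computes anything.

```python
import math

# Figures quoted in this thread (assumptions, not measurements).
STP_FORWARD_DELAY_S = 27.0   # time before the switch port starts forwarding
FOS_FIRST_CHECK_S = 4.0      # roughly when FOS first checks for a connection


def port_is_forwarding(seconds_since_link_up: float,
                       forward_delay: float = STP_FORWARD_DELAY_S) -> bool:
    """Return True once the port would have started forwarding traffic."""
    return seconds_since_link_up >= forward_delay


def min_attempts_needed(per_attempt_wait: float,
                        forward_delay: float = STP_FORWARD_DELAY_S) -> int:
    """Smallest number of attempts whose combined wait reaches the delay."""
    if per_attempt_wait <= 0:
        raise ValueError("per_attempt_wait must be positive")
    return math.ceil(forward_delay / per_attempt_wait)
```

With these numbers, `port_is_forwarding(FOS_FIRST_CHECK_S)` is false, so a check at about 4 seconds lands well inside the listening window. Anything that keeps the client asking for longer than the forward delay (more attempts, longer waits, or someone hitting pause) gives the port time to come up.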
-
@george1421 disabled the spanning tree on that port and still no luck
-
@asbenavides Well, that’s interesting in that it blew my current train of thought. OK, so I guess the next bit is, as I suggested, to register a host and then schedule a debug deployment. Hopefully it will drop you to a command prompt on the target computer.
-
Is this a USB nic that’s failing in this way?
-
Hello
we had the same problem yesterday …
i just went to switch settings / Layer 2 / RSTP / settings / enable port fast status
no more problems
We are using Alcatel switches
hope it helps
best regards , J
-
@Tom-Elliott It’s an onboard NIC. I’ve already tried different models and I get the same error. But it’s confusing because with 1.2.0 it was working fine…all I had to do was just pause it on the second nic configuration and it will work the only difference with the new 1.3.0 I cant pause on the third nic discovery…
-
@asbenavides said in Failed to obtain lease on eth0 after upgrading to Fog 1.3.0:
all I had to do was just pause it on the second nic configuration and it will work the only difference with the new 1.3.0 I cant pause on the third nic discovery…
How did you pause the third nic discovery in 1.2.0 ?
-
@Wayne-Workman by pressing the pause button on the keyboard but with the fog 1.3.0 it doesn’t work anymore on the 4th nic discovery
-
@asbenavides Just so I understand this correctly. During dhcp discovery you have been hitting the pause button to “freeze” the dhcp discovery for a bit, then when you release the pause it detects an IP address and the FOS engine continues to run as it should? And you have been doing this since 1.2.0?
If this IS THE CASE this condition still points to spanning tree not forwarding data right away. You MUST enable one of the fast stp protocols.
-
@george1421 I was thinking the init could be modified to look for a special kernel argument that tells hosts to wait. We are having portfast issues at work too, I’ve been thinking about this a lot.
-
@Wayne-Workman I know that in the inits the developers added a loop with a count of 4 for DHCP discovery, and I think the timing should have put it past the 27-second forwarding threshold. There should be two values: a loop count and a wait time you can play with. You “could” pass a kernel parameter to adjust either. It wouldn’t be that hard to extract the inits and update the code. This way, if the new kernel parameter didn’t exist, the defaults would be used.
Just in case it’s not a spanning tree issue, we have also seen issues with certain network adapters and green ethernet (802.3az). Typically a dumb (unmanaged) switch would mask this issue. But in the OP’s case they are using a pretty old Dell o745, and it doesn’t have any of the green ethernet stuff.
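As a rough illustration of the loop count plus wait time idea, here is a Python sketch. It is not the actual init code. The argument names `dhcp_retries` and `dhcp_wait` are hypothetical, and the default wait is a placeholder, since the real per-attempt timing isn’t stated in this thread.

```python
import time
from typing import Callable, Dict, Tuple

DEFAULT_DHCP_RETRIES = 4     # the loop count of 4 mentioned above
DEFAULT_DHCP_WAIT_S = 5.0    # placeholder; real init timing is not given here


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into key=value pairs (bare flags map to '')."""
    args: Dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        args[key] = value
    return args


def dhcp_settings(cmdline: str) -> Tuple[int, float]:
    """Read the hypothetical dhcp_retries= / dhcp_wait= arguments.

    Missing or unparsable values fall back to the defaults.
    """
    args = parse_cmdline(cmdline)
    try:
        retries = int(args.get("dhcp_retries", DEFAULT_DHCP_RETRIES))
    except ValueError:
        retries = DEFAULT_DHCP_RETRIES
    try:
        wait = float(args.get("dhcp_wait", DEFAULT_DHCP_WAIT_S))
    except ValueError:
        wait = DEFAULT_DHCP_WAIT_S
    return retries, wait


def wait_for_lease(try_dhcp: Callable[[], bool], retries: int, wait: float,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call try_dhcp up to `retries` times, sleeping `wait` seconds in between."""
    for attempt in range(retries):
        if try_dhcp():
            return True
        if attempt < retries - 1:
            sleep(wait)
    return False
```

On a running Linux system the string passed to `dhcp_settings` would come from `/proc/cmdline`, and `try_dhcp` stands in for whatever the init really does to request a lease. If neither argument is present, the defaults apply and behavior stays as it is today.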