DHCP issue - Lenovo E73 - Realtek RTL8111GN
-
FOG behaves differently on two different desktop models. It let me deploy an image straight from the FOG menu on a Lenovo M91p, but it didn’t work on a Lenovo E73. The E73 gets an IP from the FOG DHCP server, then after the FOG menu and selecting an image to deploy it tries to get an IP again, and this time it fails with the error “Failed to get an IP via DHCP!”
Both are known working desktops.
Please help.
-
Well, there are two reasons that come to mind.
- You have spanning tree enabled on the network port but you are not using one of the fast spanning tree protocols like RSTP. The test for this is to place a dumb (unmanaged) switch between the PXE booting computer and the building network switch. If it boots and images correctly with the unmanaged switch in between, then it’s a spanning tree network issue.
- Since it’s getting into the FOG iPXE menu but failing when FOS starts, it’s possible that FOS doesn’t have the network driver built in. I kind of doubt it, but it’s possible. FOS is the FOG Operating System (a customized version of Linux) that runs on the target computer to capture and deploy images.
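To check the second possibility from a FOS command prompt, one option is to look in sysfs for the kernel driver bound to the interface. This is only a minimal sketch: the function name iface_driver and its optional second argument are made up for illustration, and eth0 is just an example interface name.

```shell
# Print the kernel driver bound to a network interface, using sysfs.
# iface_driver and its optional sysfs root argument are illustrative
# names, not part of FOG or FOS.
iface_driver() {
    sysfs_root=${2:-/sys/class/net}
    link="$sysfs_root/$1/device/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name.
        basename "$(readlink "$link")"
    else
        echo "no driver bound to $1" >&2
        return 1
    fi
}

# Example call (the interface name is an assumption):
#   iface_driver eth0
```

If the function fails because the interface does not exist at all, that would point toward a missing driver rather than a switch problem.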
-
I have an isolated network setup, using a dumb switch with only the FOG server and the target computer connected.
It is strange that it works on one desktop but not on the other.
@pdit What version of fog are you using?
-
Latest one.
-
Here is a screenshot of the error I am getting. It only happens on Lenovo E73 model desktops.
-
@pdit Maybe our link detection is failing on this particular NIC. But I kind of doubt this as we simply use standard Linux tools and information from the kernel to figure this out. Really sounds like a spanning tree issue. Different NICs might behave in different ways here.
-
So what can we do? We will need this FOG server to do imaging. Currently most of our desktops are the E73 model.
-
@pdit What I want you to do is manually register this E73 host with the FOG server. Then schedule a capture or deploy (it doesn’t matter which), but before you hit the schedule task button, tick the debug checkbox, then schedule the task.
PXE boot the target computer. After a few enter key presses you will be dropped to a linux command prompt.
1. Key in
   ip addr show
   Look to see if the target computer has an IP address. If not, record the name of the Ethernet adapter (like eth0 or eno33, or whatever) and go to the next step.
2. Key in
   /sbin/udhcpc -i $iface --now
   where $iface is the name of the Ethernet adapter found in step 1. Does this now pick up an IP address? If not, go to step 3.
3. Key in
   lspci -nn | grep -i net
   and post the results here.
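The three steps can also be strung together in one small script for the debug prompt. This is only a sketch under assumptions: the helper names iface_has_ipv4 and debug_dhcp are invented here, and it assumes the ip command in FOS accepts the -o and -4 options.

```shell
# Succeeds if the interface named in $1 has an IPv4 address.
# Reads the output of `ip -o -4 addr show` on stdin.
iface_has_ipv4() {
    awk -v want="$1" '$2 == want && $3 == "inet" { found = 1 } END { exit !found }'
}

# Run steps 1 to 3 in order for one interface.
# Hypothetical helper, not part of FOG.
debug_dhcp() {
    iface=$1
    if ip -o -4 addr show | iface_has_ipv4 "$iface"; then
        echo "$iface already has an IP address"
    elif /sbin/udhcpc -i "$iface" --now; then
        echo "$iface picked up an IP address on the manual retry"
    else
        # Step 3: identify the NIC so the output can be posted.
        lspci -nn | grep -i net
    fi
}

# Example call (the interface name is an assumption):
#   debug_dhcp eth0
```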
-
George, I ran the debug task as you requested. Here is a picture showing the result. It did pick up an IP after step 2, so I didn’t bother doing the 3rd step. Let me know what this means and what I can do to fix this.
-
@pdit Please do me a favor and boot up your client for another debug session. This time run
cat /sys/class/net/eth0/carrier
then take a picture and post it here as well. This is the link detection we are using in the scripts.
Just ran the command and it returned nothing. Here is a picture.
-
@pdit Ahhhh, too bad, we can’t see the output as your screen does not seem to be properly calibrated with the resolution used by the Linux kernel. Can you try re-calibrating the display before running the command so we get all the information on the left end of the output. Otherwise we need to do some bash magic. Running the following command lines should also work if you can’t fix the display stuff:
linkstate=$(cat /sys/class/net/eth0/carrier)
[[ $linkstate -eq 1 ]] && echo "Yeah, we got link here" || echo "Tooooo bad, Linux kernel is not able to detect the link"
-
Sorry about the screen issue. I just calibrated it and ran the command again. This time it returned “1”.
-
@pdit Ok, now we know that our link state check in the startup scripts does work for your network card as well. So the question remains: why does it take more than 35 seconds after boot up for the link state to change to UP?
Sure, we can adjust the wait time for you, but I suspect the delay is caused by either spanning tree or energy efficiency settings on your switch.
You said that you have an isolated setup with a dumb switch. Which switch model exactly are you using?
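To put a number on how long the link actually takes to come up, a polling loop around the same carrier file could be run from a debug session. This is a rough sketch and not the actual FOS startup code: the function name wait_for_carrier and both of its arguments are invented for illustration.

```shell
# Poll a carrier file once per second until it reads "1" or the
# timeout (in seconds) is exceeded. Prints the seconds waited on success.
# wait_for_carrier is an illustrative name, not a FOG function.
wait_for_carrier() {
    carrier_file=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        # Reading the file fails while the interface is down,
        # so errors are treated as "no link yet".
        state=$(cat "$carrier_file" 2>/dev/null)
        if [ "$state" = "1" ]; then
            echo "$elapsed"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example call (interface name and timeout are assumptions):
#   wait_for_carrier /sys/class/net/eth0/carrier 120
```

Started right after the debug prompt appears, the printed value only counts from that moment, so it is a lower bound on the time since boot.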
-
I use the following 2 switches. I get the same problem with both. But when the same switches are used with the in-network FOG server there is no problem. It only happens with one particular desktop model and only in the isolated setup.
https://www.linksys.com/us/p/P-LGS108P/
https://www.amazon.com/KEEBOX-SGE05-1000Mbps-Gigabit-Ethernet/dp/B004FM58MO/ref=sr_1_fkmrnull_2?crid=3C2KC43XR3UPQ&keywords=keebox+switch&qid=1556297026&s=gateway&sprefix=keebox+%2Caps%2C192&sr=8-2-fkmrnull
-
@pdit Try those inits: https://fogproject.org/inits/delay/init.xz and https://fogproject.org/inits/delay/init_32.xz (both have the timeout increased from 35 to 120 seconds - let’s see if that works for you)
-
Please provide instructions on how to do this.
-
mv /var/www/fog/service/ipxe/init{,_orig_26-APR-19}.xz
mv /var/www/fog/service/ipxe/init_32{,_orig_26-APR-19}.xz
wget -O /var/www/fog/service/ipxe/init.xz --no-check-certificate https://fogproject.org/inits/delay/init.xz
wget -O /var/www/fog/service/ipxe/init_32.xz --no-check-certificate https://fogproject.org/inits/delay/init_32.xz
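If the test inits do not help, the renamed originals can be moved back into place. A possible sketch, assuming the backups were created with the _orig_26-APR-19 suffix used above; the function name restore_inits is made up for illustration.

```shell
# Move the backed-up init files back over the downloaded ones.
# $1 is the directory holding the inits, $2 is the backup suffix.
# restore_inits is an illustrative name, not a FOG command.
restore_inits() {
    dir=$1
    suffix=$2
    # Check that both backups exist before touching anything.
    for name in init init_32; do
        if [ ! -f "$dir/${name}${suffix}.xz" ]; then
            echo "no backup found: $dir/${name}${suffix}.xz" >&2
            return 1
        fi
    done
    for name in init init_32; do
        mv -f "$dir/${name}${suffix}.xz" "$dir/${name}.xz"
    done
}

# Example call, matching the paths and suffix from the commands above:
#   restore_inits /var/www/fog/service/ipxe _orig_26-APR-19
```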