DHCP Lease Failing on 1.5.5 after upgrade - again
-
Hi,
I couldn’t find out how to mark a topic as unsolved again, so I am opening a new one.
To summarize: I upgraded from 1.4.4 to 1.5.5 and most of the time I get a DHCP lease failure (see the following screenshot).
The problem is back again. We ran several tests with only the client and the FOG server on the same dumb switch, to be sure there is no other DHCP server. We changed the switch and the cable too, and sometimes, I don’t know why, it works.
It works some 3 to 10 times and then stops working on the HP ProDesk 400 G4, HP ProDesk 400 G3, an old Asus tower and the Acer TravelMate P2, but there is no problem with Dell so far. I suspect something went wrong during the upgrade, but I don’t know where to look yet. I will do a fresh install of 1.5.5 to check.
-
Hi again,
Same problem with a fresh install on 1.5.5
Regards
-
@totoro OK what I want you to do here is this:
- Manually register this target computer with FOG
- Schedule a capture or deploy task (either one, it doesn’t matter), but before you hit the schedule button, check the debug check box, then schedule the task.
- PXE boot the target computer.
- After a few enter key presses, you should be dropped to a linux command prompt on the target computer.
- At the linux command prompt key in:
ip addr show
I’m interested in what it lists for eth0. Just post a screen shot of that output.
- I’m also interested in the output from this command:
lspci -nn|grep etwork
Based on the output of those two commands we’ll decide the next steps.
-
-
@george1421 said in DHCP Lease Failing on 1.5.5 after upgrade - again:
lspci -nn|grep etwork
Please run
lspci -nn|grep -i net
…To me it looks like the ethernet card is properly loaded and UP. Looks like the driver is not properly sending or receiving packets. We need to know the exact model of the card. Please run the above command and we’ll see.
-
@Sebastian-Roth FWIW it looks like we need some kind of regex to catch both cases.
$ lspci -nn|grep net
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
$ lspci -nn|grep etwork
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
02:00.0 Network controller [0280]: Intel Corporation Centrino Advanced-N 6205 [Taylor Peak] [8086:0082] (rev 34)
-
@george1421 As posted, using case insensitive grep should work fine:
lspci -nn|grep -i net
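To illustrate why the case-insensitive match catches both cases, here is a minimal sketch that runs the filters over the two sample lines posted above instead of live hardware. The function names (sample_lspci, filter_nics, pci_ids) are made up for this example and are not part of FOG or lspci.

```shell
#!/bin/sh
# Sample `lspci -nn` lines copied from the post above (not live hardware).
sample_lspci() {
    cat <<'EOF'
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
02:00.0 Network controller [0280]: Intel Corporation Centrino Advanced-N 6205 [Taylor Peak] [8086:0082] (rev 34)
EOF
}

# Keep every line containing "net" in any case: "Ethernet", "Network", ...
filter_nics() {
    grep -i net
}

# Print only the [vendor:device] PCI IDs, which identify the exact card model.
# The class codes such as [0200] are skipped because they contain no colon.
pci_ids() {
    grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]'
}

# A case-sensitive "grep net" would only match the "Ethernet" line here.
sample_lspci | filter_nics | pci_ids
```

On the real machine you would replace sample_lspci with lspci -nn; the [vendor:device] pair is what tells us which driver the card needs.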
-
-
@totoro Ok, just to confirm: this NIC has been supported in the Linux kernel for a very long time. We have also definitely had the driver in our kernel builds since August 2016.
Can you please grab a mini switch, connect it between your notebook and the main network, and boot again? Does that make a difference?
While this is not causing the issue, I still wonder why you have “osid=0”. That shouldn’t be. Please check your image definition of “TESTACER” as well!
-
@Sebastian-Roth No, it doesn’t make a difference. It’s really strange. Like I said, we connected the client directly to the FOG server through a switch, with the same result. Sometimes it works and sometimes not, without any logic.
As for the osid=0, I think it’s because we registered the client without assigning an image and started it in debug mode through a FOG task.
-
@totoro Sorry, I lost my focus. Back to my questions.
- At the linux command prompt key in:
ip addr show
I’m interested in what it lists for eth0.
Your picture shows that eth0 is found. Can we guess that the mac address in your picture matches the actual physical network adapter in this computer? If so then the kernel network driver is fine. Sebastian’s suggestion to use a dumb (unmanaged) switch should have fixed the problem we are thinking of. Will you reboot the computer back into the FOS debug console? I want to try a command from the FOS linux command prompt.
/sbin/udhcpc -i eth0 --now
Then wait until the command completes and run
ip addr show
again to see if it picks up an IP address. If a network address is assigned then I want you to run this command
ping 192.168.0.10
(which should be your fog server’s IP address). Make sure you get a response.
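The three manual checks above could be chained roughly like this. It is only a sketch: the helper names (has_ipv4, check_dhcp) are hypothetical, eth0 and 192.168.0.10 are the values used in this thread, and it assumes udhcpc exits with an error when --now is given and no lease is obtained.

```shell
#!/bin/sh
# Hypothetical helper; the defaults are the interface and server IP from this thread.
IFACE="${IFACE:-eth0}"
SERVER="${SERVER:-192.168.0.10}"

# Succeeds if the `ip addr show` text on stdin contains an IPv4 address line.
# "inet6" lines do not match because a space must follow "inet".
has_ipv4() {
    grep -Eq '^[[:space:]]*inet [0-9]'
}

# Request a lease, confirm an address was assigned, then ping the server.
check_dhcp() {
    /sbin/udhcpc -i "$IFACE" --now ||
        { echo "no DHCP lease on $IFACE"; return 1; }
    ip addr show dev "$IFACE" | has_ipv4 ||
        { echo "no IPv4 address on $IFACE"; return 1; }
    ping -c 3 "$SERVER" ||
        { echo "address assigned but $SERVER is unreachable"; return 1; }
    echo "DHCP and connectivity look fine"
}

# Only touch the network when called with "run", so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    check_dhcp
fi
```

Which message is printed tells you where it stops: no lease at all, a lease without an address on the interface, or an address that cannot reach the server.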
-
@totoro Well that is disappointing… but now we know it’s not “time” that solves your issue (you mentioned that it randomly works).
Here is what I want you to do next. Take a second computer and load wireshark on it. Plug it into the same subnet/switch as the one in the picture. Use the wireshark capture filter of
port 67 and port 68
Start the wireshark capture, then issue that same udhcpc command. What I’m hoping to see in the capture is a DISCOVER, OFFER, REQUEST, ACK sequence from dhcp. Since dhcp is actually failing on this computer, I’m guessing it’s failing on one of the steps. If we see the DISCOVER dhcp packet then we know the target computer is alive and on the network. Post the captured pcap here and I will take a look at it in detail.
-
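A note on reading the capture requested above: the idea is to find the first step of the DISCOVER, OFFER, REQUEST, ACK sequence that is missing. The sketch below only illustrates that reasoning; first_missing_step is a hypothetical helper, and you type in the message types you actually saw in wireshark, in order.

```shell
#!/bin/sh
# Hypothetical helper: given the DHCP message types seen in a capture
# (in order, as arguments), print the first handshake step that is missing.
first_missing_step() {
    for expected in DISCOVER OFFER REQUEST ACK; do
        if [ "${1:-}" != "$expected" ]; then
            echo "$expected"
            return 0
        fi
        shift
    done
    echo "none"
}

# Example: only the client's DISCOVER shows up in the capture.
first_missing_step DISCOVER    # prints OFFER
```

A missing DISCOVER points at the client side (driver, link, cable), while a DISCOVER without an OFFER means the request went out but no answer from the DHCP server was captured.
-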
@totoro As well you could try booting in debug mode and setting a static IP and see if you can ping then:
ip addr add 192.168.0.222/24 dev eth0
ping 192.168.0.10
-
During the “udhcp: sending discover” phase we don’t see anything:
We tried some older kernels, and it looks like we don’t have any problem with 4.18.3. We are going to continue testing to see if it’s solved. Is there a way to set up FOG to use a specific kernel by default without erasing the latest one?
It could help with the latest one.
Thanks for your help.
-
I had the same problem with an older version of fog. I solved it by decreasing the network speed; I went down to 100 Mb and the network card can negotiate the speed.
-
@totoro said in DHCP Lease Failing on 1.5.5 after upgrade - again:
We tried some older kernels, and it looks like we don’t have any problem with 4.18.3,
Ah, that’s great to hear. I did not expect an older kernel would help here. Sure, you can use the newer kernel as default and add the older one just for some particular machines. I will give instructions when I have a bit more time later on today.
-
@totoro I’d advise you to use the newest kernel as default. Then, for those devices where you have issues with the network, you can manually download an older kernel. Run the following commands as root:
cd /var/www/html/fog/service/ipxe/
wget https://fogproject.org/kernels/Kernel.TomElliott.4.18.3.64
wget https://fogproject.org/kernels/Kernel.TomElliott.4.18.3.32
chmod 644 Kernel.TomElliott.*
Now go to the host settings in the web UI and set Host Kernel to
Kernel.TomElliott.4.18.3.64
(guess those are 64 bit architecture)
If you get “Kernel is too old” errors then you also need to download other init files:
cd /var/www/html/fog/service/ipxe/
wget https://fogproject.org/inits/init_compat.xz
wget https://fogproject.org/inits/init_32_compat.xz
chmod 644 init_*compat.xz
Now set Host Init in the host settings of that machine to
init_compat.xz
-
@Sebastian-Roth Thanks for your answer. I think we are going to stay on 4.18.3, because if we have to register each client before we can work on it, we are going to lose a lot of time…
Can we file a bug report about this problem? With the Linux kernel team? Or the FOG kernel team?
Thanks again for your help.
-
@totoro said in DHCP Lease Failing on 1.5.5 after upgrade - again:
Can we file a bug report about this problem? With the Linux kernel team? Or the FOG kernel team?
You surely can, and you partly have already by posting this here in the forums. But actually getting this solved in the Linux kernel will definitely take some work down the road. I am more than happy to guide this process but need your help in trying out different kernels over and over until we find out exactly where the issue was introduced. This can take some time. Will you have access to at least one of these machines, as well as time for testing, over the next weeks? There is no point in starting this if you can’t do the testing reliably. I don’t have the hardware and therefore can’t do it myself!