ipxe booting "waiting for the link to come up"
-
Hi,
I recently installed a FOG server on my domain. I use my own DHCP server (pfSense), not FOG’s DHCP. I was able to register and image a large number of different computers, but today I was trying to register a small Toshiba Satellite Pro.
After PXE booting, FOG’s menu shows up. I choose quick registration, and then it shows “Starting enp3s0 interface and waiting for the link to come up”. After that it says “No link detected on enp3s0, skipping it. Failed to get an IP via DHCP!”
That’s weird, because the laptop was able to get an IP and boot to the FOG server menu. Any suggestions? I’m stuck there.
thanks in advance.
-
@kvothe88 So we know other devices work well from what you say. So this seems like a specific issue with the Toshiba Satellite pro device. Let me ask you a few questions and then I’ll explain some of the background.
Which version of FOG and FOS kernel do you use? See in FOG web UI -> FOG Configuration -> Kernel Update
What you need to know about the boot process is that different components are working together to make this work:
- NIC ROM doing the initial PXE boot to pull an IP and get nextserver and filename information
- With that info it downloads an iPXE binary from your FOG server, a “PXE bootloader” used to show the FOG boot menu or boot into a scheduled task. iPXE needs to query the DHCP server a second time.
- When a task is scheduled, iPXE will load and hand over to FOS (FOG OS, a tiny Linux with kernel and initrd doing all the work), which will boot up and again needs to get an IP from your DHCP server…
Unfortunately those components are not able to hand over the IP information to the next part. And they all use their very own set of device drivers.
So in your case the first two are going fine but the last one cannot get an IP.
Please pay attention to the link-up LED on your Toshiba and the switch it is connected to. Is the link up at this stage where it says “waiting link to come up”?
-
@sebastian-roth said in ipxe booting "waiting for the link to come up":
Which version of FOG and FOS kernel do you use?
hi,
Thank you for your answer. Sorry for the delay, I had no time to reply. Now that I have a moment, I can tell you that my kernel version is 5.10.12 from the 1st of February for both bzImage and bzImage32. Also, the switch shows a link when the laptop is trying to get the IP address.
Also, I registered the host myself with the LAN MAC address, and on the first menu, before deploying an image or selecting quick registration, it seems to recognize the host by the name I gave it on the FOG server.
After this, when I select an option, it tries to get a new IP and can’t… that’s frustrating… Do all the kernel versions have the same drivers, plus new ones in the newest versions?
Any ideas? I also tried putting a plain switch between the computer and the regular switch to see if it was a spanning tree problem, but no…
thanks in advance!
-
@kvothe88 As you have registered the host, please schedule a debug task for it (same as a normal task but just before you click the button in the web UI there is a checkbox for debug) and boot it up. You will see the same issue with the network link but it should go ahead after a while and bring you to a command console. Maybe you need to hit ENTER a couple of times to get there.
Now type the following commands, take a picture and upload that to the forum:
dmesg | grep firmware
ip a s
lspci -nn | grep -i net
The first command might come back blank but the other two should return something for sure! Keep the debug session open as we might have more commands needed to get the information needed.
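If taking a picture is awkward, a minimal sketch like the following could collect the same three outputs into one file on the debug console. The path /tmp/fog-debug.txt is an arbitrary choice for this example, not something FOS defines, and the file will most likely be gone once the client reboots.

```shell
# Collect the three diagnostic outputs in one file.
# /tmp/fog-debug.txt is an arbitrary, hypothetical path.
out=/tmp/fog-debug.txt
{
  echo "== dmesg | grep firmware =="
  dmesg | grep firmware || true      # may legitimately print nothing
  echo "== ip a s =="
  ip a s || true
  echo "== lspci -nn | grep -i net =="
  lspci -nn | grep -i net || true
} > "$out" 2>&1
echo "saved to $out"
```

The file can then be read with less, or copied off the machine later once some way of moving files is available.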
-
@sebastian-roth said in ipxe booting "waiting for the link to come up":
The first command might come back blank
hi, thanks for the fast reply. Here you have the image. For the first command i had no results:
-
@kvothe88 Realtek NIC 10ec:8136 was first added to the Linux kernel in version 3.3, so the kernel FOG uses (even an old one) should support this NIC. I suspect that FOS Linux is missing a specific firmware patch for that NIC.
Run this command from the debug console
dmesg | less
Look through that log specifically for lines that say “failed to find fw file”. Each should list a file path and file name. Send us a screenshot of the missing firmware file.
-
Hi George, I could not find anything similar to this in that log. It’s very big and I could have missed something, but I checked it twice and saw nothing similar…
Could those drivers be missing? Thanks
-
@kvothe88 The driver should be in the FOS Linux kernel because it is a very common one. Sometimes the NIC requires a specific driver.
This is a bit harder way to look if you don’t know vi commands, but key in
vi /var/log/syslog
Then key in
/Failed
and press Enter (case is important). That will take you to the first occurrence of the word “Failed” in the log. If that is not it, press
/
and Enter again to go to the next “Failed” word. If you searched through the whole log and did not find any mention of missing fw (firmware), then that isn’t the problem. To exit vi, press ESC, then key in
:q!
and press Enter.
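For anyone not comfortable in vi, the same search can be sketched with grep. The helper name scan_log is made up for this example, and the log may be /var/log/syslog or /var/log/messages depending on the system.

```shell
# Hypothetical helper: list lines mentioning firmware or failures, with line numbers.
# -i ignores case, so "Failed" and "failed" both match.
scan_log() {
  grep -n -i -E 'firmware|fw file|failed' "$1"
}
# On the debug console, try whichever file exists:
#   scan_log /var/log/messages
#   scan_log /var/log/syslog
```

No output means none of those words appear in the file.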
This one will give us a bit more detail than the earlier lspci command. Key in the following:
lspci -nn -k | more
Scroll down to where you see the line starting with 03:00. There should be two additional lines below it that we are interested in. Does it list a kernel module being used? Here is an example of what I’m looking for. This is from my FOG server of course; the kernel values are what is needed.
0b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
	Subsystem: VMware VMXNET3 Ethernet Controller [15ad:07b0]
	Kernel driver in use: vmxnet3
	Kernel modules: vmxnet3
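To pull out just the driver line for one device, something like the following sketch could be used. The function name driver_for_slot is hypothetical, and it assumes lspci prints the slot (such as 03:00) at the very start of each device line, with the detail lines indented below it.

```shell
# Hypothetical helper: print the "Kernel driver in use" value for one PCI slot.
# Reads `lspci -nn -k` style output on stdin; $1 is a slot prefix such as 03:00.
driver_for_slot() {
  awk -v slot="$1" '
    # Device header lines start in column 1; detail lines are indented.
    /^[0-9a-f]/ { in_dev = (index($0, slot) == 1) }
    in_dev && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use:[ \t]*/, "")
      print
    }
  '
}
# On the debug console:
#   lspci -nn -k | driver_for_slot 03:00
```

An empty result would suggest that no kernel driver has claimed the device.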
-
@george1421 said in ipxe booting "waiting for the link to come up":
vi /var/log/syslog
Do you mean running vi /var/log/syslog on the Toshiba laptop or on my FOG server? On the Toshiba, on the same console where I ran the debug task, there is no syslog file.
-
Here is everything I could get for your question:
-
@kvothe88 said in ipxe booting "waiting for the link to come up":
On the Toshiba, on the same console where I ran the debug task, there is no syslog file
Yes, on the PXE-booting computer there should be a log file in /var/log. It’s called either syslog or messages; I can’t remember at the moment which.
I see that the right kernel driver is being used: “r8169”. So it’s either missing the proper firmware (need info from above) or I need to send you an updated FOS Linux kernel that has the current Realtek driver. Let’s find out about the firmware bit first.
-
@george1421 hi George,
I got nothing from that file. No /Failed strings shown, and no /failed, /error or /firmware either, with or without a capital letter.
-
@kvothe88 I think I would like to personally see that log file. You can use the scp command on the PXE-booting client computer to copy the file to your FOG server, then upload it from there to this thread with a .txt file extension. The command is similar to this
scp /var/log/messages root@192.168.1.10:/images
Of course change the user and IP address to match your environment. The command will drop the file into /images on the FOG server.
OK, then let’s try one of the one-off kernels that has the updated Realtek driver in it.
https://drive.google.com/file/d/1vSu5L-DAZYK7VYiJtFfCYrrqJb963cMg/view?usp=sharing
Download that file from the link and save it as
bzImageRT3
in the
/var/www/html/fog/service/ipxe
directory.
Manually register this host with the FOG server using the FOG web UI, then on the host registration page set the kernel field to
bzImageRT3
Save the settings and then PXE boot the target computer into the hardware compatibility check. See if it detects the NIC there.
Also, just so we know the scope, approximately how old is this laptop?
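On the FOG server, the save step might look like the sketch below. The function name install_kernel and the download location are hypothetical, and the 644 mode is an assumption so that the web server can read the file.

```shell
# Hypothetical helper: copy a downloaded kernel into the FOG iPXE directory as bzImageRT3.
# $1 = downloaded file, $2 = destination directory
install_kernel() {
  cp "$1" "$2/bzImageRT3" &&
  chmod 644 "$2/bzImageRT3"      # assumption: world-readable is enough for the web server
}
# Example (run as root on the FOG server; the source path is a placeholder):
#   install_kernel /root/bzImageRT3 /var/www/html/fog/service/ipxe
```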
-
@george1421 said in ipxe booting "waiting for the link to come up":
bzImageRT3
Hi George! Thanks again.
So, I was not able to move the file to my FOG server. This was the error:
Again, I was able to register the host in the FOG server UI. I copied the kernel where you told me and booted from PXE. I tried the compatibility check, and it reported that the bzImageRT3 kernel it was using was OK. I got some errors about dell_smbios, but unfortunately the link error showed up again:
-
@kvothe88 Sorry, I’m trying to do too many things this AM. Yes, scp would have worked, but we have an issue with the NETWORK, so no network functions work.
OK, well, the updated Realtek drivers in the kernel I sent did not fix the issue. It’s still pointing back to the firmware.
If you have a usb flash drive we can use that to get the messages file out of the pxe booted computer.
- Insert the usb drive into the target computer.
- issue this command from the debug console
lsblk
- I identify the USB flash drive by its size in the list. For the rest of this, let’s assume it shows up as
/dev/sdb
- Now let’s make a directory where we can mount that USB drive.
mkdir /ext
- Let’s connect the USB drive to that directory
mount /dev/sdb1 /ext
- That command will mount the first partition on that usb stick to the /ext directory. Issue this command to see the contents of the usb drive
ls -la /ext
- If you see what is expected, then copy the log file to the USB flash drive directory.
cp /var/log/messages /ext
- When done, unmount that USB drive with
umount /ext
- Now you will be able to remove the usb drive. Don’t remove it before you unmount it and the command completes.
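The steps above can be sketched as one small function. This is only an illustration: the name copy_log_to_usb is made up, /dev/sdb1 is the assumed partition from the lsblk output, and it should be run as root on the debug console.

```shell
# Hypothetical wrapper around the USB copy steps.
# $1 = USB partition from lsblk (e.g. /dev/sdb1), $2 = log file (default /var/log/messages)
copy_log_to_usb() {
  dev="$1"
  log="${2:-/var/log/messages}"
  mnt=/ext
  # Refuse to continue if the argument is not a block device.
  [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
  mkdir -p "$mnt" || return 1
  mount "$dev" "$mnt" || return 1
  cp "$log" "$mnt"/
  status=$?
  umount "$mnt"        # umount (no "n" after the "u") detaches the drive
  return "$status"
}
# Example:
#   copy_log_to_usb /dev/sdb1 && echo "safe to remove the drive"
```

The drive is unmounted even when the copy fails, so it should be safe to pull once the function returns.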
-
@george1421 hi George, thanks for your reply. Here you have the file:
-
@kvothe88 Just to confirm, is this the real MAC address of the network adapter in question: “60:02:92:3e:ab:7a”? The log file is saying the link is down.
Also how old is this computer?
-
@george1421 hi George,
Yes, that’s the MAC. The interface is not able to get an IP, so that makes sense…
Maybe 5, 6 years old
-
@kvothe88 After much head scratching I understand what is going on here. I don’t have a solution atm, but at least I have an idea.
Realtek has a generic network driver called the r8169. That supports (mostly) a large range of network adapters.
Looking at the boot log in the file you sent me, I see this:
r8169 0000:03:00.0 eth0: RTL8106e, 60:02:92:3e:ab:7a, XID 449, IRQ 91
So the real nic is an RTL8106 not an r8168/r8169.
I also found a reference saying there is a bug in the r8169 driver in kernels after version 5.4, and the recommendation was to use the 4.19.x series kernel.
It appears the issue was fixed in 5.4.33 (so it should be fixed in 5.6.18).
So what can we do quickly? See if downgrading your Linux kernel to 4.19.x allows the system to boot.
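To confirm which kernel the client actually booted after changing it, a rough version comparison can be sketched as below. The helper name version_ge is hypothetical, and it relies on sort -V, which exists in GNU coreutils but may be missing from smaller builds such as some BusyBox configurations. It only compares version numbers; it says nothing about whether a given fix is really present in that kernel.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
# Example on the debug console:
#   version_ge "$(uname -r)" 5.4.33 && echo "running kernel is 5.4.33 or newer"
```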
-
@kvothe88 @george1421 It’s been busy at work and so I only just saw all your posts. Interesting. From the very first picture you posted we see that enp3s0 is there but says
NO-CARRIER
and so it seems not to be able to detect that the link is up. A firmware blob does not seem to be the issue from the information posted so far.
Let’s try disabling auto-negotiation for testing to see if that makes a difference. Either do that on the switch if you can, or boot into a debug session and run
ethtool -s enp3s0 autoneg off
then wait for a bit and check with
ip a s
to see if the
NO-CARRIER
goes away…
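Instead of reading the whole output by eye, the check could be sketched like this. The function name link_is_down is made up for the example, and it assumes the usual ip output format where each interface line starts with an index, the interface name and a colon.

```shell
# Hypothetical helper: succeeds when the given interface shows NO-CARRIER.
# Reads `ip a s` output on stdin; $1 is the interface name, e.g. enp3s0.
link_is_down() {
  grep -E "^[0-9]+: $1[:@]" | grep -q 'NO-CARRIER'
}
# Example on the debug console:
#   ip a s | link_is_down enp3s0 && echo "still no link on enp3s0"
```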