Error PXE-E18 - Lenovo ThinkPad E16
-
Hello,
I get a timeout during PXE boot with a FOG install (version 1.5.10.1622) on a Lenovo ThinkPad E16 Gen 1.
Secure Boot is disabled.
I haven’t seen any error with legacy boot on older computers, and a TFTP client on Windows can download the “.efi” files.
I tested with snponly.efi and ipxe.efi and got the same error.
My personal FOG server, running version 1.6.0, doesn’t give any error.
At another high school, I had no problem with FOG 1.5.10.
I downloaded the latest kernel (6.6.49) and the latest initrd (2024.02.5), and built the iPXE files following the doc here: https://docs.fogproject.org/en/latest/kb/reference/compile_ipxe_binaries/ but the problem is the same.
I’m out of ideas.
Thank you for your help.
-
@jmeyer OK, you did a lot of the initial debugging, so it’s not the easy stuff.
The slightly harder step is to capture the traffic, either with tcpdump on the FOG server if it’s on the same subnet as the PXE booting computer, or with a witness computer running Wireshark on the target computer’s subnet.
If the FOG server and the PXE booting computer are on the same subnet, you can use this tutorial to capture a pcap file: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue You can look at the pcap with Wireshark or post it here and I will look at it.
Or, if your target computer is on a different subnet than the FOG server, you will need to use a witness computer with Wireshark loaded. Use a capture filter of “port 67 or port 68” to grab only the DORA (DHCP) packets on the network.
What you want to look for is a DISCOVER packet from the target computer that says “hello I’m here, come configure me”.
You will get an OFFER from one or more DHCP servers. This is the packet you are interested in. In the OFFER there is a header section with the fields next-server and boot-file. These need to be populated with the FOG server’s IP address and snponly.efi (or whatever). These are the bootp fields. In addition, if you look down in the DHCP options, options 66 and 67 should match what is in the next-server and boot-file fields. I suspect something is not right in this section.
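To make the OFFER check concrete, here is a minimal Python sketch of which bytes carry those values. It assumes you already have the raw UDP payload of one OFFER (for example exported from Wireshark). The function names, the 192.0.2.10 address, and the sample packet builder are hypothetical and only for illustration; the sketch also ignores option 52 (overload), which can move options into the sname/file fields.

```python
import socket
import struct

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def parse_boot_fields(payload: bytes) -> dict:
    """Pull next-server, boot-file and options 66/67 out of a BOOTP/DHCP payload."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    fields = {
        # siaddr ("next-server") sits at bytes 20..23 of the fixed header
        "next_server": socket.inet_ntoa(payload[20:24]),
        # file ("boot-file") is a 128-byte, NUL-padded field at bytes 108..235
        "boot_file": payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "option_66": None,
        "option_67": None,
    }
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option, stop parsing
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code in (66, 67):
            fields[f"option_{code}"] = value.rstrip(b"\x00").decode("ascii", "replace")
        i += 2 + length
    return fields


def find_mismatches(fields: dict, expected_server: str, expected_file: str) -> list:
    """List every field that does not point at the expected server or file."""
    problems = []
    if fields["next_server"] != expected_server:
        problems.append(f"next-server is {fields['next_server']}")
    if fields["boot_file"] != expected_file:
        problems.append(f"boot-file is {fields['boot_file']!r}")
    if fields["option_66"] not in (None, expected_server):
        problems.append(f"option 66 is {fields['option_66']!r}")
    if fields["option_67"] not in (None, expected_file):
        problems.append(f"option 67 is {fields['option_67']!r}")
    return problems


def build_sample_offer(server_ip: str, boot_file: str) -> bytes:
    """Build a synthetic OFFER (hypothetical values) so the parser can be tried."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        2, 1, 6, 0,                      # op=BOOTREPLY, htype=Ethernet, hlen, hops
        0x12345678, 0, 0,                # xid, secs, flags
        bytes(4),                        # ciaddr
        socket.inet_aton("192.0.2.50"),  # yiaddr (address offered to the client)
        socket.inet_aton(server_ip),     # siaddr (next-server)
        bytes(4),                        # giaddr
        bytes(16),                       # chaddr
        b"",                             # sname
        boot_file.encode("ascii"),       # file (boot-file)
    )
    options = bytes([53, 1, 2])          # option 53: message type = OFFER
    options += bytes([66, len(server_ip)]) + server_ip.encode("ascii")
    options += bytes([67, len(boot_file)]) + boot_file.encode("ascii")
    return header + DHCP_MAGIC_COOKIE + options + b"\xff"


offer = build_sample_offer("192.0.2.10", "snponly.efi")
print(parse_boot_fields(offer))
print(find_mismatches(parse_boot_fields(offer), "192.0.2.10", "snponly.efi"))
```

In Wireshark the same values appear as "Next server IP address" and "Boot file name" in the Bootstrap Protocol header, with options 66 and 67 listed further down.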
This is a pxe boot (iPXE) issue and unrelated to what FOS kernel or version of FOG you are running.
-
As I said, there is no problem with legacy boot, which uses the same DHCP configuration (options 66 and 67) and the same VLANs.
The client gets its IP fine, then it times out.
I’m not on the same VLAN, but are there differences between legacy and EFI, or if one works, should the other work too?
Maybe access rights between the VLANs block only EFI.
I’m not sure since when I have had the problem, but we changed switches a few weeks ago.
I’ll set up Wireshark tomorrow to see what happens.
Thank you
-
@jmeyer said in Error PXE-E18 - Lenovo ThinkPad E16:
I’ll set up Wireshark tomorrow to see what happens.
You really need to see what the client is being told, and by whom, to explain why BIOS works and UEFI doesn’t. There is something unknown going on here.
-
We found IPS rules (I think that’s the name) on the firewall that were blocking TFTP.
It’s now loading but terribly slow (average 6 Mb/s on a Gb link).
We will run more tests today.
-
@jmeyer said in Error PXE-E18 - Lenovo ThinkPad E16:
It’s now loading but terribly slow (average 6 Mb/s on a Gb link).
What is loading slowly, iPXE or the image via partclone? I haven’t been following the issue closely, but I think I saw that someone tried an earlier version of the FOS kernel, somewhere in the 6.1 range, and imaging on these current Lenovos went at normal speed. I don’t know what the Linux developers changed between 6.1 and 6.6 to cause this slowness. Have we identified which NIC is installed in these computers? We would need the hardware ID of the NIC to research it.
-
@george1421 Since most of the original files in the FOG tftpboot directory are small, I created a 500 MB file and tried to download it with the Windows TFTP client.
It takes around 10 to 15 minutes to download.
I also tested on a ThinkPad E15 Gen 4.
Is there a speed limit in TFTP?
On the E15 and the E16, the NIC is a Realtek RTL 8168 Series.
ID on E15 is PCI\VEN_10EC&DEV_8168&SUBSYS_50B117AA&REV_15
ID on E16 is PCI\VEN_10EC&DEV_8168&SUBSYS_50D517AA&REV_15
Edit: I found out that the Windows TFTP client gives a misleading maximum speed.
I reached 450 Mb/s with Ivanti from PXE, whereas I am still stuck below 20 Mb/s with the Windows TFTP client.
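On the TFTP speed limit question: classic TFTP (RFC 1350) is lock-step, meaning one data block is sent and then acknowledged before the next one goes out, and the default block size is 512 bytes. Throughput is therefore bounded by block size divided by round-trip time, whatever the link speed, unless a larger block size (RFC 2348) or a window (RFC 7440) is negotiated. The Python sketch below only illustrates that arithmetic; the function names and the round-trip times are assumed values, not measurements from this network.

```python
def tftp_ceiling_mbps(block_size_bytes: int, rtt_ms: float) -> float:
    """Upper bound in Mb/s when exactly one block travels per round trip."""
    return block_size_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def transfer_minutes(file_mb: float, rate_mbps: float) -> float:
    """Minutes needed to move file_mb megabytes at rate_mbps megabits per second."""
    return file_mb * 8 / rate_mbps / 60


# Hypothetical round-trip times, chosen only to show the trend.
for rtt_ms in (0.2, 0.5, 1.0):
    rate = tftp_ceiling_mbps(512, rtt_ms)
    print(f"RTT {rtt_ms} ms: at most {rate:.2f} Mb/s, "
          f"500 MB takes at least {transfer_minutes(500, rate):.1f} min")
```

Under these assumptions, a transfer of a few Mb/s on a gigabit link is what lock-step 512-byte blocks would produce, so a TFTP client is a poor tool for measuring link bandwidth.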
Another thing: I added a FOG menu entry that downloads a file from the Ivanti PXE server (chain command), and it also downloads at only 6 Mb/s.
I’ll run more tests directly from PXE over the next few days.
-
@jmeyer Make sure you are paying attention to MB/s versus Mb/s. There is a factor of 8 between them (bytes versus bits).
I wrote an article about 8 years ago now: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast
It contains some benchmarks I did at the time. FOG imaging speed (as reported by Partclone) is made up of several elements: the FOG server’s disk subsystem speed, the speed at which the FOG server can move the image from disk to the network interface, network transfer time, the client receiving the file and expanding it in memory, and finally the client moving the data to local storage. All of those go into the number displayed by Partclone.
For clarity let me present some theoretical best network speeds.
For a 100Mb/s network link the maximum transfer speed is 12.5MB/s or 750MB/min
For a 1GbE network link the maximum transfer speed is 125MB/s or 7.5GB/min
For a 10GbE network link the maximum transfer speed is 1250MB/s
Pay attention if your network speed is getting capped at or around one of these maximum transfer speeds. I’ve seen someone in the past get only an 8MB/s transfer rate. He/she had 1GbE on each end, but between the ends there was a network switch link running at 100Mb/s half duplex. Not saying that is your case, but stuff happens.
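The theoretical figures above are just the link speed in megabits divided by 8. As a small Python sketch of that conversion (the function name is only illustrative, and protocol overhead is ignored, so real transfers will land somewhat below these numbers):

```python
def max_transfer(link_mbps: float) -> tuple:
    """Return (MB/s, MB/min) for a link speed given in Mb/s, ignoring overhead."""
    mb_per_s = link_mbps / 8      # 8 bits per byte
    return mb_per_s, mb_per_s * 60


for link_mbps in (100, 1000, 10000):
    per_s, per_min = max_transfer(link_mbps)
    print(f"{link_mbps} Mb/s link: {per_s:g} MB/s, {per_min:g} MB/min")
```

Read the other way round, an observed 8MB/s is 64Mb/s, which fits under a 100Mb/s hop but is nowhere near what a 1GbE path should deliver.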
Now, why I referenced that article above: it has the commands to benchmark your hardware. iperf3 will give you network speeds between the network interfaces and the kernel. It has nothing to do with moving FOG image blocks between systems. If you put the FOS target system in debug mode you can run iperf3 between FOS Linux and the FOG server. This will give you an idea of the bandwidth you have. I would do this with a computer that is exhibiting the slow imaging speed and then with one that has normal imaging speed. Let’s see if the network speeds are comparable. I’m not willing to rule out the Linux kernel version (6.1.x vs 6.6.x); it may be that something is missing in FOS Linux for this new hardware.
When you have the slow target computer in debug mode, run this command to see if the kernel is complaining about the hardware/firmware:
grep -i -e "firm" /var/log/syslog
That should return any lines that complain about needing special firmware for the hardware.
-
@george1421 I have run all the tests. First, there was a firewall rule blocking TFTP from the FOG server. Then, after running more tests, I realised that the problem is not linked to hardware or software on the server.
During the tests, I only changed the switch port next to the computer, so that most of the hardware between server and client stayed the same.
On the same VLAN I get 450 Mbit/s (around 50 MB/s), and when I change VLAN I get only 150 Mbit/s.
The Windows TFTP client stays under 20 MB/s at most, so I gave up using it for further tests.
Colleagues say there is no QoS, but there is definitely something reducing the speed.