HP ProBook 640 G8 imaging extremely slowly
-
@Jacob-Gallant Ok, seems like it actually did disable TSO and GSO. Can’t find LRO in the output but maybe the driver doesn’t support that.
Unfortunately I am running out of ideas with that.
Can you try booting up from a Linux Live DVD/USB and do some network testing with that? Try a distro with a very recent kernel if possible.
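For anyone following along, the offload state mentioned above can be read back with ethtool -k. This is only a small sketch: the helper name show_offloads and the interface name eth0 are assumptions. A feature the driver cannot change is usually marked [fixed] in the output.

```sh
# Hypothetical helper: keep only the TSO/GSO/LRO lines from `ethtool -k`
# output read on stdin.
show_offloads() {
    grep -E '^[[:space:]]*(tcp-segmentation-offload|generic-segmentation-offload|large-receive-offload):'
}

# Usage (eth0 is an assumed interface name):
#   ethtool -k eth0 | show_offloads
```

If a feature does not show up at all, the installed ethtool or driver may simply not report it.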
-
@sebastian-roth Hi Sebastian, apologies again for the delayed response. I ran a live USB of Ubuntu 20.10 and network performance was normal. We also have Windows 10 loaded on one of the devices manually and it performs normally as well. It seems specific to FOG performance unfortunately.
-
@Jacob-Gallant Well I did expect Windows to have normal network speed. But Ubuntu is using the Linux Kernel and therefore a pretty similar driver for this network card. What tests did you do for network speeds? Iperf again to really be able to compare results?
Please boot up Ubuntu again and run the following commands in a root command shell:
uname -a
lspci -nn | grep -A 2 -i net
There is some light at the end of the tunnel if Ubuntu doesn’t show the same issue. But it will be a long struggle to find out why: comparing kernel versions and the enormous list of patches Ubuntu adds to the official kernel.
Now that I write this I think it’s better to test other live distros as well, try Arch Live and maybe SystemRescueCD. With every distro run the same iperf test to be able to compare results and run the above commands, posting results here.
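To keep the runs comparable across distros, the retry count can be pulled out of each iperf3 summary. A rough sketch follows. The helper name sender_retr and the server address in the usage line are placeholders, and it assumes the plain-text iperf3 client output, where the sender summary line ends with the Retr value followed by the word "sender".

```sh
# Hypothetical helper: print the Retr value from the final "sender" summary
# line of plain-text iperf3 client output read on stdin.
sender_retr() {
    awk '$NF == "sender" { retr = $(NF - 1) } END { if (retr != "") print retr }'
}

# Usage (192.0.2.10 stands in for the FOG server running `iperf3 -s`):
#   iperf3 -c 192.0.2.10 | sender_retr
```

Running it against the same server from each live distro gives one number per run to post alongside the uname and lspci output.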
-
@sebastian-roth I hadn’t used iperf, just a regular speed test (speedtest.net). Here are the results from iperf for Ubuntu (still quite a few retries when connecting to the main FOG server):
https://photos.app.goo.gl/FDvPSgLoKVAUWDpY7
Here are the results from the command above:
https://photos.app.goo.gl/tcVtyXBZnWzbVN1B6
And here are the iperf3 results from Arch:
https://photos.app.goo.gl/qXU7b5tn8b5ohAan9
I can’t get SystemRescueCD to work as of yet, it will not connect to the network at all with that, but I’ll post the results when I get them.
-
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
Here are the results from iperf for ubuntu (still quite a few retries when connecting to the main FOG server):
https://photos.app.goo.gl/FDvPSgLoKVAUWDpY7
Looks pretty similar to what we had with FOG with many retries from my point of view: https://photos.app.goo.gl/xXFPLZFHAJT7dPEo9
Arch shows the retries as well. I really wonder why we don’t find more people reporting issues with that driver/NIC?!?
About the lspci command, I am sorry I got that wrong just typing it off the top of my head. I meant:
lspci -k | grep -A 2 -i net
(so we see which kernel driver is used)
-
@sebastian-roth How does this look? https://photos.app.goo.gl/2iZT3HmDE3A1wxJH9
-
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
How does this look? https://photos.app.goo.gl/2iZT3HmDE3A1wxJH9
Yes, perfect. So we know Ubuntu using a 5.8.x kernel (with many specific patches included) is using the same kernel driver e1000e that we also use with FOS. From the iperf output it looks to me like Ubuntu has the same issue with a high number of retries when testing with iperf, same as Arch Linux. You seem not to notice the issue when testing with speedtest.net but I think this test is not valid in this case because packets from the internet usually come in smaller portions (lower path MTU than in the local subnet where you have jumbo frames) and would not cause the same slowness.
So sorry, I had put some hope in this when we had the first tests with Ubuntu. Now I think it’s just the same.
As a last resort we might try compiling a one-off kernel for you using the driver provided by Intel - though I have to say that I haven’t looked into this yet and it might turn out to be a hassle. Not sure yet.
-
@sebastian-roth OK, totally understand. Just let me know! Thanks for everything Sebastian.
-
@Jacob-Gallant There is one more thing you might want to look at with the current kernel before you get into testing the patched one below. Schedule a debug deploy task and when you get to the shell run
ip link show | grep mtu
and see what number it states right after the keyword mtu.
Although it did not compile straight from the Intel code, it wasn’t too much work to fix and get it to build.
Download the patched kernel binary and put it in the
/var/www/html/fog/service/ipxe/
directory on your FOG server. Now edit the host settings of your HP ProBook 640 G8 and set Host Kernel to
bzImage-5.10.12-e1000e-3.8.4
Schedule a deploy task and watch the screen when it PXE boots - it should say
bzImage-5.10.12-e1000e-3.8.4...ok
when loading the kernel.
Will be interesting to hear if deployment speeds are in a normal range with this kernel.
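The MTU check from the start of this post can be made a little easier to read with a small filter. This is just a sketch; the helper name link_mtus is made up. For orientation, 1500 is the standard Ethernet MTU, while jumbo frames are typically configured as 9000.

```sh
# Hypothetical helper: print "<interface> <mtu>" for each link found in
# `ip link show` output read on stdin.
link_mtus() {
    awk '/ mtu [0-9]+/ {
        name = $2
        sub(/:$/, "", name)          # strip the trailing colon from "eth0:"
        for (i = 3; i < NF; i++)
            if ($i == "mtu") print name, $(i + 1)
    }'
}

# Usage (from the debug shell):
#   ip link show | link_mtus
```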
Just for reference if we need to re-compile this again:
- When using the Intel driver code v3.8.4 there are still calls to PM QoS functions that don’t exist in 5.10.x kernels anymore. I swapped out the function names as seen in this post on the kernel mailing list.
- Next is a function call that was completely removed.
- Then I commented out the use of xdp_umem_page *pages in kcompat.c as this was removed in mainline kernel and is only used for older kernel versions in kcompat.c anyway.
- Finally re-enabled CONFIG_PM in our kernel config to get past the last compile error. A different solution would be to move the function definition of e1000e_pm_thaw outside the #ifdef CONFIG_PM block.
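The first bullet above could be sketched as a sed filter. This is an illustration under assumptions, not the exact change that was made: it assumes the renames are the ones from the mainline PM QoS cleanup (pm_qos_*_request becoming cpu_latency_qos_*_request, with the PM_QOS_CPU_DMA_LATENCY argument dropped from the add call). The helper name and the src/netdev.c path in the usage line are hypothetical.

```sh
# Hypothetical helper: rewrite old PM QoS calls on stdin to the
# cpu_latency_qos_* names (assumed mapping, see the note above).
rename_pm_qos() {
    sed -e 's/pm_qos_add_request(\([^,]*\), *PM_QOS_CPU_DMA_LATENCY, */cpu_latency_qos_add_request(\1, /' \
        -e 's/pm_qos_update_request(/cpu_latency_qos_update_request(/' \
        -e 's/pm_qos_remove_request(/cpu_latency_qos_remove_request(/'
}

# Usage (path inside the unpacked Intel driver source is an assumption):
#   rename_pm_qos < src/netdev.c > src/netdev.c.new
```

Writing to a new file rather than editing in place makes it easy to diff the result before building.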
-
@sebastian-roth Hey Sebastian, the mtu size was 1500 when I ran that command. Unfortunately when I used the patched kernel I received a “no network interfaces found” error, did I miss a step?
-
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
Unfortunately when I used the patched kernel I received a “no network interfaces found” error, did I miss a step?
Don’t think there is a step you can miss. Interesting it wouldn’t find your network card using the official source code from Intel. Didn’t expect that. I’ll have a look at the code again.
Meanwhile you could schedule a debug deploy task for this host, boot up to the shell and then run the following commands, take a picture and post here:
dmesg | grep -e "e1000e" -e "eth[0-9]" -e "enp[0-9]"
ip a s
Update: I have checked kernel config and code used and I am sure the e1000e driver is included. We’ll see what the dmesg output can offer.
-
@sebastian-roth Here you are. https://photos.app.goo.gl/uJJKgG4PSPHY8kQJ6
-
@Jacob-Gallant Comparing the probe function source code between 5.10.12 kernel and Intel driver does not bring up many differences in that code. Very strange you see the
e1000e: probe of ... failed with -22
and I am not sure what that means.
Wooooho, I just found that the 3.8.4 version mentioned on the Intel website as “latest” is not actually the latest one: https://sourceforge.net/projects/e1000/files/e1000e%20stable/
I’ll build another kernel using that one later on.
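For what it is worth, kernel probe errors are negative errno values, and 22 is EINVAL (invalid argument). That only says the probe function rejected something, not what it was. A small sketch for pulling such lines out of dmesg follows; the helper name is made up and only a handful of common errno values are mapped.

```sh
# Hypothetical helper: list driver probe failures from dmesg text on stdin
# as "<device> <errno name>". Accepts "failed with error -22" as well as
# the shortened "failed with -22".
probe_failures() {
    sed -n 's/.*probe of \([^ ]*\) failed with \(error \)\{0,1\}-\([0-9][0-9]*\).*/\1 \3/p' |
    while read -r dev code; do
        case "$code" in
            2)  name=ENOENT ;;
            5)  name=EIO ;;
            12) name=ENOMEM ;;
            16) name=EBUSY ;;
            19) name=ENODEV ;;
            22) name=EINVAL ;;
            *)  name="errno $code" ;;
        esac
        printf '%s %s\n' "$dev" "$name"
    done
}

# Usage (from the debug shell):
#   dmesg | probe_failures
```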
-
@Jacob-Gallant That one compiled without any modification needed, haha! Wish I had found that one earlier.
https://fogproject.org/kernels/bzImage-5.10.12-e1000e-3.8.7 (note the different Intel driver version at the end of the filename)
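Fetching that kernel onto the FOG server could look roughly like the sketch below. The helper names are hypothetical, the directory is the one named earlier in this thread, and it assumes wget is installed on the server.

```sh
# Directory named earlier in the thread; override if your FOG web root differs.
FOG_IPXE_DIR=${FOG_IPXE_DIR:-/var/www/html/fog/service/ipxe}

# Hypothetical helper: derive the Host Kernel value (the file name) from a URL.
kernel_name() {
    basename "$1"
}

# Hypothetical helper: download a kernel binary into the iPXE directory.
fetch_kernel() {
    url=$1
    wget -O "$FOG_IPXE_DIR/$(kernel_name "$url")" "$url"
}

# Usage (on the FOG server, with write access to the directory):
#   fetch_kernel https://fogproject.org/kernels/bzImage-5.10.12-e1000e-3.8.7
```

The name printed by kernel_name is what goes into the Host Kernel field of the host settings.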
Please give it a try using debug again and run the same dmesg command.
-
@sebastian-roth Thanks again Sebastian, here are the results of that command. Unfortunately we see the same network performance issues with this kernel.
-
@Jacob-Gallant Ok, so the 3.8.7 driver version does initialize and probe the NIC correctly.
Unfortunately we see the same network performance issues with this kernel.
Oh well… we have tried! Can’t believe we are the only ones seeing this issue. Maybe this NIC is not in use widely?!
If you are really keen you could try contacting people on the mailing list or through email directly - Linux kernel or Intel drivers.
-
@sebastian-roth Thanks, I appreciate your efforts!
-
Hello.
I have the same problem. But when I plug in a USB stick (empty or not) at start, the speed is OK. I don’t know why.
-
@Dungody That really is strange, though interesting. Have you been able to verify this by testing with and without the USB key several times?
Are you sure you have exactly the same model, HP ProBook 640 G8?
From what we have seen in the network dump I was sure this is due to network congestion. But maybe I was on the wrong track with this, and the network slowdown we saw was only a symptom, with IO slowness being the root cause. Does FOG deploy to the USB key when it’s plugged in? Which device name do you see in the blue partclone screen? /dev/sda1?
-
@dungody said in HP ProBook 640 G8 imaging extremely slowly:
Hello.
I have the same problem. But when I plug in a USB stick (empty or not) at start, the speed is OK. I don’t know why.
Depending on the firmware, I have seen inserting a USB drive change the device naming order. On some firmware the USB drive is detected first, making /dev/sda the USB drive; on other firmware (hardware) the inserted USB drive takes the /dev/sdb name. It would be interesting to prove this out by setting up a debug deploy with the USB drive installed and then running the lsblk command to see what is in the /dev/sda slot. Then issue the fog command to start imaging from the debug console.
It would be strange to see it deploy to the USB drive faster than the onboard nvme disk.
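One way to see at a glance which disk name belongs to the USB stick is to ask lsblk for the transport column. A minimal sketch, with a made-up helper name, assuming the lsblk in the debug shell supports the TRAN column:

```sh
# Hypothetical helper: from `lsblk -dn -o NAME,TRAN` output on stdin,
# print the names of disks attached over USB.
usb_disks() {
    awk '$2 == "usb" { print $1 }'
}

# Usage (from the debug shell, with the USB stick plugged in):
#   lsblk -dn -o NAME,TRAN | usb_disks
```

If this prints sda, the USB stick took the first slot, and the device name shown in the partclone screen is worth double-checking before running fog.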