HP ProBook 640 G8 imaging extremely slowly
-
@george1421 I ran iperf on a working device and here are the results, 0 retransmits as you mentioned. You’re also correct that this is exactly what I’m seeing on the 640 G8, which starts off with reasonable performance but quickly drops to a crawl:
https://photos.app.goo.gl/oVrtqpnhmYHh39LK9
Here are the results of the hdparm command.
-
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
I ran iperf on a working device and here are the results, 0 retransmits as you mentioned.
In the same network jack as the 640 G8?
Is the network adapter in the 640 G8 built in or USB based?
-
@george1421 The very same, yes. And it’s built-in.
-
@jacob-gallant If the other computer that works has Windows loaded on it, can you get the hardware ID of its network interface? We know the 640 G8 is 8086:15fc (Linux format). So the question is: is the working one the same?
I have a one-off kernel 5.10.x that we might want to try. But so far I’m leaning towards the NIC itself or the kernel NIC driver in 5.6.18.
-
@george1421 The working one is different, 8086:15e3
-
@jacob-gallant Ok, the 15e3 NIC is an older NIC that was first introduced in the 4.6 Linux kernel. The 15fc was first introduced in the 5.5 Linux kernel, and we are currently trying 5.6.18, right? (From the FOS Linux debug console you can key in
uname -r
to get the kernel version.)
Here is an experimental FOS Linux kernel 5.10.2. Download this file and rename it to
bzImage
(case is important):
https://drive.google.com/file/d/1-4HyQD8ttz_GCE_vKrvuydFVqcPUMqzU/view?usp=sharing
Rename the existing bzImage file in the
/var/www/html/fog/service/ipxe
directory and drop this file in there. Let’s see if this kernel gives us a better deployment. I know there was again a major rewrite in the 5.9.x series of the Linux kernel, akin to what happened with 5.5.
-
@george1421 Same results with 5.10.12 I’m afraid. We were using 5.6.18 for all of the previous tests, that’s right.
-
@jacob-gallant Well nuts. I was hoping the updated kernel would function better. Yes, we need 5.6.18 to have support for that network interface; if you were using 4.19.x the network interface wouldn’t work at all.
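As a quick sanity check of the version reasoning above, here is a minimal sketch in plain sh. It assumes, as stated in this thread, that support for 8086:15fc first appeared in 5.5; the function name is made up for illustration and is not part of FOG or FOS.

```sh
# Assumption (from this thread): the 8086:15fc NIC is supported from Linux 5.5 on.
# kernel_has_15fc_support is a hypothetical helper name.
kernel_has_15fc_support() {
    ver=$1                      # e.g. the output of `uname -r`
    major=${ver%%.*}            # text before the first dot
    rest=${ver#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the second component
    # True for 6.x and later, or for 5.x with x >= 5.
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 5 ]; }
}

# Usage from the debug console:
#   kernel_has_15fc_support "$(uname -r)" && echo "kernel should know this NIC"
```

This only compares major and minor numbers, so it says nothing about whether the driver in that kernel actually performs well, which is the open question here.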
-
@Jacob-Gallant @george1421 So far it all looks like a driver issue in the Linux kernel. Though I am really wondering why we don’t find other users’ reports about this NIC.
Maybe this is some kind of jumbo frame issue?
@Jacob-Gallant Would you be willing to capture a short part of the network traffic on your FOG server and upload the PCAP so we can take a look? Schedule a debug deploy task. Boot the host up, run
ip a s
and note down the IP address before you start the job via the
fog
command. Now run
tcpdump -w /tmp/dump.pcap host x.x.x.x
as root on your FOG server using the IP address noted down. Leave that tcpdump running and step through the deploy task on the machine. Shortly after the first blue partclone screen starts you want to stop tcpdump on your FOG server (Ctrl+C) so the PCAP file does not grow too much! I am fairly sure we will see the retransmits at that point already and might find out why.
Just copy the file /tmp/dump.pcap from your server and upload it to a share we can access.
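If stopping tcpdump by hand at the right moment is awkward, the capture step above could be wrapped roughly like this. It is only a sketch: the helper name, the default of 20000 packets and the TCPDUMP override are my own assumptions, while `-c` (stop after a number of packets) and `-w` are standard tcpdump options.

```sh
# Hypothetical helper: capture at most $2 packets (default 20000, an arbitrary
# illustrative cap) to or from one host, so the PCAP stays small even if
# nobody presses Ctrl+C in time.
capture_host() {
    ip=$1
    count=${2:-20000}
    # Very rough IPv4 sanity check: reject empty input or anything that is
    # not made of digits and dots (such as the literal x.x.x.x placeholder).
    case $ip in
        ''|*[!0-9.]*)
            echo "usage: capture_host <ipv4> [packet-count]" >&2
            return 2
            ;;
    esac
    # Set TCPDUMP=echo for a dry run that only prints the arguments.
    ${TCPDUMP:-tcpdump} -c "$count" -w /tmp/dump.pcap host "$ip"
}

# Usage, as root on the FOG server (the address is an example only):
#   capture_host 192.168.1.50
```

The packet cap is a convenience only; pressing Ctrl+C shortly after the first partclone screen appears works just as well.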
-
@george1421 I’m currently researching this issue. I do see others with speed problems with this series of nic adapters.
-
@sebastian-roth @george1421 Thanks to you both for all of your time. Here is the capture:
https://drive.google.com/file/d/1WS8e2R9kR-ZjpqzgikmSg0CakZJYJi4h/view?usp=sharing
-
@Jacob-Gallant I looked at the PCAP for quite some time now. We see clear signs of “network congestion” - meaning that packets are being re-transmitted causing the TCP connection to slow down.
The connection starts just fine and the host sends a file read request to the FOG server. Now the FOG server starts to send a first large packet. Standard Ethernet MTU is 1500 bytes (1518 bytes for the full frame) and the FOG server sends 7240 bytes in one single TCP packet - a so-called jumbo frame.
So I am wondering if you can improve speed by disabling LRO (Large Receive Offload), TSO (TCP Segmentation Offload) and GSO (Generic Segmentation Offload) using ethtool. Schedule and boot into another debug deploy session. On the shell run:
ip a s
ethtool -K eth0 lro off
ethtool -K eth0 tso off
ethtool -K eth0 gso off
The first command is just to confirm the network interface name (could be
eth0
or different) to use with ethtool later on. You can try disabling all three at once or just one and give it a try.
There are various I219-V cards/chips listed with different PCI IDs. Searching with 8086:15fc I couldn’t find much on the web, but searching for I219-V there are a few people complaining about issues:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802691
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1785171
https://forums.linuxmint.com/viewtopic.php?t=327435
https://access.redhat.com/solutions/3615791
Though I am really in doubt if any of those match your exact situation.
-
@sebastian-roth Apologies for the delay in getting back to you, I’ve been working from home so far this week so I didn’t have access to the device. Unfortunately these steps didn’t improve anything.
-
@Jacob-Gallant After running the commands mentioned, can you run
ethtool -k eth0
(lower case k this time, same interface name as before), take a picture of the output and post it here?
-
@sebastian-roth Here you are! https://photos.app.goo.gl/WnHEE63jFEjKvT4N9
-
@Jacob-Gallant Ok, seems like it actually did disable TSO and GSO. Can’t find LRO in the output but maybe the driver doesn’t support that.
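For anyone who wants to check this without reading the whole feature list, here is a small sketch that picks the three offloads out of `ethtool -k` output. It assumes the usual feature names printed by ethtool (tcp-segmentation-offload, generic-segmentation-offload, large-receive-offload); the function name is made up.

```sh
# Reads `ethtool -k <iface>` output on stdin and prints the state of the
# three offloads discussed in this thread. A feature the driver does not
# list at all is reported as "not listed".
offload_status() {
    awk -F': *' '
        { sub(/^[ \t]+/, "", $1) }     # drop indentation of sub-features
        $1 == "tcp-segmentation-offload"     { tso = $2 }
        $1 == "generic-segmentation-offload" { gso = $2 }
        $1 == "large-receive-offload"        { lro = $2 }
        END {
            printf "tso=%s\n", (tso == "" ? "not listed" : tso)
            printf "gso=%s\n", (gso == "" ? "not listed" : gso)
            printf "lro=%s\n", (lro == "" ? "not listed" : lro)
        }'
}

# Usage in the debug session (interface name may differ):
#   ethtool -k eth0 | offload_status
```

A value such as "off [fixed]" would mean the driver does not allow that feature to be changed.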
Unfortunately I am running out of ideas with that.
Can you try booting up from a Linux Live DVD/USB and do some network testing with that? Try a distro with a very recent kernel if possible.
-
@sebastian-roth Hi Sebastian, apologies again for the delayed response. I ran a live USB of Ubuntu 20.10 and network performance was normal. We also have Windows 10 loaded on one of the devices manually and it performs normally as well. It seems specific to FOG performance unfortunately.
-
@Jacob-Gallant Well I did expect Windows to have normal network speed. But Ubuntu is using the Linux Kernel and therefore a pretty similar driver for this network card. What tests did you do for network speeds? Iperf again to really be able to compare results?
Please boot up Ubuntu again and run the following commands in a root command shell:
uname -a
lspci -nn | grep -A 2 -i net
There is some light at the end of the tunnel if Ubuntu doesn’t show the same issue. But it will be a long struggle to find out why: we would have to compare kernel versions and the enormous list of patches Ubuntu adds to the official kernel.
Now that I write this I think it’s better to test other live distros as well; try Arch Live and maybe SystemRescueCD. With every distro run the same iperf test to be able to compare results and run the above commands, posting results here.
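To make the per-distro comparison less dependent on screenshots, the retransmit count can be pulled out of the iperf3 client output with something like the sketch below. It assumes the usual Linux TCP client summary, where the "Retr" value sits directly before the word "sender" on the summary line; the function name is hypothetical.

```sh
# Reads iperf3 client output on stdin and prints the retransmit count from
# the last "sender" summary line (with parallel streams that is the [SUM]
# line). Prints "n/a" if no such line is found.
iperf3_retransmits() {
    awk '
        $NF == "sender" { retr = $(NF - 1) }
        END { print (retr == "" ? "n/a" : retr) }'
}

# Usage on each live distro (server address is a placeholder):
#   iperf3 -c x.x.x.x | iperf3_retransmits
```

Noting this one number next to the `uname -a` output for each distro should be enough to compare them.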
-
@sebastian-roth I hadn’t used iperf, just a regular speed test (speedtest.net). Here are the results from iperf for Ubuntu (still quite a few retries when connecting to the main FOG server):
https://photos.app.goo.gl/FDvPSgLoKVAUWDpY7
Here are the results from the command above:
https://photos.app.goo.gl/tcVtyXBZnWzbVN1B6
And here are the iperf3 results from Arch:
https://photos.app.goo.gl/qXU7b5tn8b5ohAan9
I can’t get SystemRescueCD to work as of yet, it will not connect to the network at all with that, but I’ll post the results when I get them.
-
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
Here are the results from iperf for ubuntu (still quite a few retries when connecting to the main FOG server):
https://photos.app.goo.gl/FDvPSgLoKVAUWDpY7
Looks pretty similar to what we had with FOG, with many retries, from my point of view: https://photos.app.goo.gl/xXFPLZFHAJT7dPEo9
Arch shows the retries as well. I really wonder why we don’t find more people reporting issues with that driver/NIC?!
About the
lspci
command, I am sorry I got that wrong, just typing it off the top of my head. I meant:
lspci -k | grep -A 2 -i net
(so we see which kernel driver is used)
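If the grep output is hard to read, a rough sketch like this prints just the driver name for the first wired NIC. It assumes the usual `lspci -k` layout (device line at column 0 containing "Ethernet controller", indented "Kernel driver in use:" line below it); the function name is made up. For I219 chips one would expect e1000e, but that is exactly what the command should confirm.

```sh
# Reads `lspci -k` output on stdin and prints the kernel driver bound to the
# first device described as an "Ethernet controller".
nic_driver() {
    awk '
        /Ethernet controller/ { in_nic = 1; next }
        /^[^ \t]/             { in_nic = 0 }    # next device starts at column 0
        in_nic && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
            exit
        }'
}

# Usage on each live distro:
#   lspci -k | nic_driver
```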