HP ProBook 640 G8 imaging extremely slowly
-
@george1421 said in HP ProBook 640 G8 imaging extremely slowly:
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
just the ProBook 640 G8
The debugging truth table points to the probook at fault root.
So what is unique about this workstation from previous models? NVMe vs SATA? Specific NVMe disk?
If you want to debug this hardware a bit more we can do that. The issue will be with either the network stack or the disk subsystem.
Here is a link to iperf3 https://drive.google.com/file/d/1fLYGI-roYGongTVRS_4zQ7dWJ7osN4AW/view?usp=sharing
Place it in
/images
directoryThe concept of testing is coming from this post: https://forums.fogproject.org/post/98231
the setup for testing is pretty easy, just setup a debug deployment to this computer. (tick the debug checkbox before submitting the deployment task).
Now pxe boot the target computer after a few screens of text you will be dropped to the FOS linux command prompt.
At the fos linux command prompt key in
fog
You will need to press enter after each breakpoint in the imaging code. After you see the first partclone copy complete press the Ctrl-C to break out of the deployment script.The iperf command will be in
/images
directory on the pxe booting computer. Copy it over to the /tmp directory on the target computercp /images/iperf3 /tmp
Then run the iperf command as outlined in this post. https://forums.fogproject.org/post/98230
You will need to setup the iperf3 receiver on the FOG server and then the client on the target computer. This test will see how fast the network speed is between the target computer and the FOG server.Once the network bits have been tested then testing the hard drive is next.
This will be a bit more complicated to test, so see if the
hdparm
command comes back with something in FOS Linux.Here’s the iperf3 results: https://photos.app.goo.gl/xXFPLZFHAJT7dPEo9
-
@jacob-gallant Well what I find troubling is the Retr (retransmitts) On a stable network that should be zero. It kind of makes me think networking (could be loaded network infrastructure could be nic in computer). These retransmitts would cause it initially to have a pretty good performance but then start backing off right away until it found a happy results at a slower transfer rate. It would be interesting to see if you get the same results on retransmissions on different hardware.
If you are still at the fos linux command prompt see if
hdparm
command is installed if so thenrun
lsblk
to find the drive. It will be /dev/sda or /dev/nvme(something).Once you’ve found the disk run this
hdparm -Tt /dev/sda
assuming the disk is/dev/sda
first sata disk.There is another test where we need to use fdisk to remove the current partition and make a new partition the size of the disk and then format the partition with ext4 format. Then we can mount it and run the dd command from the article to test write speed to the disk, but I have to run off to a meeting so I won’t have a chance to write down the testing procedure right now.
-
@george1421 I ran iperf on a working device and here are the results, 0 retransmits as you mentioned. You’re also correct that that is exactly what I’m seeing on the 640 G8, starts off with reasonable performance but quickly drops down to a crawl:
https://photos.app.goo.gl/oVrtqpnhmYHh39LK9
Here are the results of the hdparm command.
-
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
I ran iperf on a working device and here are the results, 0 retransmits as you mentioned.
In the same network jack as the 640 G8?
The network adapter in the 640 G8 is built in or USB based?
-
@george1421 said in HP ProBook 640 G8 imaging extremely slowly:
@jacob-gallant said in HP ProBook 640 G8 imaging extremely slowly:
I ran iperf on a working device and here are the results, 0 retransmits as you mentioned.
In the same network jack as the 640 G8?
The network adapter in the 640 G8 is built in or USB based?
The very same, yes. And it’s built-in.
-
@jacob-gallant If you have your other computer that works, if you have windows loaded on it can you get the hardware ID of that network interface. We know the 640G8 is 8086:15fc (linux format). So the question is the working one the same?
I have a one off kernel 5.10.x that we might want to try. But so far I’m leaning towards the nic itself or the kernel nic driver in 5.6.18.
-
@george1421 The working one is different, 8086:15e3
-
@jacob-gallant Ok the 15e3 nic is an older nic that was first introduced in the 4.6 linux kernel. The 15fc was first introduced in 5.5 linux kernel and we are currently trying 5.6.18 “right?” (from the FOS Linux debug console you can key in
uname -r
to give you the kernel version).Here is an experimental FOS Linux kernel 5.10.2. Download this file and rename as
bzImage
(case is important)
https://drive.google.com/file/d/1-4HyQD8ttz_GCE_vKrvuydFVqcPUMqzU/view?usp=sharingrename the bzImage file in
/var/www/html/fog/service/ipxe
directory and drop this file in there. Lets see if this kernel gives us a better deployment. I know there was again a major rewrite in the 5.9.x series of the linux kernel, akin to what happened with 5.5 -
@george1421 Same results with 5.10.12 I’m afraid. We were using 5.6.18 for all of the previous tests, that’s right.
-
@jacob-gallant Well nuts. I was hoping the updated kernel would function better. Yes we need 5.6.18 to have support for that network interface, if you were using 4.19x the network interface wouldn’t work at all.
-
@Jacob-Gallant @george1421 So far it all looks like a driver issue in the Linux kernel. Though I am really wondering that we don’t find other users’ reports about this NIC.
Maybe this is some kind of jumbo frame issue?
@Jacob-Gallant Would you be willing to capture a short part of the network traffic on your FOG server and upload the PCAP so we can take a look? Schedule a debug deploy task. Boot the host up and ein
ip a s
and note down the IP address before you start the job viafog
command. Now runtcpdump -w /tmp/dump.pcap host x.x.x.x
as root on your FOG server using the IP address noted down. Leave that tcpdump sit there and step through the deply task on the machine. Quickly after the first blue partclone screen starts you want to stop tcpdump on your FOG server (Ctrl+c) so the PCAP file is not growing too much! I am fairly sure we see the retransmits at that point already and might find why.Just copy the file /tmp/dump.pcap from your server and upload to a share we can access.
-
@george1421 I’m currently researching this issue. I do see others with speed problems with this series of nic adapters.
-
@sebastian-roth @george1421 Thanks to you both for all of your time. Here is the capture:
https://drive.google.com/file/d/1WS8e2R9kR-ZjpqzgikmSg0CakZJYJi4h/view?usp=sharing
-
@Jacob-Gallant I looked at the PCAP for quite some time now. We see clear signs of “network congestion” - meaning that packets are being re-transmitted causing the TCP connection to slow down.
The connection starts just fine and the host sends a file read request to the FOG server. Now the FOG server starts to send a first large packet. Standard ethernet MTU is 1518 bytes and the FOG server sends 7240 bytes in one single TCP packet - a so called jumbo frame.
So I am wondering if you can improve speed by disabling LRO (Large Receive Offload), TSO (TCP Segmentation Offload) and GSO (Generic Segmentation Offload) using ethtool. Schedule and boot into another debug deploy session. On the shell run:
ip a s ethtool -K eth0 lro off ethtool -K eth0 tso off ethtool -K eth0 gso off
The first command is just to confirm the network interface name (could be
eth0
or different) to use with ethtool later on. You can try disabling all three at once or just one and give it a try.There are various I219-V cards/chips listed with different PCI IDs. Searching with 8006:15fc I couldn’t find much on the web but searching for I-219V there are a few people complaining about issues:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802691
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1785171
https://forums.linuxmint.com/viewtopic.php?t=327435
https://access.redhat.com/solutions/3615791
Though I am really in doubt if any of those match your exact situation. -
@sebastian-roth said in HP ProBook 640 G8 imaging extremely slowly:
@Jacob-Gallant I looked at the PCAP for quite some time now. We see clear signs of “network congestion” - meaning that packets are being re-transmitted causing the TCP connection to slow down.
The connection starts just fine and the host sends a file read request to the FOG server. Now the FOG server starts to send a first large packet. Standard ethernet MTU is 1518 bytes and the FOG server sends 7240 bytes in one single TCP packet - a so called jumbo frame.
So I am wondering if you can improve speed by disabling LRO (Large Receive Offload), TSO (TCP Segmentation Offload) and GSO (Generic Segmentation Offload) using ethtool. Schedule and boot into another debug deploy session. On the shell run:
ip a s ethtool -K eth0 lro off ethtool -K eth0 tso off ethtool -K eth0 gso off
The first command is just to confirm the network interface name (could be
eth0
or different) to use with ethtool later on. You can try disabling all three at once or just one and give it a try.There are various I219-V cards/chips listed with different PCI IDs. Searching with 8006:15fc I couldn’t find much on the web but searching for I-219V there are a few people complaining about issues:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802691
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1785171
https://forums.linuxmint.com/viewtopic.php?t=327435
https://access.redhat.com/solutions/3615791
Though I am really in doubt if any of those match your exact situation.Apologies for the delay in getting back to you, I’ve been working from home so far this week so I didn’t have access to the device. Unfortunately these steps didn’t improve anything.
-
@Jacob-Gallant After running those commands mentioned, can you run
ethtool -k
(lower case k this time) and take a picture of the output and post here? -
@sebastian-roth Here you are! https://photos.app.goo.gl/WnHEE63jFEjKvT4N9
-
@Jacob-Gallant Ok, seems like it actually did disable TSO and GSO. Can’t find LRO in the output but maybe the driver doesn’t support that.
Unfortunately I am running out of ideas with that.
Can you try booting up from a Linux Live DVD/USB and do some network testing with that? Try a distro with a very recent kernel if possible.
-
@sebastian-roth Hi Sebastien, apologies again for the delayed response. I ran a live USB of Ubuntu 20.10 and network performance was normal. We also have Windows 10 loaded on one of the devices manually and it performs normally as well. It seems specific to FOG performance unfortunately.
-
@Jacob-Gallant Well I did expect Windows to have normal network speed. But Ubuntu is using the Linux Kernel and therefore a pretty similar driver for this network card. What tests did you do for network speeds? Iperf again to really be able to compare results?
Please boot up Ubuntu again and run the following commands in a root command shell:
uname -a lspci -nn | grep -A 2 -i net
There is some light at the end of the tunnel if Ubuntu doesn’t show the same issue. But it will be a long struggle to find out why. Comparing kernel versions and an enourmous list of patches Ubuntu adds to the official kernel.
Now that I write this I think it’s better to test other live distros as well, try Arch Live and maybe SystemRescueCD. With every distro run the same iperf test to be able to compare results and run the above commands, posting results here.