Slowdown Unicast and Multicast after upgrading FOG Server



  • @Sebastian-Roth
    We did two re-captures now. One with 1.5.3 binaries and one with current binaries from dev-branch 1.5.7.112.

    We saw no improvement in deploying with 1.5.7. The speed with the 1.5.7.112 binaries is around 5 GB/min.
    Luckily, the 1.5.3 binaries bring us back up to 13 GB/min. That's the speed we had before.

    For now we will stick with the 1.5.3 binaries running behind FOG 1.5.7.112.
    If there are improvements, please let us know so we can test them.



  • @mp12 Just curious if you ever figured out the source of your slowdown? I too experienced the same major slowdown after upgrading to a newer dev branch to fix some issues we had (I am now on 1.5.7.102). I went from roughly 13 GB per minute to 2 GB or slower per minute. Very frustrating. I tried capturing the image a couple of different times with different compression settings and a bunch of other things. Tried multiple Dell OptiPlex models: 9010, 9020, 7050, and all of them display the same slowness. Got this issue on two separate servers (we have two campuses at our college, so two different servers). It almost feels like the different kernels did this. We were on “5.1.16 mac nvmefix” but then upgraded to the “4.19.101” which came with the 1.5.7.102 install.

    I am interested in any fix for this as my desktop support team is very frustrated at the moment. I am happy to test out any theories to help this along. Would hate for others to run into this as well.


  • Moderator

    @Sebastian-Roth The 11 retransmits are something, but really not much. Look at the testing I did on a Dell 790: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/4 For a single iperf there were no retransmits, but as soon as I added a second iperf running at the same time the retransmits shot up into the hundreds and the throughput dropped off accordingly.


  • Developer

    @mp12 said in Slowdown Unicast and Multicast after upgrading FOG Server:

    Now we are capturing a new image with the 1.5.3 binaries. The above error may have something to do with a mismatching partclone version.

    Yeah, definitely. Need to re-capture the image.

    Thanks for the information. Good to know you have a fleet of the exact same machines; I am sure we will figure out what is causing the slow speed and fix it.



  • @Sebastian-Roth @Quazz @george1421
    We only have computers with the following specs:

    • Dell 9010 (BIOS A30)
    • i7-3770
    • 16GB RAM
    • Samsung SSD 860 EVO 500GB

    Ran ten iperf3 tests in a row and got an average of 9.7 retries. Is that really so bad?
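    Averaging the Retr column across runs can be scripted rather than counted by hand. A minimal sketch using awk on saved iperf3 sender summaries (the two sample lines are the sender summaries posted in this thread):

```shell
# Collect the "... sender" summary line of each iperf3 run into one file, then
# average the Retr column (second-to-last field) with awk.
cat > /tmp/iperf_runs.txt <<'EOF'
[  5]   0.00-10.04  sec  1.10 GBytes   939 Mbits/sec   11             sender
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec   11             sender
EOF
awk '/sender/ { sum += $(NF-1); n++ } END { printf "average retries: %.1f\n", sum/n }' /tmp/iperf_runs.txt
```

    With real runs, append each test's output with `iperf3 -c SERVER | grep sender >> /tmp/iperf_runs.txt`.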

    We also tried the binaries from 1.5.3 through 1.5.6 with the old image and received the following error:
    read image_hdr device_size error

    Now we are capturing a new image with the 1.5.3 binaries. The above error may have something to do with a mismatching partclone version. Hopefully the deploy speed will be back at 12 GB/min. If so, we will try to capture an image with the current binaries.


  • Developer

    @george1421 @mp12 Use init and kernel binaries from the same archive, as you will run into kernel panics quite easily if you mix them.

    @mp12 I am wondering if you see the same slowness on many different types of hardware, or if it's only machines with the Samsung SSD 860 EVO 500GB?

    EDIT: Reading through the whole topic again I stumbled upon this:

    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec   11             sender
    [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  receiver
    

    Eleven retries for a file transfer over a period of 10 seconds seems like a lot to me. So we might be looking at a combination of issues here.


  • Moderator

    @Sebastian-Roth Yes, going back to the 1.5.3 version of FOS (at least for the inits) would be a good before-and-after test of whether the upgrade is causing this slowness. All the OP needs to do is download the zip, extract the init.xz file from it, and move it to the FOG server to test.
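    A sketch of that swap, assuming the default web root /var/www/fog (many installs use /var/www/html/fog instead) and that the layout inside the zip may vary:

```shell
# Sketch: after downloading binaries1.5.3.zip from fogproject.org, extract it
# and swap its init.xz in for a quick before/after test (paths are assumptions).
ZIP=binaries1.5.3.zip
IPXE_DIR=/var/www/fog/service/ipxe    # some installs: /var/www/html/fog/service/ipxe

if [ -f "$ZIP" ]; then
  unzip -oq "$ZIP" -d fos-1.5.3
  init=$(find fos-1.5.3 -name init.xz | head -n 1)   # layout inside the zip may vary
  if [ -n "$init" ] && [ -d "$IPXE_DIR" ]; then
    cp "$IPXE_DIR/init.xz" "$IPXE_DIR/init.xz.bak"   # keep the current init around
    cp "$init" "$IPXE_DIR/init.xz"
    echo "swapped in $init"
  fi
else
  echo "download $ZIP first, then run this on the FOG server"
fi
```

    Restoring is just copying init.xz.bak back; no FOG reinstall is needed for this test.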


  • Developer

    @george1421 Good point on trying the older binaries. Though I’d expect you to run into issues going back to very early binary versions like 1.5.3… They are all available on the fogproject.org website:
    https://fogproject.org/binaries1.5.6.zip
    https://fogproject.org/binaries1.5.5.zip
    and so on all the way to 1.3.0…


  • Moderator

    @Quazz Would the complete FOG inventory for this system give us enough data, or do we need to dig deeper? Even if it doesn’t give us all of the data, at least it would be a start. What would be grand is if the OP had two systems on his campus where one worked correctly and the other is slow. Then we could compare and contrast those two systems.

    @Sebastian-Roth Is there a place where we can still download the 1.5.5 or 1.5.6 binaries zip file? I’m wondering whether replacing the current bzImage and init.xz with the ones from 1.5.5 or 1.5.6 changes the performance of these systems. While I highly doubt it’s the FOG server, this would at least isolate the issue to the FOS Linux install (unless that is our conclusion already).


  • Moderator

    @Sebastian-Roth I mean, the strange part is that the older versions gave him proper speed too. And it’s not like this is a universal problem since others have not experienced such a large performance difference between FOG versions.

    So I guess it’s time for more information to try and pin down the source of it all.

    @mp12 Can you post the full specs of a troubled machine? (or perhaps even two or three different ones if you have them)

    It almost assuredly has to be some kind of interaction between certain kind of hardware and the Linux kernel (and its config), so we have to try and narrow it down or at least get a clearer picture of what we’re dealing with.



  • @Sebastian-Roth

    Linux debian 5.3.0-1-amd64 #1 SMP Debian 5.3.7-1 (2019-10-19) x86_64 GNU/Linux


  • Developer

    @mp12 said in Slowdown Unicast and Multicast after upgrading FOG Server:

    Therefore I booted a Clonezilla (2.6.4-10-amd64) flash drive

    From what you wrote so far, I would expect the kernel to make the difference. Which kernel is in the Clonezilla you used? Boot to a command shell and run uname -a.



  • @Quazz said in Slowdown Unicast and Multicast after upgrading FOG Server:

    I vaguely recalled someone having this problem before with similar outcomes.
    Linking here for further reference: https://forums.fogproject.org/topic/13733/hp-elitebook-830-gen-6-issues-capturing-images-and-deploying-images/10

    Tried the bzImage529 and checked if RAMDISK size is correct.

    Also tried the following Kernels which all end up with a KERNEL PANIC:
    4.13.4, 4.11.1, 4.10.1, 4.9.11 and 4.8.11

    Other kernels, from 4.15.2 upward, seem to work, but not at sufficient speed.



  • @Quazz said in Slowdown Unicast and Multicast after upgrading FOG Server:

    Are you on the dev-branch

    Yes I am on the dev-branch 1.5.7.109 (bzImage Version: 4.19.101)


  • Moderator

    @mp12 I vaguely recalled someone having this problem before with similar outcomes.

    Linking here for further reference: https://forums.fogproject.org/topic/13733/hp-elitebook-830-gen-6-issues-capturing-images-and-deploying-images/10

    Our kernels and inits have since received a few upgrades, however.

    Are you on the dev-branch, by the way? I don’t believe 1.5.7 was launched with partclone 0.3.12 (that’s for the upcoming release).

    If not, then try the init and kernel files from https://dev.fogproject.org/blue/organizations/jenkins/fos/detail/master/122/artifacts



  • @Sebastian-Roth @Quazz @george1421

    I have some good and bad news.

    First the good ones:

    I created a manual deploy following the FOG wiki: https://wiki.fogproject.org/wiki/index.php/Debug_Mode#Win_7
    Therefore I booted a Clonezilla (2.6.4-10-amd64) flash drive and mounted the NFS share from the FOG server.

    I started the deploy with the following command:

    cat /images/IMAGEPATH/d1p2.img | zstd -d | sudo partclone.restore -O /dev/sda2 -N -f -i
    

    [photo attachment: DSC_0585.JPG]

    Tried a deploy with the FOG client and still got one-third of the expected speed.
    I think there is something wrong in the deploy process.
    The main difference I can see is that FOS uses partclone 0.3.12 and Clonezilla uses 0.3.13.



  • @Quazz

    Did my best. While trying, I noticed that the drive’s “frozen” state was active. Removing the SSD’s power cable while the PC was powered on cleared the frozen state, and I was then able to do a Secure Erase. A dd from ramdisk to /dev/sda ran at the same speed as before. So no luck at all.
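    For anyone else hitting this, the frozen-state check can be done with hdparm before attempting the erase. A minimal sketch, assuming /dev/sda is the target drive and running as root on that machine:

```shell
# Check whether the drive's ATA security state is "frozen"; Secure Erase is
# blocked while frozen ("not frozen" means it is allowed). /dev/sda is a
# placeholder for the target drive.
DEV=/dev/sda
if command -v hdparm >/dev/null 2>&1 && [ -b "$DEV" ]; then
  # A plain "frozen" result needs a suspend/resume cycle or, as described
  # above, re-plugging the drive's power while the system is running.
  hdparm -I "$DEV" 2>/dev/null | grep -i frozen || echo "could not read security state"
else
  echo "hdparm or $DEV not available on this system"
fi > /tmp/frozen_check.txt
cat /tmp/frozen_check.txt
```

    The erase itself then goes through hdparm’s --security-set-pass / --security-erase sequence; it is destructive, so double-check the device name first.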


  • Moderator

    @mp12 I found somewhat similar situations on Google where the conclusion was to use Secure Erase on the drive first. Worth a try just to see if it helps?



  • @george1421

    Server

    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    Accepted connection from x.x.x.x, port 50672
    [  5] local x.x.x.x port 5201 connected to x.x.x.x port 50674
    [ ID] Interval           Transfer     Bandwidth
    [  5]   0.00-1.00   sec   108 MBytes   903 Mbits/sec
    [  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
    [  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec
    [  5]   3.00-4.00   sec   112 MBytes   942 Mbits/sec
    [  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec
    [  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
    [  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec
    [  5]   7.00-8.00   sec   112 MBytes   942 Mbits/sec
    [  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec
    [  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec
    [  5]  10.00-10.04  sec  4.35 MBytes   937 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    [  5]   0.00-10.04  sec  1.10 GBytes   939 Mbits/sec   11             sender
    [  5]   0.00-10.04  sec  1.10 GBytes   938 Mbits/sec                  receiver
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    

    Client

    Connecting to host x.x.x.x, port 5201
    [  5] local x.x.x.x port 50674 connected to x.x.x.x port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec   113 MBytes   947 Mbits/sec    3    258 KBytes       
    [  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    364 KBytes       
    [  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    2    232 KBytes       
    [  5]   3.00-4.00   sec   112 MBytes   943 Mbits/sec    1    318 KBytes       
    [  5]   4.00-5.00   sec   112 MBytes   943 Mbits/sec    2    211 KBytes       
    [  5]   5.00-6.00   sec   112 MBytes   943 Mbits/sec    0    364 KBytes       
    [  5]   6.00-7.00   sec   112 MBytes   943 Mbits/sec    1    267 KBytes       
    [  5]   7.00-8.00   sec   112 MBytes   943 Mbits/sec    1    364 KBytes       
    [  5]   8.00-9.00   sec   112 MBytes   943 Mbits/sec    0    366 KBytes       
    [  5]   9.00-10.00  sec   112 MBytes   943 Mbits/sec    1    282 KBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec   11             sender
    [  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  receiver
    
    iperf Done.
    

  • Moderator

    @Quazz Interesting, regarding the previous dd test. I would like to see that dd test, and then the next step is an iperf test. That tests the local disk and then the network without involving the NFS stack or partclone. At least in my mind, that is how I would break it down. Something must have changed besides FOG.
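    A sketch of the dd half of that test, writing a scratch file instead of touching the real disk (on the target machine you would read the disk itself):

```shell
# Raw write throughput with no NFS or partclone in the path. /tmp is used here
# for safety; on the client you would instead read the drive, e.g.
#   dd if=/dev/sda of=/dev/null bs=1M count=1024
# conv=fdatasync makes dd flush to disk before reporting, so the MB/s figure
# reflects real storage speed rather than the page cache.
dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1
```

    Comparing that figure between a 1.5.3 and a 1.5.7 FOS boot on the same machine would show whether the disk path or the network path regressed.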

