Slowdown Unicast and Multicast after upgrading FOG Server
-
@Sebastian-Roth @Quazz @george1421
We only have computers with the following specs:
- Dell 9010 (BIOS A30)
- i7-3770
- 16GB RAM
- Samsung SSD 860 EVO 500GB
We ran ten iperf3 tests in a row and got an average of 9.7 retransmits. Is that really so bad?
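For reference, this is roughly how we ran them; the server address below is a placeholder for our FOG server, not the real one:
# Run ten 10-second TCP tests and keep only the sender summary line;
# the retransmit count is the "Retr" column.
for i in $(seq 1 10); do
    iperf3 -c 10.0.0.10 -t 10 | grep sender
done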
We also checked the binaries from 1.5.3 to 1.5.6 with the old image and received the following error:
read image_hdr device_size error
Now we are capturing a new image with the 1.5.3 binaries. The above error may have something to do with a mismatching partclone version. Hopefully the deploy speed will be at 12GB/min. If so we will try to capture an image with the current binaries.
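If the partclone theory holds, the image header itself should tell us. A rough way to check which partclone version wrote an image, assuming partclone.info is available and the image is stored pigz-compressed as FOG does by default (the image path is just an example):
# Decompress the partclone stream on the fly and print its header,
# which includes the partclone version that created the image.
pigz -d -c /images/Win10Base/d1p2.img | partclone.info -s -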
-
@mp12 said in Slowdown Unicast and Multicast after upgrading FOG Server:
Now we are capturing a new image with the 1.5.3 binaries. The above error may have something to do with a mismatching partclone version.
Yeah, definitely. Need to re-capture the image.
Thanks for the information. Good to know you have a fleet of identical machines. I am sure we will figure out what is causing the slow speed and fix it.
-
@Sebastian-Roth The 11 retransmits are something, but really not much. If you look at the testing I did on a Dell 790 (https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/4): for a single iperf stream there were no retransmits, but as soon as I added a second iperf running at the same time, the retransmits shot up into the hundreds and the throughput dropped off accordingly.
-
@mp12 Just curious if you ever figured out the source of your slowdown? I too experienced the same major slowdown after upgrading to a newer dev branch to fix some issues we had (I am now on 1.5.7.102). I went from roughly 13GB per minute to 2GB or slower per minute. Very frustrating. I tried capturing the image a couple of times with different compression settings and a bunch of other things. Tried on multiple Dell Optiplex models: 9010, 9020, 7050, and all of them display the same slowness. I got this issue on two separate servers (we have two campuses at our college, so two different servers). It almost feels like the different kernels did this. We were on “5.1.16 mac nvmefix” but then upgraded to the “4.19.101” which came with the 1.5.7.102 install.
I am interested in any fix for this as my desktop support team is very frustrated at the moment. I am happy to test out any theories to help this along. Would hate for others to run into this as well.
-
@Sebastian-Roth
We have done two re-captures now: one with the 1.5.3 binaries and one with the current binaries from dev-branch 1.5.7.112. We see no improvement in deploying with 1.5.7; the speed with the 1.5.7.112 binaries is around 5 GB/min.
Luckily the 1.5.3 binaries go up to 13 GB/min. That's the speed we had before. For now we will stick with the 1.5.3 binaries running behind FOG 1.5.7.112.
If there are improvements, please let us know so we can test them.
-
@rogalskij said in Slowdown Unicast and Multicast after upgrading FOG Server:
Tried on multiple Dell Optiplex models: 9010, 9020, 7050 and all of them display the same slowness.
May I ask you to open a new topic yourself and post all your hardware specs (hosts, not the FOG server) there? While I am not exactly sure yet, this problem seems to be very specific to the SSD used by @mp12, and we should try not to put too much information into one topic, as that leads to major confusion and a failure to find and fix the issues in the end. If it turns out to be the exact same issue (which I doubt), we can still cross-link the topics later on.
-
@Sebastian-Roth Absolutely. Starting new topic now. My apologies folks!
-
@mp12 Would it be possible to test the binaries between 1.5.3 and 1.5.7 (so 1.5.4, 1.5.5, 1.5.6)?
This will help us track down roughly when the problem was introduced (there are about two years between 1.5.3 and the current dev-branch, I believe).
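If it helps, swapping a set in should just be a matter of replacing the kernel and init on the server. A rough sketch for 1.5.5 (the web root may be /var/www/fog instead of /var/www/html/fog depending on the distro, and the folder name inside the zip is an assumption, so check after unzipping; back up the current files first):
# Back up the current FOS kernel/init, then drop in a release set.
cd /var/www/html/fog/service/ipxe
cp bzImage bzImage.bak
cp init.xz init.xz.bak
wget https://fogproject.org/binaries1.5.5.zip
unzip -o binaries1.5.5.zip
cp binaries1.5.5/bzImage binaries1.5.5/init.xz .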
-
@Quazz In the other thread (with a similar condition) I have the OP trying the 5.5.3 one-off kernel and then he said he updated from 1.5.7 to 1.5.7.102. As part 2 of that test (assuming the kernel upgrade doesn’t fix the issue) I’m going to have him roll FOS Linux back to 1.5.7 by downloading the binaries for 1.5.7 to see if that restores the speed.
-
@mp12 said in Slowdown Unicast and Multicast after upgrading FOG Server:
Luckily the 1.5.3 binaries go up to 13 GB/min. That's the speed we had before.
For now we will stick with the 1.5.3 binaries running behind FOG 1.5.7.112.
If there are improvements, please let us know so we can test them.
Thanks for testing and updating the topic. Can you please use the 1.5.4 kernel and see if you can deploy with that? What's it doing speed-wise then?
-
We are running several tests at the moment, only using the binaries which can be downloaded from https://fogproject.org/binaries1.5.x.zip.
FOG is running dev-branch 1.5.7.112.
Here are some results.
Binaries 1.5.7: deploy speed around 12GB/min.
bzImage-1.5.7: Linux kernel x86 boot executable bzImage, version 4.19.48 (jenkins-agent@Tollana) #1 SMP Sun Jul 14 13:08:14 CDT , RO-rootFS, swap_dev 0x7, Normal VGA
Binaries 1.5.6: deploy speed around 12GB/min.
bzImage-1.5.6: Linux kernel x86 boot executable bzImage, version 4.19.36 (jenkins-agent@Tollana) #1 SMP Sun Apr 28 18:10:07 CDT , RO-rootFS, swap_dev 0x7, Normal VGA
Binaries 1.5.5: deploy speed around 12GB/min.
bzImage-1.5.5: Linux kernel x86 boot executable bzImage, version 4.19.1 (sebastian@Tollana) #1 SMP Fri Feb 22 01:04:27 CST 2019, RO-rootFS, swap_dev 0x8, Normal VGA
Binaries 1.5.4: deploy speed around 12GB/min.
bzImage-1.5.4: Linux kernel x86 boot executable bzImage, version 4.16.6 (builder@4c3c12e8cfd6) #4 SMP Wed May 9 22:08:36 UTC 201, RO-rootFS, swap_dev 0x7, Normal VGA
Binaries 1.5.3: deploy speed around 12GB/min.
bzImage-1.5.3: Linux kernel x86 boot executable bzImage, version 4.15.2 (builder@c38bc0acaeb4) #5 SMP Tue Feb 13 18:30:08 UTC 20, RO-rootFS, swap_dev 0x7, Normal VGA
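The kernel version strings above are simply what file reports for each kernel binary, e.g.:
# Print the embedded version string of a FOS kernel binary.
file bzImage-1.5.7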
-
@mp12 Just for clarity, you should be downloading the zip file from each release (ONLY) and using that as part of your test. The version of FOG Server should stay at 1.5.7.102 or whatever the latest release is.
The developers suspect that something in FOS Linux (contained in bzImage and init.xz in each binary zip file) changed at some point, causing this speed issue. They need to narrow down between which FOS Linux 1.5.x releases the speed changed.
Also based on the data you collected so far, you can skip 1.5.4. I (we) are most interested in the 1.5.7 results.
-
We are still using FOG Server dev-branch 1.5.7.112.
All release binaries (1.5.3 up to 1.5.7) use partclone version 0.2.89. Maybe that's the problem? The binaries from the dev-branch were running partclone 0.3.12.
-
@mp12 Do I get this right: whichever kernel/init you use from one of the last releases, 1.5.3 through 1.5.7, you get fast deploy speeds?
-
That is correct.
-
@mp12 Perhaps, though I believe the same issue does not occur in Clonezilla, which also uses partclone 0.3.
Their partclone commands are relatively similar to ours, though they include a specific read/write buffer of
-z 10485760
(which is 10 times the default). That said, the default didn't cause issues before, so it's unlikely to cause issues now. A greater divergence between FOS and Clonezilla is that Clonezilla uses Debian as the base for their live ISO, whereas we build the filesystem using Buildroot and the kernel from source.
So more likely a problem was introduced in that area, whether a bug, a config issue, or otherwise.
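For illustration, on a plain restore that buffer option would look like this; a sketch only, with placeholder image and device, not our exact command line:
# Restore an NTFS partition image with a 10 MiB read/write buffer
# (the partclone default is 1 MiB).
partclone.ntfs -r -z 10485760 -s /path/to/d1p2.img -o /dev/sda2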
That all said, thank you for helping us narrow it down significantly already.
-
@Sebastian-Roth My apologies if this is too forward, but would I be able to install the latest build with partclone 0.3.13 myself as well? I would love to test 0.3.13 to see if it fixes my slowness issue too. I would gladly report back my findings.
-
@Sebastian-Roth Also, if this doesn't lead us to a solution, we could hack the inits and then “borrow” Clonezilla's partclone to see if there is any change in performance.
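Roughly like this, assuming the init is an xz-compressed ext filesystem image as described in the wiki, and assuming Clonezilla's binaries even run against our Buildroot libraries; the target path inside the init is a guess:
cp init.xz init.xz.orig              # keep a backup of the stock init
xz -d init.xz                        # yields 'init', the raw filesystem image
mkdir -p /mnt/fosinit
mount -o loop init /mnt/fosinit
cp /path/to/clonezilla/partclone.* /mnt/fosinit/usr/bin/
umount /mnt/fosinit
xz -z -9 init                        # repack to init.xz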
I have a 9020 here; let me dig it out and see if I can duplicate the results, if only to confirm I can create a broken system.