Slowdown Unicast and Multicast after upgrading FOG Server
FOG-Server: VM with 6 cores and 8GB RAM
Ubuntu 16.04, FOG dev-branch 126.96.36.199 (before: 16.04, FOG dev-branch 1.5.3.?)
We are having a huge slowdown in deploying our images. Here is a comparison of Unicast and Multicast:
Unicast FOG 1.5.3: 21 minutes 20 seconds
Unicast FOG 1.5.7: 1 hour 7 minutes 40 seconds
Multicast FOG 1.5.3: 22 minutes 36 seconds
Multicast FOG 1.5.7: 3 hours 53 minutes 31 seconds
Image Size on Client around 250GB.
Compression Format: Partclone Zstd.
We just started a new capture to see if the problem still exists.
Otherwise we have no clue what happened.
Anyone else having this problem?
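To put the timings above in perspective, here is a small sketch that converts them into slowdown factors and rough throughput. It assumes the roughly 250 GB quoted as the image size on the client is the amount of data written during each deploy; the post does not confirm this, so the GB/min figures are only approximate. The names `to_seconds`, `slowdown` and `gb_per_min` are made up for this example.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Deploy durations as reported in the post, keyed by mode and FOG version.
timings = {
    "unicast": {
        "1.5.3": to_seconds(minutes=21, seconds=20),
        "1.5.7": to_seconds(hours=1, minutes=7, seconds=40),
    },
    "multicast": {
        "1.5.3": to_seconds(minutes=22, seconds=36),
        "1.5.7": to_seconds(hours=3, minutes=53, seconds=31),
    },
}

# Assumption: about 250 GB is written to the client per deploy.
IMAGE_GB = 250


def slowdown(mode):
    """How many times longer the 1.5.7 run took than the 1.5.3 run."""
    return timings[mode]["1.5.7"] / timings[mode]["1.5.3"]


def gb_per_min(mode, version):
    """Approximate throughput in GB per minute under the IMAGE_GB assumption."""
    return IMAGE_GB / (timings[mode][version] / 60)


for mode in timings:
    print(f"{mode}: {slowdown(mode):.2f}x slower")
    for version in timings[mode]:
        print(f"  {version}: {gb_per_min(mode, version):.1f} GB/min")
```

Under that assumption, unicast comes out about 3.2 times slower and multicast about 10.3 times slower, and the old unicast run works out to roughly 11.7 GB/min.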
though they include a specific Read Write buffer of -z 10485760 (which is 10 times the default). That said, the default didn’t cause issues before, so it’s unlikely to cause issues now
It would be really interesting to know why they picked such a large write buffer. I wonder what problem they were trying to solve. Or was this a holdover from a previous release of Clonezilla using 0.2.98?
@Sebastian-Roth Can confirm that after testing this in my environment with the new init (partclone 0.3.13) and the latest dev kernel specified, things are back to fast again. The average in my environment was somewhere around 9 GB per minute.
@mp12 Ok, here we go. Please try this init: https://fogproject.org/inits/init_partclone_0.3.13.xz
Make sure you use the kernel delivered with the latest FOG dev-branch version. If you are unsure, manually re-download it here: https://fogproject.org/kernels/Kernel.TomElliott.188.8.131.52
If the speed is high then we have ruled out the kernel (from dev-branch) and surely the partclone version 0.3.12 would cause the slowdown in your case. If it’s still slower than expected I would ask you to stick to the init_partclone_0.3.13.xz but go back to kernel from binaries1.5.7.zip. Slow or fast?
My apologies if this is too forward, but would the latest build with 0.3.13 be able to be installed by myself as well?
Sure you can, but it’s way more complicated to explain right now than to just build it for you. If it turns out to be the issue, we might need to move to 0.3.13 for the official binaries anyway.
@rogalskij If you have an immediate need, you can install the 1.5.7 version of FOS (not FOG) to get the speed back today. Just grab the 1.5.7 binaries, extract the bzImage* and init*.xz files, drop them into the /var/www/html/fog/service/ipxe directory, and then PXE boot the target computer. Those binaries will run fine on FOG 184.108.40.206 or later. You can do that until the devs can get things sorted out. Just be aware that if you captured an image using 220.127.116.11, you cannot deploy it with FOS 1.5.7.
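The extraction step described above could be sketched like this. It is only an illustration: the internal layout of the binaries zip is an assumption (the code simply looks for matching file names anywhere in the archive), and `extract_fos_files` is a made-up helper, not part of FOG.

```python
import fnmatch
import zipfile
from pathlib import Path

# File name patterns taken from the post: kernels and init images.
FOS_PATTERNS = ("bzImage*", "init*.xz")


def extract_fos_files(zip_path, dest_dir, patterns=FOS_PATTERNS):
    """Copy every archive member whose base name matches a pattern into dest_dir.

    Assumption: the files may sit in a subfolder of the zip, so only the
    base name is matched and any folder structure is flattened.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = Path(info.filename).name
            if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns):
                target = dest / name
                target.write_bytes(archive.read(info))
                written.append(target)
    return sorted(written)


# Example call (paths are illustrative; back up the existing files first):
# extract_fos_files("binaries1.5.7.zip", "/var/www/html/fog/service/ipxe")
```

Existing files with the same names are overwritten, so keeping a copy of the current bzImage* and init*.xz files makes it easy to switch back later.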
@Sebastian-Roth Also if this doesn’t lead us to a solution, we could hack the inits and then “borrow” clonezilla’s partclone to see if there is any change in performance.
I have a 9020 here, let me dig it out and see if I can duplicate the results. Only to confirm I can create a broken system.
@Sebastian-Roth My apologies if this is too forward, but would the latest build with 0.3.13 be able to be installed by myself as well? I would love to test 0.3.13 to see if it fixes my slowness issue as well. I would gladly report back my findings.
@Quazz Yeah, good summary on what we are at right now.
@mp12 We shall compile the latest inits with partclone 0.3.13 (we have 0.3.12 currently) for you to test I reckon. I think I should be able to quickly do this when I get home. Will be back soon.
@mp12 Perhaps, though I believe the same issue does not occur in Clonezilla which also uses partclone 0.3.
Their partclone commands are relatively similar to ours, though they include a specific Read Write buffer of -z 10485760 (which is 10 times the default). That said, the default didn’t cause issues before, so it’s unlikely to cause issues now.
A greater divergence between FOS and Clonezilla is that Clonezilla uses Debian as a base for their live ISO, whereas we build a filesystem using Buildroot and a kernel from source.
So more likely there is a problem introduced in that area, whether bug, config issues, or otherwise.
That all said, thank you for helping us narrow it down significantly already.
That is correct.
@mp12 Do I get this right? Whichever kernel/init you use from one of the last releases 1.5.3 through to 1.5.7 all show fast deploy speeds?
We are still using FOG Server dev-branch 18.104.22.168.
All binaries (1.5.3 up to 1.5.7) used partclone version 0.2.89. Maybe that’s the problem? The binaries from dev-branch were running on partclone 0.3.12.
@mp12 Just for clarity, you should be downloading the zip file from each release (ONLY) and using that as part of your test. The version of FOG Server should stay at 22.214.171.124 or whatever is the latest release.
The developers are suspecting something in FOS Linux (contained in bzImage and init.xz in each binary zip file) has changed somewhere at some time causing this speed issue. They need to narrow down when the speed changed between FOS Linux 1.5.x and 1.5.xn.
Also based on the data you collected so far, you can skip 1.5.4. I (we) are most interested in the 1.5.7 results.
We are running several tests at the moment. Only using the Binaries which can be downloaded from https://fogproject.org/binaries1.5.x.zip.
Here are some results.
Binaries 1.5.7: deploy speed around 12GB/min.
bzImage-1.5.7: Linux kernel x86 boot executable bzImage, version 4.19.48 (jenkins-agent@Tollana) #1 SMP Sun Jul 14 13:08:14 CDT , RO-rootFS, swap_dev 0x7, Normal VGA
Binaries 1.5.6: deploy speed around 12GB/min.
bzImage-1.5.6: Linux kernel x86 boot executable bzImage, version 4.19.36 (jenkins-agent@Tollana) #1 SMP Sun Apr 28 18:10:07 CDT , RO-rootFS, swap_dev 0x7, Normal VGA
Binaries 1.5.5: deploy speed around 12GB/min.
bzImage-1.5.5: Linux kernel x86 boot executable bzImage, version 4.19.1 (sebastian@Tollana) #1 SMP Fri Feb 22 01:04:27 CST 2019, RO-rootFS, swap_dev 0x8, Normal VGA
Binaries 1.5.4: deploy speed around 12GB/min.
bzImage-1.5.4: Linux kernel x86 boot executable bzImage, version 4.16.6 (builder@4c3c12e8cfd6) #4 SMP Wed May 9 22:08:36 UTC 201, RO-rootFS, swap_dev 0x7, Normal VGA
Binaries 1.5.3: deploy speed around 12GB/min.
bzImage-1.5.3: Linux kernel x86 boot executable bzImage, version 4.15.2 (builder@c38bc0acaeb4) #5 SMP Tue Feb 13 18:30:08 UTC 20, RO-rootFS, swap_dev 0x7, Normal VGA
Luckily the 1.5.3 binaries boost up to 13 GB/min. That’s the speed we had before.
For now we will stick to 1.5.3 binaries running behind FOG 126.96.36.199.
If there are improvements, please let us know so we can test them.
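When comparing several binaries like this, it can help to pull the kernel version out of each description line automatically. The sketch below assumes the bzImage lines above are output of the `file` utility in the form `name: Linux kernel ... version X.Y.Z ...`; `parse_kernel_line` is a made-up helper for illustration.

```python
import re

# Matches "<name>: Linux kernel ... version <dotted number>".
KERNEL_LINE = re.compile(
    r"^(?P<name>[^:]+):\s+Linux kernel\b.*?\bversion\s+(?P<version>\d+(?:\.\d+)+)"
)


def parse_kernel_line(line):
    """Return (file name, kernel version), or None if the line does not match."""
    match = KERNEL_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("version")


sample = (
    "bzImage-1.5.7: Linux kernel x86 boot executable bzImage, "
    "version 4.19.48 (jenkins-agent@Tollana) #1 SMP Sun Jul 14 13:08:14 CDT , "
    "RO-rootFS, swap_dev 0x7, Normal VGA"
)
print(parse_kernel_line(sample))
```

Running it over all five lines gives a quick name-to-kernel-version table to set next to the measured deploy speeds.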
Thanks for testing and updating the topic. Can you please use the 1.5.4 kernel and see if you can deploy using that. What’s it doing speed-wise then?
@Quazz In the other thread (with a similar condition) I have the OP trying the 5.5.3 one-off kernel and then he said he updated from 1.5.7 to 188.8.131.52. As part 2 of that test (assuming the kernel upgrade doesn’t fix the issue) I’m going to have him roll FOS Linux back to 1.5.7 by downloading the binaries for 1.5.7 to see if that restores the speed.
@mp12 Would it be possible to test the binaries between 1.5.3 and 1.5.7 (so 1.5.4, 1.5.5, 1.5.6)?
This will help us track down roughly when the problem was introduced. (as there is about 2 years between 1.5.3 and current dev-branch I believe)
@Sebastian-Roth Absolutely. Starting new topic now. My apologies folks!
Tried on multiple Dell Optiplex models: 9010, 9020, 7050 and all of them display the same slowness.
May I ask you to open a new topic yourself and post all your hardware specs (hosts, not the FOG server) there? While I am not exactly sure yet, this problem seems to be very specific to the SSD used by @mp12, and we should try not to put too much information in one topic, as it leads to major confusion and failure to find and fix the issues in the end. If it turns out to be the exact same issue (which I doubt), we can still cross-link the topics later on.