ZSTD Compression
-
I am trying to set up something to test speed with local disks and a RAM disk…
I have a copy of pigz for windows and the zstd version of 7zip…
I’m setting up a quad core VM on one of my hosts with over 32GB RAM. CPU in the host is dual E5-2603V3 and storage is HDD Raid 1.
Which compression levels in pigz do you want me to try and test?
I am not entirely sure this gets round all of the speed-related problems that could skew the results… but it’s the best I can do.
-
@VincentJ default compression level is 6. testing found that it had the best compression/performance ratio for us. i don’t think many people change it.
-
@Junkhacker I have never had to adjust the compression. 3-5 minutes to image a client machine is well within my acceptance.
-
@Jaymes-Driver Same here. 3-5 minutes is more than sufficient considering that MDT and WDS take much longer to produce a finished product. No need to change it from the defaults unless you are dealing with underpowered target systems.
-
for what it’s worth, i’m testing a few things with this new compression method. i’ll share my results.
-
I have more results in progress on the RAMdisk. Hopefully i’ll get them done tonight.
-
@VincentJ From what I’ve read this compression algorithm is targeted specifically at modern CPUs, leveraging their instruction sets, which older CPUs will lack.
Meaning that tests will be needed on older hardware, as there will be people using it for quite some time. If there’s a huge time penalty there, it will still be a no go for a lot of people.
-
ok, here are my pseudo scientific results:
pigz vs pzstd (parallel implementation of zstandard, experimental)
tests were performed using a windows 7 image
(larger of 2 partitions only)
uncompressed image file size: 34650439624

(de)compression tests were performed as closely as i could to emulate fog’s operation methods (without doing too much work :P). files were cat-ed from a mounted nfs share and piped into the programs, with results saved to an SSD. test machine was running Lubuntu instead of a custom FOS init, because i’m lazy.

pigz -6 compression
duration: 6:06
file size: 17548659028

pigz -6 decompression
duration: 6:00

pzstd default (3?) compression
duration: 5:16
file size: 16967988207

pzstd decompression default compression file
duration: 3:17

pzstd -6 compression
duration: 6:11
file size: 16247155611

pzstd decompression -6 compression file
duration: 3:16

pzstd -9 compression
duration: 10:00
file size: 16084180231

pzstd decompression -9 compression file
duration: 3:21

Edited to add zst compression level 6
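To make the figures above easier to compare, here is a minimal Python sketch that turns them into "percent of original size" and seconds. The numbers are copied from the post; treating the sizes as bytes is an assumption, and the names `RESULTS`, `to_seconds` and `summarize` are just illustrative.

```python
# Figures copied from the post above. Sizes are assumed to be bytes.
UNCOMPRESSED = 34650439624

# label -> (compressed size, compression time "m:ss", decompression time "m:ss")
RESULTS = {
    "pigz -6": (17548659028, "6:06", "6:00"),
    "pzstd default": (16967988207, "5:16", "3:17"),
    "pzstd -6": (16247155611, "6:11", "3:16"),
    "pzstd -9": (16084180231, "10:00", "3:21"),
}


def to_seconds(mmss):
    """Convert an "m:ss" duration string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(results=RESULTS, uncompressed=UNCOMPRESSED):
    """Return one dict per tool with size as a percentage of the original."""
    rows = []
    for label, (size, comp, decomp) in results.items():
        rows.append({
            "label": label,
            "percent_of_original": round(100 * size / uncompressed, 1),
            "compress_s": to_seconds(comp),
            "decompress_s": to_seconds(decomp),
        })
    return rows


for row in summarize():
    print(f"{row['label']}: {row['percent_of_original']}% of original, "
          f"compress {row['compress_s']}s, decompress {row['decompress_s']}s")
```

Read this way, all four results land within a few percentage points of each other on size, while the decompression times differ by almost a factor of two.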
-
@Junkhacker Interesting, what kind of CPU did the test device have?
-
@Quazz it’s an Optiplex 3020. i5
it may be worth pointing out that the pigz performance from my tests is not really representative of what i typically see when actually fog imaging this machine. testing was faster at compression by a significant amount, and a bit slower at decompression, than my experience. not sure what that says about the usefulness of these tests.
-
SO…
16,390,624KB file removed from the compressed windows 10 image.
on a 34GB RAMdisk.
Copying the file on the RAMdisk is 740MB/s so that is well above what we need for imaging for most people.
Let’s try some things to get some numbers.
Compression - Compressed size - Compression time - Decompression time
zstd lvl1 - 7,940,779KB - 50 seconds - 38 seconds
zstd lvl3 - 7,420,268KB - 75 seconds - 40 seconds
zstd lvl5 - 7,286,951KB - 128 seconds - 40 seconds
zstd lvl8 - 7,070,670KB - 261 seconds - 41 seconds
zstd lvl11 - 6,967,155KB - 425 seconds - 41 seconds
zstd lvl14 - 6,942,360KB - 674 seconds - 42 seconds
zstd lvl17 - 6,781,375KB - 1,618 seconds - 42 seconds
zstd lvl20 - 6,471,945KB - 2,416 seconds - 43 seconds
zstd lvl22 - 6,214,702KB - 3,970 seconds - 45 seconds
pigz.exe --keep -0 a:\d1p2 - 16,393,125KB - 72 seconds - 80 seconds
pigz.exe --keep -3 a:\d1p2 - 7,783,303KB - 292 seconds - 158 seconds (157 seconds)
pigz.exe --keep -6 a:\d1p2 - 7,535,149KB - 518 seconds - 149 seconds
pigz.exe --keep -9 a:\d1p2 - 7,512,046KB - 1,370 seconds - 149 seconds

Windows 10 Pro, 4 vCPU 42GB RAM with 34GB RAM Disk.
Host XenServer 7.0, Dual E5-2603 v3, 64GB RAM, HDD Raid 1.
Other VMs moved to the other hosts in the pool.

Decompression seems to not use all CPU with PIGZ… around 50%…
Compression does use all 100% CPU
Decompression with zstd does use all CPU - but most were around 400MB/s so possibly I’m hitting some other limit.
-
So, both of us have results that show zstd decompressing quicker and having better ratios.
I’m going to reconfigure my VM to only have 1 vCPU at 1.6GHz to see if i can get more useful decompression results.
I redid the pigz -3 decompression test twice to confirm it was slower than the others… Not what i was expecting but that is what happened.
In my compression tests, zstd lvl3 beats the standard PIGZ -6 on ratio and completes 443 seconds faster… We could go up to zstd lvl11 and still have it 93 seconds quicker and save around 550MB.
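For anyone who wants to check those deltas, a small Python sketch using the 4 vCPU numbers posted earlier in this thread. The names `PIGZ_6`, `ZSTD` and `versus_pigz` are only illustrative.

```python
# (compressed size in KB, compression time in seconds), as posted for the 4 vCPU run
PIGZ_6 = (7_535_149, 518)
ZSTD = {
    3: (7_420_268, 75),
    11: (6_967_155, 425),
}


def versus_pigz(level):
    """Compare one zstd level against pigz -6: time saved and size saved."""
    size_kb, seconds = ZSTD[level]
    return {
        "seconds_saved": PIGZ_6[1] - seconds,
        "kb_saved": PIGZ_6[0] - size_kb,
    }


for level in ZSTD:
    delta = versus_pigz(level)
    print(f"zstd lvl{level}: {delta['seconds_saved']} s faster, "
          f"{delta['kb_saved']:,} KB smaller than pigz -6")
```

The lvl11 saving works out to a bit under 570,000KB, which is where the "around 550MB" figure comes from if KB are read as 1024-byte units.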
-
i’ve been talking with @Tom-Elliott about this, and we don’t think it would be worth the effort it would take to implement zstandard. the thing is, faster decompression is kind of irrelevant for FOG at the moment. what slows down deployments is transfer speed. the only way fog would get faster is if the file size was very significantly decreased. while the compression ratio is better with zstandard, the difference isn’t very significant until you get to the higher compression levels, where processing time becomes a big issue.
there are other issues that deter us from adoption, but that’s the most significant reason. in fact, the single greatest reason TO adopt it would be because i think it’s really cool, lol.
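a rough way to put numbers on the transfer-speed point, as a Python sketch. it uses the compressed sizes and compression times from the windows 10 test posted earlier in this thread. the 110MB/s link speed and the 1 MB = 1000 KB conversion are assumptions, and `transfer_seconds` is just an illustrative helper.

```python
# Assumed usable throughput of a gigabit link, in MB/s (an assumption, not a measurement here)
LINK_MB_PER_S = 110

# label -> (compressed size in KB, compression time in seconds), from the earlier table
IMAGES = {
    "pigz -6": (7_535_149, 518),
    "zstd lvl3": (7_420_268, 75),
    "zstd lvl11": (6_967_155, 425),
    "zstd lvl22": (6_214_702, 3_970),
}


def transfer_seconds(size_kb, link_mb_per_s=LINK_MB_PER_S):
    """Estimate wire time for one image, treating 1 MB as 1000 KB."""
    return size_kb / 1000 / link_mb_per_s


baseline = transfer_seconds(IMAGES["pigz -6"][0])
for label, (size_kb, compress_s) in IMAGES.items():
    wire = transfer_seconds(size_kb)
    print(f"{label}: ~{wire:.0f} s on the wire "
          f"({baseline - wire:.0f} s less than pigz -6), "
          f"{compress_s} s to compress")
```

under those assumptions the transfer time only moves by a handful of seconds per deployment until the highest levels, and those take far longer to compress.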
-
@Junkhacker said in ZSTD Compression:
there are other issues that deter us from adoption, but that’s the most significant reason. in fact, the single greatest reason TO adopt it would be because i think it’s really cool, lol.
It would be REALLY COOL
-
The problem isn’t whether we implement it or not.
Already, with PIGZ in use, the issue (beyond multiple Unicast tasks) is most often a slowdown in writing the information to the disk. This is especially present when one is dealing with SSD.
It’s great that you can have “fast” decompression, but that only goes so far. You still have to write the data to disk. You have some buffer, but we’re already “decompressing” the data as fast as we can.
Where this might be very useful, however, would be uncompressed images, compressed as the data is requested, and then placed on disk so we have a live element of diminishing the amount of data to be passed across the network. Once it’s passed to the client, the only “hold” is on the speed at which data can be pushed from ram and written to disk. Even this, however, can only do so much.
Is it really worth implementing a new compression mechanism to maybe get a speed increase of possibly 1% during our imaging process?
-
I understand the speed would be significantly increased on upload tasks, but I don’t know how often people are uploading.
-
1 vCPU 1.6GHz - the system can no longer saturate gigabit over network shares…
Down from 110MB/s to 82MB/s
Compression - Compressed size - Decompression time
zstd lvl1 - 7,940,779KB - 131 seconds
zstd lvl3 - 7,420,268KB - 134 seconds
zstd lvl11 - 6,967,155KB - 139 seconds
zstd lvl22 - 6,214,702KB - 157 seconds
pigz.exe --keep -6 a:\d1p2 - 7,535,149KB - 247 seconds

On my quad core VM, PIGZ -6 only read compressed data at 50MB/s during decompression; zstd level 11 on the single core VM manages the same 50MB/s…
On the single core VM, PIGZ -6 is only 30MB/s; the lowest zstd gets, on level 22, is 39.5MB/s.

If we use the single core numbers, writing the whole image in 247 seconds (which isn’t too much faster than expected anyway) is around 66MB/s on disk; using zstd 11, writing it in 139 seconds is 117MB/s. Most SATA disks should be able to do this… It will be a push for some 2.5" disks… (I checked numbers for 2.5" and 3.5" WD Greens)
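The MB/s figures above come from dividing a size by a decompression time. A minimal Python sketch of that arithmetic, using the sizes and times posted in this thread. Treating 1 MB as 1000 KB is an assumption that seems to match the rounding used, and `mb_per_s` is just an illustrative helper.

```python
# Uncompressed image size from the earlier post, in KB
IMAGE_KB = 16_390_624


def mb_per_s(size_kb, seconds):
    """Throughput in MB/s, treating 1 MB as 1000 KB (an assumption)."""
    return size_kb / 1000 / seconds


# label -> (compressed size in KB, decompression time in seconds) on the 1 vCPU VM
SINGLE_CORE = {
    "pigz -6": (7_535_149, 247),
    "zstd lvl11": (6_967_155, 139),
    "zstd lvl22": (6_214_702, 157),
}

for label, (compressed_kb, seconds) in SINGLE_CORE.items():
    read_rate = mb_per_s(compressed_kb, seconds)   # compressed data consumed
    write_rate = mb_per_s(IMAGE_KB, seconds)       # uncompressed data produced
    print(f"{label}: reads {read_rate:.1f} MB/s compressed, "
          f"writes {write_rate:.1f} MB/s to disk")
```

The "read" rate is what the network or storage has to supply, and the "write" rate is what the target disk has to sustain, which is why the disk becomes the concern once decompression gets faster.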
-
Note that, since v1.1.3, there is a multithread mode available with zstd.
It needs to be compiled with specific flags though.
On linux, it means typing make zstdmt
For Windows, there are pre-compiled binaries in the release section: use the zstdmt one.
Since pigz is multi-threaded, it would be more fair to compare to zstdmt, rather than single-threaded zstd.
The number of threads can be selected with command -T#, like xz.
The version of zstd i’ve been using is using all my threads
-
Maybe you just saw the note about 1vCPU. I only reduced to 1vCPU as the numbers with 4vCPU were all so close together.
Also might help to simulate a ‘low end’ machine…