ZSTD Compression


  • Moderator

    @Tom-Elliott said in ZSTD Compression:

    Alright, so I got bored.

    I added all the “capabilities” as requested.

    What I have found so far is that zstd doesn’t compress noticeably better, even at -19, but it adds a SIGNIFICANT amount of time. My image with pigz -6 was 2.8 GB with a 10-minute capture time. The same image, deployed and then captured with zstd -19, was 2.6 GB with a 35-minute capture time. This was all done with NATIVE capture/deploy to ensure the inits were capable.

    Mind you, my test system had 1 CPU, so multiple CPUs may have helped the capture time, but would it be worth it? To be usable across multiple systems, it would need to get images down to at least half the size to be a suitable alternative. 200 MB is not worth it.

    This was kind of my fear with ZSTD: it seems very slow on older CPUs (and even more so on single-core ones).


  • Senior Developer

    @Wayne-Workman When I said ZSTD compression wasn’t worth it, I wasn’t thinking about other “uses” of zstd. I’m aware that with larger disks and heavier usage the benefit could be much greater. I’m not thinking in terms of “space” itself, though; I’m thinking of the whole picture.

    For example, if a 2.0 GB image causes issues when distributing to multiple hosts at the same time, a 1.0 GB image would make deploying to the same number of hosts much faster, as the data sent across the network would be greatly reduced. This should hold true even with smaller savings, since you save that amount of data for every host. 200 MB to one host may not matter much, but 200 MB to 10 hosts would save nearly 2 GB of data being passed across the network. However, the time it takes to deploy to each of those hosts involves many other considerations (but that would be the case no matter the size of the file).
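    A rough way to see that arithmetic, as a minimal sketch assuming plain unicast (every host pulls its own full copy) and, for comparison, idealized multicast that sends the image once with no retransmissions. The 2800/2600 MB figures are the pigz/zstd sizes reported in this thread; `network_savings_mb` is a hypothetical helper, not part of FOG.

```python
def network_savings_mb(size_before_mb: float, size_after_mb: float,
                       hosts: int, multicast: bool = False) -> float:
    """Megabytes that no longer cross the network after shrinking an image.

    Assumes unicast sends one full copy per host, while idealized multicast
    sends a single copy regardless of host count (no retransmissions).
    """
    if hosts < 0:
        raise ValueError("hosts must be non-negative")
    if hosts == 0:
        return 0.0
    copies = 1 if multicast else hosts
    return float((size_before_mb - size_after_mb) * copies)


if __name__ == "__main__":
    saved = network_savings_mb(2800, 2600, hosts=10)
    print(f"unicast: {saved:.0f} MB (~{saved / 1024:.2f} GiB) less traffic")
    print(f"multicast: {network_savings_mb(2800, 2600, 10, multicast=True):.0f} MB")
```

    With multicast the difference is only saved once, which is why the benefit is mostly a multiple-unicast story.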



  • @Wayne-Workman We actually have about 1.5 TB worth of FOG images.

    We have a “Standard Lab” image, but we have numerous labs with custom software or just custom configurations. Then, we also have a “Standard Employee” image, too.

    But honestly, space still isn’t the biggest consideration for us; storage is cheap, but time is not. It sounds like the time savings may not be worth it, or there may actually be a loss of time?


  • Senior Developer

    So I realized that, while I have built the zstd compression binary and it is working, pzstd is more useful, as it allows real-time, multi-core compression.

    I am having trouble building pzstd for 32-bit systems, though :(
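    As I understand it, pzstd gets its multi-core speed by splitting the input into chunks and compressing them as independent zstd frames on separate cores. The sketch below shows the same idea with Python’s standard `gzip` module as a stand-in (zstd is not in the standard library before Python 3.14): each chunk becomes an independent gzip member, and concatenated members still form a valid gzip stream. The chunk size and worker count are arbitrary assumptions, the function names are hypothetical, and this is not how pigz itself stitches its output together.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from typing import List

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk (illustrative choice)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the input into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def parallel_gzip(data: bytes, level: int = 6, workers: int = 4) -> bytes:
    """Compress chunks concurrently, each as its own gzip member.

    CPython's zlib generally releases the GIL while compressing, so threads can
    keep several cores busy; a ProcessPoolExecutor is the safer choice otherwise.
    """
    chunks = split_chunks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so members are joined correctly.
        members = list(pool.map(lambda c: gzip.compress(c, compresslevel=level), chunks))
    return b"".join(members)


if __name__ == "__main__":
    payload = b"FOG partition data " * 200_000  # roughly 3.8 MB of sample input
    packed = parallel_gzip(payload)
    assert gzip.decompress(packed) == payload  # multi-member stream round-trips
    print(f"{len(payload)} -> {len(packed)} bytes")
```

    Independent chunks are what make the work parallel, at the cost of a little ratio, since each chunk starts with no history from the previous one.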



  • @Tom-Elliott said in ZSTD Compression:

    I mean, to be usable across multiple systems, a compression bearing of at least half the size would make it a suitable alternative. 200 MB is not worth it.

    It probably doesn’t matter so much to us; most people don’t have over 1 TB of images.

    But to a company like Facebook, Google, or YouTube, that little difference matters: they might be bucking against the chains to get it implemented, because it may save them 100 PB of space (taking into account very redundant storage plus backups plus mirror sites).


  • Senior Developer

    imgFormat is how we determine Partclone vs. Partimage.
    When upgrading from 0.32 to current, image definitions are automatically given an imgFormat of 1 (which equals partimage).
    All I’ve done is extend the capability of this field,
    so the way it works now is:
    0 = partclone gzipped
    1 = partimage (gzipped, of course)
    2 = partclone gzipped, split 200MiB
    3 = partclone uncompressed
    4 = partclone uncompressed, split 200MiB
    5 = zstd compressed
    6 = zstd compressed, split 200MiB
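    A minimal sketch of decoding the extended field, using only the values listed above. The names (`ImageFormat`, `IMG_FORMATS`, `describe_img_format`, `split_part_count`) are hypothetical, not FOG’s actual code, and the post doesn’t say which imaging tool the zstd formats (5 and 6) pair with, so this assumes partclone.

```python
from typing import NamedTuple, Optional

SPLIT_SIZE_MIB = 200


class ImageFormat(NamedTuple):
    tool: str                   # imaging tool that reads/writes the partition
    compression: Optional[str]  # None means uncompressed
    split_mib: Optional[int]    # part size in MiB when the image is split


# Values from the list above; 5 and 6 are ASSUMED to use partclone underneath.
IMG_FORMATS = {
    0: ImageFormat("partclone", "gzip", None),
    1: ImageFormat("partimage", "gzip", None),
    2: ImageFormat("partclone", "gzip", SPLIT_SIZE_MIB),
    3: ImageFormat("partclone", None, None),
    4: ImageFormat("partclone", None, SPLIT_SIZE_MIB),
    5: ImageFormat("partclone", "zstd", None),
    6: ImageFormat("partclone", "zstd", SPLIT_SIZE_MIB),
}


def describe_img_format(code: int) -> ImageFormat:
    """Look up an imgFormat code, rejecting values outside 0-6."""
    try:
        return IMG_FORMATS[code]
    except KeyError:
        raise ValueError(f"unknown imgFormat {code}") from None


def split_part_count(image_bytes: int, fmt: ImageFormat) -> int:
    """Number of files the image occupies (1 if the format is not split)."""
    if fmt.split_mib is None:
        return 1
    part = fmt.split_mib * 1024 * 1024
    return max(1, -(-image_bytes // part))  # ceiling division
```

    Keeping compression and splitting as separate attributes lets the restore side pick a decompressor and a file-joining strategy independently, instead of branching on seven numbers everywhere.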


  • Moderator

    We already have the check box for ‘legacy images’ which the admin can use… There’s no reason the image settings couldn’t also say which compression method it’s using.


  • Moderator

    @Tom-Elliott I don’t see this function as a “need” for FOG. The speed of image deployment depends on a large number of factors, and image extraction is only one of them. A lot depends on your goals: deploying fast, or consuming less space on the FOG server to store more images. But having the option, I guess, is never totally bad either.

    The one issue I can see is that whoever enables this function will have to do so at the beginning of the FOG install, or they will end up with a mixed set of images on their FOG server. This WILL cause issues unless the code is smart enough to know that this is a pigz image and that is a zstd image (and just when we gained compatibility with Clonezilla images, too). You’ll also have to consider the implications when this image is replicated to other storage nodes or exported to other FOG environments. I’m not saying it’s a good or a bad thing. It’s just something the FOG IT technician is going to have to be aware of.
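    One way the code could tell a pigz image from a zstd image without trusting a database field is to check the first bytes of the file: gzip streams start with `1f 8b`, and zstd frames start with the magic number `0xFD2FB528`, stored little-endian as `28 b5 2f fd`. The zstd format also allows a stream to begin with a skippable frame (magic `0x184D2A50` through `0x184D2A5F`). This is a minimal sketch; `sniff_compression` and `sniff_file` are hypothetical helpers, and for split images you would check the first part.

```python
GZIP_MAGIC = b"\x1f\x8b"
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"  # 0xFD2FB528, little-endian on disk
SKIPPABLE_TAIL = b"\x2a\x4d\x18"        # upper bytes of 0x184D2A5?, little-endian


def sniff_compression(header: bytes) -> str:
    """Classify a stream by its leading bytes: 'zstd', 'gzip' or 'unknown'."""
    if header[:4] == ZSTD_FRAME_MAGIC:
        return "zstd"
    # Skippable frame: first byte 0x50-0x5F followed by 2a 4d 18.
    if len(header) >= 4 and 0x50 <= header[0] <= 0x5F and header[1:4] == SKIPPABLE_TAIL:
        return "zstd"
    if header[:2] == GZIP_MAGIC:
        return "gzip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first four bytes of an image file and classify it."""
    with open(path, "rb") as fh:
        return sniff_compression(fh.read(4))
```

    An uncompressed partclone image would come back as "unknown" here, so a real check would still fall back to imgFormat for those.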


  • Senior Developer

    I guess what I’m trying to say is that ZSTD is now incorporated into the inits and is “natively” integrated. You can choose to use it however you see fit.


  • Senior Developer

    @VincentJ It’s a Windows XP image. I use it because it’s a single partition, so there’s less to worry about and wait for when running tests in general.

    For reference, the “data” size is 5.8 GB; zstd compresses that down to 2.6 GB, where pigz got 2.8 GB. Hopefully that gives some more detail.
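    To put those sizes in perspective, here is the arithmetic relative to the raw data size, as a small sketch using only the three figures from this post (5.8 GB raw, 2.8 GB with pigz, 2.6 GB with zstd). The function names are illustrative.

```python
RAW_GB = 5.8
COMPRESSED_GB = {"pigz -6": 2.8, "zstd -19": 2.6}


def compression_ratio(raw: float, compressed: float) -> float:
    """Raw size divided by compressed size (e.g. 2.0 means 2:1)."""
    return raw / compressed


def space_saved_pct(raw: float, compressed: float) -> float:
    """Percentage of the raw size eliminated by compression."""
    return 100.0 * (1.0 - compressed / raw)


if __name__ == "__main__":
    for name, size in COMPRESSED_GB.items():
        print(f"{name}: {compression_ratio(RAW_GB, size):.2f}:1, "
              f"{space_saved_pct(RAW_GB, size):.1f}% smaller than raw")
```

    That works out to about 2.07:1 (51.7% saved) for pigz and 2.23:1 (55.2% saved) for zstd, so zstd removes roughly 3.4 more percentage points of the raw data on this image.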


  • Moderator

    What is in that image? 2.6GB compressed is very small. Does that image download in under a minute normally?

    I have a base Windows 10 + updates image I can also try. The one I used for my earlier numbers had applications in it for a complete system. I will see if I can get that to compress down to something similar.

    While my image is a lot bigger, if I scale yours up to the size of mine, I am saving a lot more space.


  • Senior Developer

    Alright, so I got bored.

    I added all the “capabilities” as requested.

    What I have found so far is that zstd doesn’t compress noticeably better, even at -19, but it adds a SIGNIFICANT amount of time. My image with pigz -6 was 2.8 GB with a 10-minute capture time. The same image, deployed and then captured with zstd -19, was 2.6 GB with a 35-minute capture time. This was all done with NATIVE capture/deploy to ensure the inits were capable.

    Mind you, my test system had 1 CPU, so multiple CPUs may have helped the capture time, but would it be worth it? To be usable across multiple systems, it would need to get images down to at least half the size to be a suitable alternative. 200 MB is not worth it.
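    Put as numbers, the trade-off in this test looks like the sketch below. The 2.8 GB/10 min and 2.6 GB/35 min figures come from this single-CPU test; the 0.5 threshold is the “at least half the size” bar from the post, and `CaptureResult`/`evaluate` are hypothetical names.

```python
from dataclasses import dataclass

HALF_SIZE_THRESHOLD = 0.5  # "at least half the size" bar from the post


@dataclass
class CaptureResult:
    label: str
    size_gb: float
    capture_min: float


def evaluate(baseline: CaptureResult, candidate: CaptureResult) -> dict:
    """Compare a candidate compressor's capture against the baseline."""
    size_ratio = candidate.size_gb / baseline.size_gb
    time_ratio = candidate.capture_min / baseline.capture_min
    return {
        "size_ratio": size_ratio,  # < 1.0 means smaller
        "time_ratio": time_ratio,  # > 1.0 means slower
        "meets_half_size_bar": size_ratio <= HALF_SIZE_THRESHOLD,
    }


if __name__ == "__main__":
    pigz = CaptureResult("pigz -6", 2.8, 10)
    zstd = CaptureResult("zstd -19", 2.6, 35)
    r = evaluate(pigz, zstd)
    print(f"{zstd.label}: {1 - r['size_ratio']:.1%} smaller, "
          f"{r['time_ratio']:.1f}x the capture time, "
          f"meets bar: {r['meets_half_size_bar']}")
```

    On these figures zstd -19 is about 7.1% smaller for 3.5x the capture time, well short of the half-size bar.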



  • @VincentJ said in ZSTD Compression:

    Do you know if most people use multicast or just do multiple unicast for deployments?

    It’s a mix.


  • Moderator

    @Tom-Elliott Thanks for putting it into the init.

    Would it be as simple as searching through the code for the commands for imaging and changing them to use zstd instead of pigz or would there be more complicated things involved due to the way the commands are generated?
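    In principle the compressor is just one stage of the capture pipe, so switching it is mostly a matter of which command sits there, but the restore side then has to know which decompressor to run (which is what the imgFormat field is for). A minimal sketch of choosing that stage; `compressor_argv` and `decompressor_argv` are hypothetical, not FOG’s actual init code. The flags are standard ones: `-c` writes to stdout for both tools, `pigz -p` sets the thread count, `zstd -T` sets threads (`-T0` means auto-detect), and zstd levels 20 to 22 need `--ultra`.

```python
from typing import List, Optional


def compressor_argv(kind: str, level: int, threads: Optional[int] = None) -> List[str]:
    """Build argv for the compression stage of a capture pipe (stdin -> stdout)."""
    if kind == "pigz":
        if not 1 <= level <= 9:
            raise ValueError("this sketch only accepts pigz levels 1-9")
        argv = ["pigz", f"-{level}", "-c"]
        if threads is not None:
            argv += ["-p", str(threads)]
        return argv
    if kind == "zstd":
        if not 1 <= level <= 22:
            raise ValueError("zstd levels run from 1 to 22")
        argv = ["zstd", f"-{level}", "-c"]
        if level > 19:
            argv.insert(1, "--ultra")  # levels above 19 are refused without it
        # -T0 asks zstd to use as many threads as it detects cores.
        argv.append(f"-T{threads if threads is not None else 0}")
        return argv
    raise ValueError(f"unsupported compressor: {kind!r}")


def decompressor_argv(kind: str) -> List[str]:
    """Matching decompression stage for the restore side."""
    if kind in ("pigz", "zstd"):
        return [kind, "-d", "-c"]
    raise ValueError(f"unsupported compressor: {kind!r}")
```

    In a real pipe the imaging tool’s output would be fed into this command, for example with `subprocess.Popen(argv, stdin=..., stdout=...)`; the important part is that capture and restore agree on `kind`.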

    Do you know if most people use multicast or just do multiple unicast for deployments? I have never gotten multicast to work fully and always end up with each client downloading on its own. I have usually had my server set to 4 clients at once, except when I had 10GbE and 2Gbit links between the MDF and IDF… On that machine I used 8, and with ZFS caching I had no problems with the disk I/O of so many transfers.

    If we can get improvements by increasing those numbers, then this becomes more worth the effort, since it speeds up people’s deployments.
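    Whether raising the client count helps depends on where the bottleneck is. A deliberately simplified model, assuming the server’s uplink is shared evenly among simultaneous unicast clients and each client is capped by its slowest local component (link, disk writes, or decompression), all expressed in MB/s of image data. `unicast_deploy_minutes` is hypothetical, and the numbers in the example are made up for illustration.

```python
def unicast_deploy_minutes(image_mb: float, hosts: int, max_clients: int,
                           server_mbps: float, client_cap_mbps: float) -> float:
    """Wall-clock minutes to push one image to `hosts` machines, in batches.

    Each batch runs up to `max_clients` transfers at once; every transfer gets an
    equal share of the server uplink but never more than `client_cap_mbps`.
    """
    if max_clients < 1:
        raise ValueError("max_clients must be at least 1")
    remaining, seconds = hosts, 0.0
    while remaining > 0:
        batch = min(max_clients, remaining)
        per_client = min(server_mbps / batch, client_cap_mbps)
        seconds += image_mb / per_client  # transfers in a batch finish together
        remaining -= batch
    return seconds / 60.0


if __name__ == "__main__":
    for slots in (1, 2, 4):
        t = unicast_deploy_minutes(3000, hosts=10, max_clients=slots,
                                   server_mbps=100, client_cap_mbps=50)
        print(f"max clients {slots}: {t:.1f} min")
    half = unicast_deploy_minutes(1500, 10, 2, 100, 50)
    print(f"half-size image, max clients 2: {half:.1f} min")
```

    In this toy setup, going from 2 to 4 clients changes nothing because the server link is already saturated, while halving the image halves the time; if the clients were the bottleneck instead, more slots would help. Real deployments (server disk contention, per-client overhead, multicast) won’t be this tidy.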

    As for uploading… I also have to upload every month or so, and with one of my clients I have a 2-hour window for all maintenance, so uploading sometimes gets delayed since it can take a considerable amount of time.

    Reduced file size would also help in my case by cutting the sync time between sites over the WAN.

    As people’s machines become more powerful, we can scale with them instead of being held back by the limited speed of PIGZ. 10GbE is coming down in price, and SSD/NVMe/HDD are getting better all the time.



  • @Tom-Elliott said in ZSTD Compression:

    Already, with PIGZ in use, the issue (beyond multiple unicast tasks) is most often the slowdown in writing the data to the disk.

    As I was reading through this thread, this is exactly what I thought: the biggest benefit would come with multiple simultaneous unicast deployments. Maybe instead of having Max Clients set at 2, I could do 3.

    And who knows, maybe I’ll squish the images enough to store 1 extra.


  • Senior Developer

    So, for what it’s worth, I’m giving it a shot. I have not coded anything to use zstd yet, but I am running an installation/build test that will hopefully build the inits with the necessary zstd binaries so others can test internally.


  • Senior Developer

    So here’s what I think I want to say.

    Seeing as ZSTD, from what I can see here, only impacts upload speeds, is it worth the effort to support a new standard and methodology when pigz/gzip is already well standardized?

    Consider this:

    While capturing could be significantly improved, deploying (which I imagine happens far more often than capture tasks) would not see a significant boost. Now, if you had 10 unicast tasks with ZSTD that could deploy much more reliably and faster, that would be an improvement worth considering.

    So if you all want to try this, build your inits using the Wiki instructions and the buildroot source already provided in every installation of FOG, and run tests. Right now, as I see it, the testing has focused solely on compressing an image after it was previously captured. Has anybody actually “compressed” the image during a real “capture” task?

    Things to work with:

    1. Integration into the inits as a real utility for us to use.
    2. Do the same results happen on capture? (Maybe I missed this part.)
    3. Do multiple unicast deploys finish faster using this mechanism?
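    For points 2 and 3, a small harness that measures size and time on the same input, and checks that every codec round-trips, keeps comparisons honest. This sketch uses `zlib` levels as stand-ins because zstd isn’t in older Python standard libraries (Python 3.14 adds a `compression.zstd` module; otherwise a third-party binding or a subprocess call would slot in the same way). `benchmark` and `CODECS` are hypothetical names, and timings from one run on one machine are only indicative.

```python
import time
import zlib
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]  # (compress, decompress)


def benchmark(data: bytes, codecs: Dict[str, Codec]) -> Dict[str, Tuple[int, float]]:
    """Return {name: (compressed_size_bytes, compress_seconds)} for each codec."""
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:
            raise RuntimeError(f"{name} did not round-trip")
        results[name] = (len(packed), elapsed)
    return results


# zlib levels stand in for pigz -1/-6/-9; swap in real zstd calls if available.
CODECS: Dict[str, Codec] = {
    f"zlib-{lvl}": (lambda d, lvl=lvl: zlib.compress(d, lvl), zlib.decompress)
    for lvl in (1, 6, 9)
}

if __name__ == "__main__":
    sample = b"".join(i.to_bytes(4, "little") for i in range(250_000))  # 1 MB
    for name, (size, secs) in benchmark(sample, CODECS).items():
        print(f"{name}: {size} bytes in {secs * 1000:.1f} ms")
```

    Running it on real partition data captured from a test machine, rather than synthetic bytes, is what would actually answer point 2.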


  • @Tom-Elliott Well, when it comes to uploading to FOG from Hyper-V with a legacy adapter, the upload is actually what takes most of the time. Image creation takes little time in comparison.

    But yes, uploading from a physical machine is quite fast.


  • Senior Developer

    @loosus456 Let’s say you upload 2 images a month and deploy 400 times a month. While upload would be “faster”, you only gain that during the upload process, and you still have your “setup” to create the image, which is what takes most of your time.
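    That point can be made concrete with a break-even calculation. Using the numbers from this thread (2 captures and 400 deploys a month, captures taking 10 minutes with pigz and 35 with zstd on the single-CPU test), the extra capture time has to be won back across all the deploys. This is a rough sketch that treats all minutes as interchangeable, which ignores that deploys often run in parallel; the function names are hypothetical.

```python
def extra_capture_minutes(captures: int, old_min: float, new_min: float) -> float:
    """Additional capture time per month caused by the slower compressor."""
    return captures * (new_min - old_min)


def break_even_seconds_per_deploy(captures: int, old_min: float,
                                  new_min: float, deploys: int) -> float:
    """Per-deploy saving (seconds) needed to offset the slower captures."""
    if deploys <= 0:
        raise ValueError("deploys must be positive")
    return extra_capture_minutes(captures, old_min, new_min) * 60 / deploys


if __name__ == "__main__":
    extra = extra_capture_minutes(2, 10, 35)
    need = break_even_seconds_per_deploy(2, 10, 35, 400)
    print(f"{extra:.0f} extra capture minutes/month; "
          f"break-even at {need:.1f} s saved per deploy")
```

    With that ratio of deploys to captures, 50 extra capture minutes a month are offset by about 7.5 seconds saved per deploy; the open question from the thread is whether zstd actually delivers any saving on deploy.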



  • @Tom-Elliott We do upload often (about twice a month), but if the upload isn’t much, much faster and the deployment isn’t significantly faster, it probably isn’t worth it.

    I do wonder if Hyper-V upload through the legacy adapter would be faster, though. That takes literally hours right now.

