file format and compression option request
oh great, Junkhacker hasn’t been around forever and now here he is requesting a bunch of stuff…
i have a few inter-related requests:
fog is still using version v0.2.89 of partclone. i’d like for it to be upgraded to a newer version, such as v0.3.11, so that a new capture argument can be added
I would like the -aX0 argument added to the partclone capture command, as either the new default or included in a new Image Manager option. this argument tells partclone to not roll a checksum into the image as it’s being captured. we use the --ignore_crc argument on restores anyway, so this should have no detrimental effects.
I would like the --rsyncable argument added to pigz compression, either as the default or as part of a new Image Manager option. this periodically resets the internal structure of the compressed data stream, and only adds approximately 1% to the size of the image. that 1% increase in size, combined with removing the checksums, turns fog images into data that can be transferred more efficiently with rsync or, more importantly for my purposes, deduplicated.
fog images as they are created now get almost no deduplication on supported filesystems and backup systems. my proposed changes should allow images to be deduplicated quite well. I’m working up some data to show to what degree this is possible, using the windows DDPEval tool and a number of manually converted image files for testing. so far, the results are promising and i thought i would put my ideas out there for others to review and possibly replicate.
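To illustrate why resetting the compressor helps, here is a minimal Python sketch of the general idea. It is a toy model and not pigz's actual implementation: the boundary rule (a CRC of the last 32 bytes), the `WINDOW` and `MASK` values, and all function names are assumptions made up for this example. It compresses a sample text twice, once as-is and once with three bytes inserted near the start, and measures how much of the compressed output is still byte-identical at the end.

```python
import random
import zlib

WINDOW = 32    # bytes of context hashed at each position (assumed value)
MASK = 0xFFF   # cut when the low 12 bits are zero, roughly every 4 KiB

def boundaries(data):
    """Cut positions chosen from content only, so they survive insertions."""
    cuts = []
    for end in range(WINDOW, len(data)):
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            cuts.append(end)
    return cuts

def deflate(data, resettable):
    """Raw deflate; optionally reset compressor state at each boundary."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: no header or trailer
    out = bytearray()
    start = 0
    for cut in (boundaries(data) if resettable else []):
        out += comp.compress(data[start:cut])
        out += comp.flush(zlib.Z_FULL_FLUSH)  # forget history, align to a byte
        start = cut
    out += comp.compress(data[start:])
    out += comp.flush()
    return bytes(out)

def shared_suffix(a, b):
    """Number of identical bytes at the end of both byte strings."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

# Sample data: pseudo-random text, then the same text with 3 bytes inserted.
rng = random.Random(1234)
words = [b"fog", b"image", b"capture", b"deploy", b"partclone", b"pigz",
         b"checksum", b"dedup", b"rsync", b"windows", b"storage", b"node"]
original = b" ".join(rng.choice(words) for _ in range(40000))
edited = original[:1000] + b"XYZ" + original[1000:]

for resettable in (False, True):
    a = deflate(original, resettable)
    b = deflate(edited, resettable)
    print(resettable, len(a), shared_suffix(a, b))
```

With the resets enabled, everything after the first boundary past the edit should compress to the same bytes in both runs, which is what lets rsync or a dedup engine match it. Without resets, a small change near the start typically disturbs the rest of the stream.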
here are the kinds of results i’ve seen with my testing so far using 2 Windows 7 images,
compressed as originally by fog with pigz -6:
Processed files size: 12.47 GB
Optimized files size: 12.44 GB
Space savings: 33.09 MB
Space savings percent: 0
Optimized files size (no compression): 12.47 GB
Space savings (no compression): 1.51 MB
Space savings percent (no compression): 0
here converted to no-checksum pigz -6 --rsyncable:
Processed files size: 12.53 GB
Optimized files size: 7.93 GB
Space savings: 4.60 GB
Space savings percent: 36
Optimized files size (no compression): 7.93 GB
Space savings (no compression): 4.60 GB
Space savings percent (no compression): 36
zstandard has just added an rsyncable option to their compression (not yet released, but it’s in the dev code if you build it). it doesn’t offer as much dedup, and since it’s not in an actual release yet, i want to hold off on adding that code, but eventually add it as well.
*Edited for typos and formatting.
fog images as they are created now get almost no deduplication on supported filesystems and backup systems.
I am wondering if this is actually the case. Reading the forum topic on this (being mentioned in the release notes and all) I get the impression that the option was added in 2014 but not much tested. lucatrv said in Add --rsyncable option for image compression:
However, I have carried out several tests and found out that unfortunately rsync or similar deduplicating algorithms do not work as expected with Clonezilla backups. It could be due to how Clonezilla stores data […]
Have you done any testing with rsync/zbackup yet?
@Sebastian-Roth said in file format and compression option request:
i have not used zbackup because it hasn’t been in active development in years, i find it slow, and microsoft’s dedup evaluation tool (DDPEval.exe) actually works pretty well for determining how dedupable a data set is.
I am wondering if this is actually the case
i posted my benchmark results…
those posts are about adding the compression with --rsyncable to clonezilla, but they mention that they aren’t getting the results they expect and conclude “I think it could be due to how Clonezilla stores data.” well, they’re right. partclone has a rolling checksum included in the file that kills dedup. since then, they have added the ability to save the image without a checksum. i don’t think anyone has tested its dedup potential until now.
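A toy Python model of why an embedded checksum can defeat dedup. This is not partclone's on-disk format: the 4096-byte block size, the 4-byte CRC after each block, and the assumption that the CRC is a running value depending on all earlier data are simplifications chosen for illustration, and every name here is hypothetical. Two "images" share 15 of 16 blocks; the sketch counts how many fixed-size chunks of the second one already exist in the first.

```python
import hashlib
import zlib

BLOCK = 4096  # assumed block size for this toy model

def block(label):
    """Deterministic 4096-byte block derived from a label."""
    return hashlib.sha256(label.encode()).digest() * (BLOCK // 32)

def serialize(blocks, with_checksum):
    """Write blocks back to back, optionally followed by a running CRC."""
    out = bytearray()
    crc = 0
    for b in blocks:
        out += b
        if with_checksum:
            crc = zlib.crc32(b, crc)  # running value: depends on all earlier blocks
            out += crc.to_bytes(4, "little")
    return bytes(out)

def shared_chunks(a, b, size):
    """Count fixed-size chunks of b whose content also appears in a."""
    seen = {hashlib.sha256(a[i:i + size]).digest()
            for i in range(0, len(a), size)}
    return sum(hashlib.sha256(b[i:i + size]).digest() in seen
               for i in range(0, len(b), size))

# Two images that differ only in their first block.
common = [block(f"shared-{i}") for i in range(15)]
image_a = [block("only-in-a")] + common
image_b = [block("only-in-b")] + common

with_crc = shared_chunks(serialize(image_a, True),
                         serialize(image_b, True), BLOCK + 4)
without_crc = shared_chunks(serialize(image_a, False),
                            serialize(image_b, False), BLOCK)
print("shared chunks with running checksum:", with_crc)
print("shared chunks without checksum:", without_crc)
```

In this model a single differing block early in the image changes every later checksum, so no stored chunk matches even though the data blocks themselves are identical. Dropping the checksum leaves the 15 shared blocks byte-identical.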
see results of the same 2 Windows 7 images from the benchmarks in my original post.
partclone images with checksum - uncompressed (originally captured by fog, just decompressed):
Evaluated folder size: 29.70 GB
Files in evaluated folder: 16
Processed files: 6
Processed files size: 29.70 GB
Optimized files size: 15.16 GB
Space savings: 14.54 GB
Space savings percent: 48
Optimized files size (no compression): 29.70 GB
Space savings (no compression): 4.13 MB
Space savings percent (no compression): 0
Files excluded by policy: 10
Small files (<32KB): 10
Files excluded by error: 0
notice that all of the potential space savings reported by the tool are in compression
partclone images without checksum - uncompressed:
Evaluated folder size: 29.66 GB
Files in evaluated folder: 16
Processed files: 6
Processed files size: 29.66 GB
Optimized files size: 8.64 GB
Space savings: 21.02 GB
Space savings percent: 70
Optimized files size (no compression): 17.39 GB
Space savings (no compression): 12.27 GB
Space savings percent (no compression): 41
Files excluded by policy: 10
Small files (<32KB): 10
Files excluded by error: 0
this offers the best possible dedup of the ways of storing images for fog, but imaging would take a long time without compression, so…
partclone images without checksum pigz -6 --rsyncable:
Evaluated folder size: 12.53 GB
Files in evaluated folder: 16
Processed files: 6
Processed files size: 12.53 GB
Optimized files size: 7.93 GB
Space savings: 4.60 GB
Space savings percent: 36
Optimized files size (no compression): 7.93 GB
Space savings (no compression): 4.60 GB
Space savings percent (no compression): 36
Files excluded by policy: 10
Small files (<32KB): 10
Files excluded by error: 0
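For anyone replicating these numbers, the savings figures can be re-derived from the processed and optimized sizes. The short Python sketch below does that for the three runs above. The `savings` helper is a hypothetical name, and truncating the percentage to a whole number is an inference from the posted figures, not documented behaviour of the tool.

```python
def savings(processed_gb, optimized_gb):
    """Return (space saved in GB, whole-number percent saved)."""
    saved = processed_gb - optimized_gb
    percent = int(100 * saved / processed_gb)  # truncation is an assumption
    return round(saved, 2), percent

# (label, processed size, optimized size) taken from the reports above
runs = [
    ("with checksum, uncompressed", 29.70, 15.16),
    ("no checksum, uncompressed", 29.66, 8.64),
    ("no checksum, pigz -6 --rsyncable", 12.53, 7.93),
]

for label, processed, optimized in runs:
    saved, percent = savings(processed, optimized)
    print(f"{label}: {saved} GB saved ({percent}%)")
```

The same helper applied to the "no compression" figures (for example 29.66 GB down to 17.39 GB) isolates how much of the saving comes from deduplication alone.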
i have more benchmark results, but this post is already getting long.
We might start testing with the unstable build before we actually let users run into it. I just built a fresh init with partclone 0.3.12 in it. No adjustments to the parameters yet. I just want to see if plain old capture and deploy run smoothly with unstable partclone. Please give it a go and let us know.
@Sebastian-Roth huh, i didn’t even realize it was on an “unstable” build. Tsai doesn’t differentiate on his github https://github.com/Thomas-Tsai/partclone/. It has a “release” tag.
i can confirm that the images i have created with 0.3.12 with the checksum removed still deploy with the 0.2.89 build, so at least there’s not much of a backwards compatibility concern, for what it’s worth.
i’ll try to do some testing with your init build when I get back to work.
i’ve had a chance to do just a little testing. something definitely isn’t working right. images i had converted using partclone 0.3.11 deploy fine with the standard init, but using your build with 0.3.12 i can’t get a successful deploy of existing images (boot to blinking cursor), and can’t deploy what i capture with it (it flashes by fast, but it looks like a crc error where you would normally get a “syncing” message from partclone, even though we are ignoring checksums)
@Junkhacker interesting. no problems with images that are told not to generate checksums. no problem with images with checksums if i remove the --ignore_crc parameter.
@Junkhacker further testing suggests that everything works fine without the --ignore_crc flag we have set in all restore operations. i’ve also learned that due to a bug in partclone 0.2.89, the checksums it was making were practically useless. if/when we upgrade to the newer version of partclone, we might want to consider making disabled checksums the default, since we’ve been, in essence, operating without checksums all along.