Dedupe storage - how to best bypass pigz packaging?
-
@Tom-Elliott Tom, as a test, would it be possible to just use clonezilla to capture a disk image and store it on the target storage array? What I’m interested in is whether it’s even possible to dedupe a huge binary file. Since both clonezilla and FOG use partclone to clone the image, would both programs produce files with a similar structure? If so, it would give the OP a way to test without hacking too much on the fog init scripts.
-
@george1421 I’m working on a prototype init that will enable uploads without compression.
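Roughly, the idea is to skip the pigz stage entirely when compression is off – something like this (a sketch of the idea only, not the actual funcs.sh code; the variable name and paths are placeholders):

```bash
# Hypothetical sketch of the capture path -- not the real FOG init code.
# PIGZ_COMP and the target path are placeholders.
if [[ "$PIGZ_COMP" -eq 0 ]]; then
    # no compression: the raw partclone stream hits disk, so block-level
    # dedupe has something consistent to work with
    partclone.ext4 -c -s /dev/sda1 -o - > /images/dev/macaddr/d1p1.img
else
    partclone.ext4 -c -s /dev/sda1 -o - | pigz -"$PIGZ_COMP" > /images/dev/macaddr/d1p1.img
fi
```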
-
Prototype is up and appears to be working properly. With this new change it should theoretically be possible to use clonezilla images within fog, once the files are renamed to fog’s format. The same goes in reverse: fog images in uncompressed format should be able to move into clonezilla, provided the images are renamed to clonezilla’s naming standards (see the sketch below).
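As a rough illustration of the renaming (names here are placeholders; the real ones depend on disk layout, filesystem, and clonezilla options):

```bash
# Illustrative only: clonezilla splits partclone output into volumes
# (.aa, .ab, ...), while FOG expects one file per partition (e.g. d1p1.img),
# so moving an uncompressed clonezilla image into FOG is roughly a
# concatenate-and-rename:
cat sda1.ext4-ptcl-img.* > /images/myimage/d1p1.img
```

Going the other way would mean renaming (and, if needed, splitting) d1p1.img back to clonezilla’s naming scheme.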
-
@Tom-Elliott Just so I’m clear, is this feature only available with the latest release of FOG (1.3.5.rc5)?
-
@Tom-Elliott Thanks Tom! I will test out the funcs.sh change to see how the upload results change.
-
@george1421 Yes, a clonezilla image directly uploaded will dedupe (with no compression). The file is chunked into pieces and deduped based on those chunks. The art is in matching the dedupe chunk size to the data inside the image, and in matching the chunk boundaries between the algorithm and the incoming data.
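A crude way to see that chunk-size effect for yourself (a sketch, assuming fixed-size blocks like ZFS uses; the file name and the 128K block size are placeholders):

```bash
#!/bin/bash
# Fixed-block dedupe estimator: hash every 128K block of an image and
# count how many are unique. Only practical on small test images, since
# it materializes one file per block.
BS=131072                       # 128K, the ZFS default recordsize
img="$1"
dir=$(mktemp -d)
split -b "$BS" -a 6 "$img" "$dir/blk"
total=$(ls "$dir"/blk* | wc -l)
unique=$(sha256sum "$dir"/blk* | awk '{print $1}' | sort -u | wc -l)
echo "blocks: $total  unique: $unique"
rm -rf "$dir"
```

If two images differ by an insertion that shifts the data by even a few bytes, every fixed-size block downstream of it hashes differently, which is why matching the boundaries matters so much.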
-
@george1421 Correct. Really it’s only in the working branch; it will be available for re-installs of rc5, but there’s no GUI option for it yet.
-
1.3.5 RC 6 has been released and should have this ‘uncompressed’ capability coded more properly.
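One quick way to confirm an upload really skipped pigz (the image path is a placeholder):

```bash
# gzip-compressed uploads report "gzip compressed data"; an uncompressed
# partclone stream will not.
file /images/myimage/d1p1.img
```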
-
@Tom-Elliott Excitedly downloads new RC
-
Sorry I didn’t see this thread earlier, but I have experimented with dedup of uncompressed FOG images on a ZFS filesystem to see if it was worth it. I saw far smaller gains than when using compression. But I really can’t say I’m highly experienced with dedup, so maybe I did something wrong. Let us know how your experiments go.
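For anyone repeating that test: ZFS can simulate dedup on an existing pool without actually turning it on, which is a cheap sanity check (“tank” is a placeholder pool name):

```bash
# Simulate dedup over the pool's existing data; prints a DDT histogram
# and an estimated dedup ratio at the bottom.
zdb -S tank
```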
-
@Junkhacker I too am interested in seeing how the dedup rate compares to the same file compressed at level 6, if just for information only. It would be interesting to know.
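A rough way to put the two numbers side by side (the pool name and image path are placeholders):

```bash
# dedup ratio actually achieved on the pool holding the uncompressed images
zpool get dedupratio tank
# size of the same image at pigz level 6, without writing the output to disk
pigz -6 -c /images/myimage/d1p1.img | wc -c
```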
-
I think de-duplication across multiple images (like 10 images) would do much better than compressing the same 10 images. But again, this means the benefit will only be seen when you are dealing with multiple images. For one or two images it’s probably not going to be worth the gain.
-
@Tom-Elliott It was quite a while ago that I did my testing (and of course I didn’t actually document anything…), but I was working with about 3 to 5 images, seeing how much they shared so I could estimate from there, and it wasn’t good: very little duplication was detected in spite of the obvious duplication that was taking place.
Someone who knows what they’re doing might have much different results.