
    Deduplication of image files possible yet?

    General
    mfinn999:

      There are a couple of posts from the past few years about deduplication: partclone 0.3.x, pigz vs. gzip, and the --rsyncable parameter to stop the rolling checksum.

      See:
      https://forums.fogproject.org/topic/13206/the-future-of-partclone-and-therefore-fog-as-it-is
      https://forums.fogproject.org/topic/12750/file-format-and-compression-option-request

      From these posts there seemed to be some success, but it required versions of partclone, etc., that were not yet in FOG.

      Is it possible to configure recent FOG versions so that the images can be successfully deduplicated? If so, what needs to be configured to make that possible? I am running FOG 1.5.8.

      Sebastian Roth (Moderator):

        @mfinn999 As far as I know we have all of that in FOG 1.5.8 already. Though I have to say it's mostly @Junkhacker who has pushed this forward and knows all the details. But as far as I can tell, we have all the tools and command-line options in 1.5.8.

        Do you see it not working as intended? If so, please provide evidence we can work from.

        mfinn999:

          After reading the posts I listed and seeing what is in 1.5.8, I know that most of the programs should be capable. I am wondering how to "enable" it. We have 226 images using 5.4 TB on an XFS partition on the FOG server:

          # df -h
          Filesystem                  Size  Used  Avail  Use%  Mounted on
          /dev/mapper/cl_fog2-images   28T  5.4T    22T   20%  /images

          I used rsync to make a copy of that to a CentOS 8 VDO volume with compression and deduplication enabled:

          # vdostats --hu
          Device             Size  Used  Available  Use%  Space saving%
          /dev/mapper/vdo1   6.0T  5.0T       1.0T   83%             2%
          # df -h
          Filesystem         Size  Used  Avail  Use%  Mounted on
          /dev/mapper/vdo1   6.0T  5.2T   833G   87%  /backup

          Compression seems to have saved a small amount, but as most of the images are based on the same "base" image, dedupe should have reduced it a great deal more. The images are compressed on the FOG server with default settings.
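
          For anyone wanting a rough estimate before committing to a dedup volume, a naive fixed-block hash scan approximates what a 4 KiB block deduplicator like VDO could reclaim. This is only an illustrative sketch, not a VDO tool, and the file names are hypothetical:

```python
import hashlib
import os

def dedup_estimate(paths, block=4096):
    """Estimate fixed-block dedup savings as a 4 KiB deduplicator would see them."""
    seen, total, unique = set(), 0, 0
    for p in paths:
        with open(p, "rb") as f:
            while chunk := f.read(block):
                total += 1
                h = hashlib.sha256(chunk).digest()
                if h not in seen:
                    seen.add(h)
                    unique += 1
    return 1 - unique / total  # fraction of blocks that dedup away

# demo: two hypothetical "images" sharing a common base
base = os.urandom(4096 * 8)
with open("img_a.bin", "wb") as f:
    f.write(base + os.urandom(4096 * 2))
with open("img_b.bin", "wb") as f:
    f.write(base + os.urandom(4096 * 2))
print(round(dedup_estimate(["img_a.bin", "img_b.bin"]), 2))
```

          Running the scan over /images directly would show whether the compressed image files share any aligned 4 KiB blocks at all, before any storage-level changes are made.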

          george1421 (Moderator):

            @mfinn999 Deduplication really hasn't been studied on FOG-captured images. In the two links you provided there was discussion about adding certain options to the utilities that capture the image; those options were added to the FOG code base.

            here: https://github.com/FOGProject/fos/blob/master/Buildroot/board/FOG/FOS/rootfs_overlay/usr/share/fog/lib/funcs.sh#L2089

            and here: https://github.com/FOGProject/fos/blob/master/Buildroot/board/FOG/FOS/rootfs_overlay/usr/share/fog/lib/funcs.sh#L1594

            Beyond that, no other testing has been done. Also realize that FOG does nothing in regards to dedup; that role should be handled by the host OS or host hardware of the FOG server.

            Also understand that the options that were added to the image capture do not specifically address dedup operations. Those (new) settings will only impact newly captured images in zstd format, not gzip.

            How FOG captures images: it uses a utility called partclone to read the disk blocks on the target computer, then directs those blocks through a compressor (zstd or gzip) before they are sent to the FOG server. The FOG server takes the compressed blocks from the target computer and writes them unaltered to its disk. So what's written to the FOG server's disk is a packed (compressed) binary file. I can't see how two images would have enough duplicate blocks to make dedup effective here.
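
            A rough sketch of that capture path, using Python's zlib purely as a stand-in for the real components (partclone feeds the stream, zstd or pigz compresses it on FOS, and the server stores the result byte-for-byte):

```python
import io
import zlib

def capture(source, dest, level=6):
    """Stand-in for FOG's capture path: a block reader (partclone in the real
    pipeline) hands over raw blocks, a streaming compressor (zstd/gzip there,
    zlib here) packs them, and the server writes the stream to disk unaltered."""
    co = zlib.compressobj(level)
    while chunk := source.read(1 << 20):   # raw blocks from the target disk
        dest.write(co.compress(chunk))     # compressed in transit...
    dest.write(co.flush())                 # ...and stored as-is on the server

# round-trip check on dummy "disk" data
disk = io.BytesIO(b"\x00" * (1 << 22))     # 4 MiB of empty blocks
image = io.BytesIO()
capture(disk, image)
assert zlib.decompress(image.getvalue()) == b"\x00" * (1 << 22)
```

            The key property for this discussion is the last step: the server never sees the raw blocks, only the compressor's output, so any dedup has to find identical runs in the compressed stream itself.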

            @Junkhacker @Quazz Do you know anything I’m missing here?

            TBH, I wonder if the -B (block size) option for zstd would have an impact on the dedup'd image. But it would also require the FOG developers to have access to dedup storage (and the desire) to see if there are any improvements to be made in this area.

            [donald@duckserver html]# zstdmt --help
            *** zstd command line interface 64-bits v1.4.4, by Yann Collet ***
            Usage :
                  zstdmt [args] [FILE(s)] [-o file]
            
            FILE    : a filename
                      with no FILE, or when FILE is - , read standard input
            Arguments :
             -#     : # compression level (1-19, default: 3)
             -d     : decompression
             -D file: use `file` as Dictionary
             -o file: result stored into `file` (only if 1 input file)
             -f     : overwrite output without prompting and (de)compress links
            --rm    : remove source file(s) after successful de/compression
             -k     : preserve source file(s) (default)
             -h/-H  : display help/long help and exit
            
            Advanced arguments :
             -V     : display Version number and exit
             -v     : verbose mode; specify multiple times to increase verbosity
             -q     : suppress warnings; specify twice to suppress errors too
             -c     : force write to standard output, even if it is the console
             -l     : print information about zstd compressed files
            --exclude-compressed:  only compress files that are not previously compressed
            --ultra : enable levels beyond 19, up to 22 (requires more memory)
            --long[=#]: enable long distance matching with given window log (default: 27)
            --fast[=#]: switch to very fast compression levels (default: 1)
            --adapt : dynamically adapt compression level to I/O conditions
            --stream-size=# : optimize compression parameters for streaming input of given number of bytes
            --size-hint=# optimize compression parameters for streaming input of approximately this size
            --target-compressed-block-size=# : make compressed block near targeted size
             -T#    : spawns # compression threads (default: 1, 0==# cores)
             -B#    : select size of each job (default: 0==automatic)
            --rsyncable : compress using a rsync-friendly method (-B sets block size)
            --no-dictID : don't write dictID into header (dictionary compression)
            --[no-]check : integrity check (default: enabled)
            --[no-]compress-literals : force (un)compressed literals
             -r     : operate recursively on directories
            --output-dir-flat[=directory]: all resulting files stored into `directory`.
            --format=zstd : compress files to the .zst format (default)
            --test  : test compressed file integrity
            --[no-]sparse : sparse mode (default: enabled on file, disabled on stdout)
             -M#    : Set a memory usage limit for decompression
            --no-progress : do not display the progress bar
            --      : All arguments after "--" are treated as files
            
            Dictionary builder :
            --train ## : create a dictionary from a training set of files
            --train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args
            --train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#,shrink[=#]] : use the fast cover algorithm with optional args
            --train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
             -o file : `file` is dictionary name (default: dictionary)
            --maxdict=# : limit dictionary to specified size (default: 112640)
            --dictID=# : force dictionary ID to specified value (default: random)
            
            Benchmark arguments :
             -b#    : benchmark file(s), using # compression level (default: 3)
             -e#    : test all compression levels from -bX to # (default: 1)
             -i#    : minimum evaluation time in seconds (default: 3s)
             -B#    : cut file into independent blocks of size # (default: no block)
            --priority=rt : set process priority to real-time
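
            Whether -B actually helps is untested, as noted above. What can be shown is why block size and alignment matter at all for a fixed-block deduplicator such as VDO: identical data stops matching the moment it shifts off the 4 KiB grid. An illustrative sketch (zlib/zstd not involved, only the dedup side):

```python
import hashlib
import os

def block_hashes(data, size=4096):
    """Hash fixed-size blocks the way a 4 KiB deduplicator (e.g. VDO) sees them."""
    return {hashlib.sha256(data[i:i + size]).digest()
            for i in range(0, len(data), size)}

base = os.urandom(4096 * 16)
shifted = b"\x00" + base                    # same bytes, offset by one

shared_aligned = block_hashes(base) & block_hashes(base)
shared_shifted = block_hashes(base) & block_hashes(shifted)
print(len(shared_aligned), len(shared_shifted))
```

            The aligned copy shares every block; the one-byte shift shares none. That is why the compressor's reset points (and any block-size tuning) would have to land identically across captures for the stored .zst files to dedup.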
            

            Junkhacker (Developer), in reply to george1421:

              @george1421 I meant to reply to this a long time ago, but here goes.

              Testing on deduping of those images has been done; they dedup quite well. The dedup changes affect zstd- and pigz-compressed images. pigz-compressed images actually dedup better, but the compression and performance are worse. It's a tradeoff to be evaluated by the individual.

              Dedup is only possible with the newer version of partclone: earlier versions had a rolling checksum integrated into the image format, and the newer version lets us choose no checksum.

              The compressed binary file is dedupable thanks to the --rsyncable flag on compression, which is supported by both pigz and zstd.

              Like george said, any deduping would be the responsibility of the underlying filesystem or storage; it is not built into FOG itself.
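
              Why --rsyncable makes the compressed file dedupable can be illustrated with a crude analogue: compressing regions independently means unchanged regions produce byte-identical compressed output that a deduplicator can match. This sketch uses Python's zlib with fixed 64 KiB chunks; the real --rsyncable in pigz and zstd picks its reset points with a rolling hash, so it also survives insertions that shift data around:

```python
import os
import zlib

def compress_chunks(data, chunk=1 << 16):
    """Compress each 64 KiB chunk independently -- a crude analogue of the
    reset points --rsyncable inserts into the compressed stream."""
    return [zlib.compress(data[i:i + chunk]) for i in range(0, len(data), chunk)]

base = os.urandom(1 << 20)                                      # 1 MiB "image"
edited = base[:1 << 16] + os.urandom(1 << 16) + base[1 << 17:]  # rewrite chunk 1

a, b = compress_chunks(base), compress_chunks(edited)
unchanged = sum(x == y for x, y in zip(a, b))
print(f"{unchanged} of {len(a)} compressed chunks identical")
```

              Without the reset points, a single streaming compressor would make everything after the edited region differ, and the stored images would share almost nothing for the dedup layer to find.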
