
    Dedupe storage - how to best bypass pigz packaging?

    FOG Problems
      HROAdmin26

      Server
      • FOG Version: 1.2.0
      • OS: RHEL 6.8
      Client
      • Service Version: NA
      • OS: Windows (Various)
      Description

      In working with a deduplication storage location for images, I’ve found that pigz prevents any realistic dedupe results. (Even the same image backed up twice is not recognized by the dedupe engine.)

      I’ve attempted to modify the uploadFormat function in the init’s /usr/share/fog/lib/funcs.sh file to change pigz behavior (-i and -0 hardcoded) without any useful results. I’m now considering how best to remove pigz from uploadFormat entirely. But this will potentially impact image pushes as well, since pigz is also called for decompression.

      1. Has the FOG team looked at/considered dedupe storage locations in the past?
      2. Am I approaching this the right way, or backwards?
      3. Any other suggestions on how to collect a ‘clean’ IMG upload without any gzip packaging?

      Thanks!

      uploadFormat()
      {
              # $1 = core count, $2 = input file/FIFO, $3 = destination file name
              if [ -z "$1" ]; then
                      echo "Missing Cores";
                      return;
              elif [ -z "$2" ]; then
                      echo "Missing file in file out";
                      return;
              elif [ -z "$3" ]; then
                      echo "Missing file name to store";
                      return;
              fi
              if [ "$imgFormat" == "2" ]; then
                      # Original: pigz -p $1 $PIGZ_COMP < $2 | split -a 3 -d -b 200m - ${3}. &
                      pigz -i -p 1 -0 < $2 | split -a 3 -d -b 200m - ${3}. &
              else
                      if [ "$imgType" == "n" ]; then
                              # Original: pigz -p $1 $PIGZ_COMP < $2 > ${3}.000 &
                              pigz -i -p 1 -0 < $2 > ${3}.000 &
                      else
                              # Original: pigz -p $1 $PIGZ_COMP < $2 > $3 &
                              pigz -i -p 1 -0 < $2 > $3 &
                      fi
              fi
      }
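
      For reference, a call to this function looks roughly like the following (all three arguments here are illustrative; in the real init the input is a FIFO fed by partclone):

              # Hypothetical invocation: core count, input FIFO, destination file name
              uploadFormat 4 /tmp/upload.fifo /images/dev/macaddress/d1p1.img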
      
        george1421 Moderator

        AFAIK: The dev team isn’t even looking at dedup storage, even in 1.3.x. That is outside the scope of FOG imaging.

        What I can tell you is that the data compression FOG uses (or any type of data compression) will mess with the dedup algorithm. If you want to use storage deduplication, change the image compression factor to 0 and then let your storage device manage the image. Or increase your image compression value and don’t worry about dedup’ing the image.


          HROAdmin26

          Hi George,

          I’ve tested extensively with ‘compression 0’ for pigz but cannot get any deduplication since (I assume) the data is still being chunked at 128K and placed in a gzip wrapper. Once the .img is written, the file command shows the data as ‘gzip compressed data.’ As shown in the uploadFormat code, I have hard-set pigz to -0 compression (as well as set the same within the FOG client properties.)
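
          To be concrete, this is the check I mean (the image path is illustrative):

                  # Even captured at -0, the resulting file is still a gzip stream:
                  file /images/WinImage/d1p2.img
                  # reports: gzip compressed data ...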

          Also, this dedupe storage is used for other data storage successfully (VM datastores, general fileserver data, etc.) (I’m an ‘old-hand’ with dedupe storage systems.)

          Thanks for the info about dev review/focus on dedupe filesystems. So with that in mind, I am only looking for suggestions/guidance on how best to bypass pigz (or if that is a large task affecting huge parts of FOG.)

            Tom Elliott

            Uploaded images are always passed through pigz regardless of the compression number. 0 does not mean “no” compression, it just means minimal.

            We don’t currently have a means of allowing uncompressed images, and for most users this is more than suitable.

            As far as it being captured zipped or not, I still think de-duplication engines would have a hard time ensuring data is not copied twice. While it is true that partclone is a block imaging utility, the way partclone stores the image is totally separate.

            That said,

            You could take a shot at editing the capture utility to not enforce compression at all. In particular, you would edit the /usr/share/fog/lib/funcs.sh file.

            As you’d be removing the zipped nature of the files, you would need to add an “image type” alongside partclone and partimage: uncompressed partclone.

            For deploys, you would (in the current state of the file) edit lines 678 and 681, changing pigz -d -c </tmp/pigz1 to cat /tmp/pigz1.
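
            In other words, the deploy-side change looks like this (the rest of each line stays as it is in funcs.sh):

                    # before (decompresses the stream coming out of the FIFO):
                    pigz -d -c </tmp/pigz1 ...
                    # after (passes the stream through untouched):
                    cat /tmp/pigz1 ...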

            For upload you’d edit lines:
            1544 to read as:
            cat $fifo | split -a 3 -d -b 200m - ${file}. &
            1547 to read as:
            cat $fifo > ${file}.000 &

            The nice part: with postinitscripts now, you can edit this file and have the modified copy put into place before the main tasking begins.
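
            A minimal sketch of such a postinit script (the staging path and the patched-file name are assumptions; adjust them to your setup):

                    #!/bin/bash
                    # Hypothetical postinit script: copy a pre-patched funcs.sh over
                    # the stock one before the main tasking begins.
                    cp /images/dev/postinitscripts/funcs.sh.patched /usr/share/fog/lib/funcs.sh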

              george1421 Moderator @Tom Elliott

              @Tom-Elliott Tom, as a test, would it be possible to just use Clonezilla to capture a disk image and store it on the target storage array? What I’m interested in is whether it’s even possible to dedup a huge binary file. Since both Clonezilla and FOG use partclone to clone the image, would both programs produce files with a similar structure? If so, it would give the OP a way to test without hacking too much on the FOG init scripts.

                Tom Elliott @george1421

                @george1421 I’m working on a prototype init that will enable use without compression.

                  Tom Elliott

                  The prototype is up and appears to be working properly. With this new change it should theoretically be possible to use Clonezilla images within FOG, once the files are renamed to FOG’s naming format. The same goes in reverse: uncompressed FOG images should be able to move into Clonezilla, provided the images are renamed to Clonezilla’s naming standards.
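
                  Roughly along these lines (both names are illustrative; actual Clonezilla file names depend on the disk, filesystem, and compression/split settings):

                          # Hypothetical rename of an uncompressed Clonezilla partclone
                          # file into FOG's naming scheme:
                          mv sda1.ext4-ptcl-img.aa /images/MyImage/d1p1.img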

                    george1421 Moderator @Tom Elliott

                    @Tom-Elliott Just so I’m clear, this feature is only available with the latest release of FOG (1.3.5.rc5)?

                      HROAdmin26

                      @Tom-Elliott Thanks Tom! I will test out the funcs.sh change to see how the upload results change.

                      @george1421 Yes, a Clonezilla image directly uploaded will dedupe (with no compression). The file is chunked into pieces and deduped based on those chunks. The art is to match the dedupe chunk size to the data inside the image, and to match the chunk boundaries between the algorithm and the incoming data.
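
                      If you want a rough feel for that, you can chunk an image yourself and count repeated chunks (fixed 128K chunks assumed here; real dedupe engines often use variable-size chunking):

                              # Split the image into fixed 128K chunks, then count duplicated chunk hashes:
                              split -b 128K d1p1.img chunk_
                              md5sum chunk_* | awk '{print $1}' | sort | uniq -d | wc -l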

                        Tom Elliott @george1421

                        @george1421 Correct. Really it’s in the working branch; it will be available for re-installs of rc5, but not in the GUI.

                          Tom Elliott

                          1.3.5 RC 6 has been released and should have this ‘uncompressed’ capability coded more properly.

                            Jaymes Driver Developer @Tom Elliott

                            @Tom-Elliott Excitedly downloads new RC

                              Junkhacker Developer

                              Sorry I didn’t see this thread earlier, but I have experimented with dedup of uncompressed FOG images on a ZFS filesystem to see if it was worth it. I saw far less gain than when using compression. But I really can’t say I’m highly experienced in dedup, so maybe I did something wrong. Let us know how your experiments go.
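
                              If anyone wants to repeat the test, the relevant ZFS knobs are these (pool and dataset names are just examples):

                                      # Enable dedup on the dataset holding the images:
                                      zfs set dedup=on tank/images
                                      # After copying images in, check the pool-wide dedup ratio:
                                      zpool get dedupratio tank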

                                george1421 Moderator @Junkhacker

                                @Junkhacker I too am interested in seeing how the dedup rate compares to the same file compressed at level 6, even if just for information. It would be interesting to know.

                                  Tom Elliott

                                  I think de-duplication across multiple images (like 10 images) would fare much better than the same 10 images with compression. But again, this means the benefit will only be seen when you are dealing with multiple images. For one or two images it’s probably not going to be worth the gain.

                                    Junkhacker Developer @Tom Elliott

                                    @Tom-Elliott It was quite a while ago that I did my testing (and of course I didn’t actually document anything…), but I was working with about 3 to 5 images, seeing how much they shared so I could estimate from there, and it wasn’t good: very little duplication was detected in spite of the obvious duplication that was taking place.
                                    Someone who knows what they’re doing might have much different results.
