• Recent
    • Unsolved
    • Tags
    • Popular
    • Users
    • Groups
    • Search
    • Register
    • Login

    Replication deletes file before transfer is complete

    Scheduled Pinned Locked Moved Solved
    Bug Reports
    2
    10
    2.1k
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Wayne WorkmanW
      Wayne Workman
      last edited by Wayne Workman

      r6575.

      What I’m noticing is that due to the slowness of my fog servers transfer rates, it can’t get done before the replicator checks again and determines the files aren’t the same, then the replicator deletes the remote file and starts over. It appears the time limit is 20 minutes.

      This particular file that is being re-replicated over and over happens to be the biggest one in the storage group.

      Here are some of the logs from the replicator:

      [03-05-16 9:58:55 pm] * Started sync for Image Win7
      [03-05-16 9:58:55 pm] | PID: 30391
      [03-05-16 9:58:55 pm] | Replication not complete
      [03-05-16 9:58:55 pm] * Deleting remote file: /images/Win7/d1p2.img
      [03-05-16 9:58:55 pm] | Files do not match
      [03-05-16 9:58:55 pm] | Remote File size: 7051149312
      [03-05-16 9:58:55 pm] | Local File size: 9291944405
      [03-05-16 9:58:55 pm] | Remote File: /images/Win7/d1p2.img
      [03-05-16 9:58:55 pm] | Local File: /images/Win7/d1p2.img
      
      [03-05-16 9:38:13 pm] * Started sync for Image Win7
      [03-05-16 9:38:13 pm] | PID: 15700
      [03-05-16 9:38:13 pm] | Replication not complete
      [03-05-16 9:38:13 pm] * Deleting remote file: /images/Win7/d1p2.img
      [03-05-16 9:38:13 pm] | Files do not match
      [03-05-16 9:38:13 pm] | Remote File size: 7149060096
      [03-05-16 9:38:13 pm] | Local File size: 9291944405
      [03-05-16 9:38:13 pm] | Remote File: /images/Win7/d1p2.img
      [03-05-16 9:38:13 pm] | Local File: /images/Win7/d1p2.img
      
      [03-05-16 9:07:15 pm] * Started sync for Image Win7
      [03-05-16 9:07:15 pm] | PID: 11812
      [03-05-16 9:07:15 pm] | Replication not complete
      [03-05-16 9:07:15 pm] * Deleting remote file: /images/Win7/d1p2.img
      [03-05-16 9:07:15 pm] | Files do not match
      [03-05-16 9:07:15 pm] | Remote File size: 4564320256
      [03-05-16 9:07:15 pm] | Local File size: 9291944405
      [03-05-16 9:07:15 pm] | Remote File: /images/Win7/d1p2.img
      [03-05-16 9:07:15 pm] | Local File: /images/Win7/d1p2.img
      
      [03-05-16 8:47:02 pm] * Started sync for Image Win7
      [03-05-16 8:47:02 pm] | PID: 11210
      [03-05-16 8:47:02 pm] | Replication not complete
      [03-05-16 8:47:02 pm] * Deleting remote file: /images/Win7/d1p2.img
      [03-05-16 8:47:02 pm] | Files do not match
      [03-05-16 8:47:02 pm] | Remote File size: 7483969536
      [03-05-16 8:47:02 pm] | Local File size: 9291944405
      [03-05-16 8:47:01 pm] | Remote File: /images/Win7/d1p2.img
      [03-05-16 8:47:01 pm] | Local File: /images/Win7/d1p2.img
      

      Here’s what the bandwidth looks like during the time of these logs:
      0_1457236867130_Replicator bandwidth.png

      Given, my servers are very old and slow, and given the bandwidth is capped at 100Mbps (I’ll look into that but it’s not a fog problem), this still is a potential issue for slow links, or for really large images. If the checksum is not getting done within the allowed time due to the image being 100 or 150GB in size (There is a site in my organization with an image that big), or if the image isn’t replicated in time due to a slow link, this same problem will crop up again.

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
      Daily Clean Installation Results:
      https://fogtesting.fogproject.us/
      FOG Reporting:
      https://fog-external-reporting-results.fogproject.us/

      1 Reply Last reply Reply Quote 0
      • Tom ElliottT
        Tom Elliott
        last edited by

        Should be fixed, thanks for reporting.

        I had the delete happening before checking if the prior replication was still running. Added a check to break out if it is already running.

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

        Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

        Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        1 Reply Last reply Reply Quote 2
        • Wayne WorkmanW
          Wayne Workman
          last edited by

          I’m still seeing the same issue. Last night I updated both the master and slave and then went to bed. This morning I find that replication didn’t get done still. So I re-updated both again and I’m seeing this entry in the logs every 10 minutes:

          [03-06-16 8:41:19 am] * Started sync for Image Win7
          [03-06-16 8:41:19 am] | PID: 4079
          [03-06-16 8:41:19 am] | Replication not complete
          [03-06-16 8:41:18 am] * Deleting remote file: /images/Win7/d1p2.img
          [03-06-16 8:41:18 am] | Files do not match
          [03-06-16 8:41:18 am] | Remote File size: 6607823640
          [03-06-16 8:41:18 am] | Local File size: 9291944405
          [03-06-16 8:41:17 am] | Remote File: /images/Win7/d1p2.img
          [03-06-16 8:41:17 am] | Local File: /images/Win7/d1p2.img
          

          Another thing I noticed is that the bandwidth maxes out when the d1p2.img file is checksummed. At this point in the logs below, it’s not even deleted yet but this is the moment where the bandwidth maxes out:

          [03-06-16 8:51:25 am] * Started sync for Image Win7
          [03-06-16 8:51:25 am] | Files match
          [03-06-16 8:51:25 am] | Remote File size: 8899842
          [03-06-16 8:51:25 am] | Local File size: 8899842
          [03-06-16 8:51:25 am] | Remote File: /images/Win7/d1p1.img
          [03-06-16 8:51:25 am] | Local File: /images/Win7/d1p1.img
          [03-06-16 8:51:25 am] | Files match
          [03-06-16 8:51:25 am] | Remote File size: 190
          [03-06-16 8:51:25 am] | Local File size: 190
          [03-06-16 8:51:25 am] | Remote File: /images/Win7/d1.partitions
          [03-06-16 8:51:25 am] | Local File: /images/Win7/d1.partitions
          [03-06-16 8:51:25 am] | Files match
          [03-06-16 8:51:25 am] | Remote File size: 0
          [03-06-16 8:51:25 am] | Local File size: 0
          [03-06-16 8:51:25 am] | Remote File: /images/Win7/d1.original.swapuuids
          [03-06-16 8:51:25 am] | Local File: /images/Win7/d1.original.swapuuids
          [03-06-16 8:51:25 am] | Files match
          [03-06-16 8:51:25 am] | Remote File size: 30
          [03-06-16 8:51:25 am] | Local File size: 30
          [03-06-16 8:51:25 am] | Remote File: /images/Win7/d1.original.fstypes
          [03-06-16 8:51:25 am] | Local File: /images/Win7/d1.original.fstypes
          [03-06-16 8:51:25 am] | Files match
          [03-06-16 8:51:25 am] | Remote File size: 190
          [03-06-16 8:51:25 am] | Local File size: 190
          [03-06-16 8:51:25 am] | Remote File: /images/Win7/d1.minimum.partitions
          [03-06-16 8:51:25 am] | Local File: /images/Win7/d1.minimum.partitions
          [03-06-16 8:51:25 am] | Files match
          [03-06-16 8:51:25 am] | Remote File size: 1048576
          [03-06-16 8:51:25 am] | Local File size: 1048576
          [03-06-16 8:51:25 am] | Remote File: /images/Win7/d1.mbr
          [03-06-16 8:51:25 am] | Local File: /images/Win7/d1.mbr
          [03-06-16 8:51:25 am] | Files match
          [03-06-16 8:51:25 am] | Remote File size: 1
          [03-06-16 8:51:25 am] | Local File size: 1
          [03-06-16 8:51:25 am] | Remote File: /images/Win7/d1.fixed_size_partitions
          [03-06-16 8:51:25 am] | Local File: /images/Win7/d1.fixed_size_partitions
          [03-06-16 8:51:24 am] | Image name: Win7
          

          0_1457276104529_Replicator Bandwidth 2.png

          Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
          Daily Clean Installation Results:
          https://fogtesting.fogproject.us/
          FOG Reporting:
          https://fog-external-reporting-results.fogproject.us/

          1 Reply Last reply Reply Quote 0
          • Wayne WorkmanW
            Wayne Workman
            last edited by

            [03-06-16 9:02:01 am] * Started sync for Image Win7
            [03-06-16 9:02:01 am] | PID: 17781
            [03-06-16 9:02:01 am] | Replication not complete
            [03-06-16 9:02:01 am] * Deleting remote file: /images/Win7/d1p2.img
            [03-06-16 9:02:01 am] | Files do not match
            [03-06-16 9:02:01 am] | Remote File size: 7127236608
            [03-06-16 9:02:01 am] | Local File size: 9291944405
            [03-06-16 9:02:01 am] | Remote File: /images/Win7/d1p2.img
            [03-06-16 9:02:01 am] | Local File: /images/Win7/d1p2.img
            

            Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
            Daily Clean Installation Results:
            https://fogtesting.fogproject.us/
            FOG Reporting:
            https://fog-external-reporting-results.fogproject.us/

            1 Reply Last reply Reply Quote 0
            • Wayne WorkmanW
              Wayne Workman
              last edited by Wayne Workman

              Ok more information on this.

              I was keeping an eye on the file itself, and right before the replicator re-runs, the file is actually still growing:

              [root@application Win7]# ls -la d1p2.img
              -rw-r--r--. 1 fog fog 6725181416 Mar  6 09:22 d1p2.img
              [root@application Win7]# ls -la d1p2.img
              -rw-r--r--. 1 fog fog 6803567240 Mar  6 09:22 d1p2.img
              [root@application Win7]# ls -la d1p2.img
              -rw-r--r--. 1 fog fog 6952289792 Mar  6 09:22 d1p2.img
              [root@application Win7]# ls -la d1p2.img
              -rw-r--r--. 1 fog fog 7065146216 Mar  6 09:22 d1p2.img
              [root@application Win7]# ls -la d1p2.img
              ls: cannot access d1p2.img: No such file or directory
              [root@application Win7]# ls -la d1p2.img
              ls: cannot access d1p2.img: No such file or directory
              [root@application Win7]# ls -la d1p2.img
              ls: cannot access d1p2.img: No such file or directory
              [root@application Win7]# ls -la d1p2.img
              ls: cannot access d1p2.img: No such file or directory
              [root@application Win7]# ls -la
              total 9732
              drwxr-xr-x.  2 fog  fog      174 Mar  6 09:23 .
              drwxrwxrwx. 10 root root     228 Mar  5 16:36 ..
              -rw-r--r--.  1 fog  fog        1 Mar  5 14:40 d1.fixed_size_partitions
              -rw-r--r--.  1 fog  fog  1048576 Mar  5 14:40 d1.mbr
              -rw-r--r--.  1 fog  fog      190 Mar  5 14:40 d1.minimum.partitions
              -rw-r--r--.  1 fog  fog       30 Mar  5 14:40 d1.original.fstypes
              -rw-r--r--.  1 fog  fog        0 Mar  5 14:40 d1.original.swapuuids
              -rw-r--r--.  1 fog  fog  8899842 Mar  5 14:40 d1p1.img
              -rw-r--r--.  1 fog  fog      190 Mar  5 14:40 d1.partitions
              

              Right where you see the file become unavailable, that’s when the replicator deleted it:

              [03-06-16 9:23:01 am] * Started sync for Image Win7
              [03-06-16 9:23:01 am] | PID: 31237
              [03-06-16 9:23:01 am] | Replication not complete
              [03-06-16 9:23:01 am] * Deleting remote file: /images/Win7/d1p2.img
              [03-06-16 9:23:01 am] | Files do not match
              [03-06-16 9:23:01 am] | Remote File size: 7170686976
              [03-06-16 9:23:01 am] | Local File size: 9291944405
              [03-06-16 9:23:01 am] | Remote File: /images/Win7/d1p2.img
              [03-06-16 9:23:01 am] | Local File: /images/Win7/d1p2.img
              

              Here is the bandwidth at this exact moment, there is no change in bandwidth. Later on, the bandwidth drops off to nothing:

              0_1457278501816_Replicator Bandwidth 3.png

              Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
              Daily Clean Installation Results:
              https://fogtesting.fogproject.us/
              FOG Reporting:
              https://fog-external-reporting-results.fogproject.us/

              1 Reply Last reply Reply Quote 0
              • Tom ElliottT
                Tom Elliott
                last edited by

                found there was a bad call (cause I’m a dumba$$), in that the checker would not work the first time around. It would actually break on the wrong file which would have caused some issues. Fixed by moving the check to the start of the main calling loop and not even do anything further down if the original item is already running.

                Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

                Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                1 Reply Last reply Reply Quote 1
                • Wayne WorkmanW
                  Wayne Workman
                  last edited by Wayne Workman

                  I updated both servers to r6597 at about 10:55, I just got back home and found this on the bandwidth graph:

                  0_1457291080274_Replicator Bandwidth 4.png

                  I think the graph stopped working after an hour because my laptop went to sleep. But logging back into the fog web interface, the bandwidth is still riding at 100Mbps, which is the max on this particular link.

                  This is what just now showed up in the logs, which is the same thing as before:

                  [03-06-16 1:06:48 pm] * Started sync for Image Win7
                  [03-06-16 1:06:46 pm] * Deleting remote file: /images/Win7/d1p2.img
                  [03-06-16 1:06:46 pm] | Files do not match
                  [03-06-16 1:06:46 pm] | Remote File size: 3003860200
                  [03-06-16 1:06:46 pm] | Local File size: 9291944405
                  [03-06-16 1:06:45 pm] | Remote File: /images/Win7/d1p2.img
                  [03-06-16 1:06:45 pm] | Local File: /images/Win7/d1p2.img
                  

                  I’m going to reboot both servers and see if that does anything.

                  Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
                  Daily Clean Installation Results:
                  https://fogtesting.fogproject.us/
                  FOG Reporting:
                  https://fog-external-reporting-results.fogproject.us/

                  1 Reply Last reply Reply Quote 0
                  • Wayne WorkmanW
                    Wayne Workman
                    last edited by

                    After rebooting both master and slave nodes, I watched the logs cycle through the replicator checks 3 times, watched it say it’s deleting the d1p2 file 3 times in a row.

                    [03-06-16 1:20:08 pm] * Started sync for Image Win7
                    [03-06-16 1:20:03 pm] * Deleting remote file: /images/Win7/d1p2.img
                    [03-06-16 1:20:03 pm] | Files do not match
                    [03-06-16 1:20:03 pm] | Remote File size: 2715025408
                    [03-06-16 1:20:03 pm] | Local File size: 9291944405
                    [03-06-16 1:20:03 pm] | Remote File: /images/Win7/d1p2.img
                    [03-06-16 1:20:03 pm] | Local File: /images/Win7/d1p2.img
                    

                    Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
                    Daily Clean Installation Results:
                    https://fogtesting.fogproject.us/
                    FOG Reporting:
                    https://fog-external-reporting-results.fogproject.us/

                    1 Reply Last reply Reply Quote 0
                    • Tom ElliottT
                      Tom Elliott
                      last edited by

                      Still me being an idiot.

                      I had an iterator being reused within an already existing iterator.

                      This should now be fixed, (based on my testing).

                      I’m solving again.

                      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

                      Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                      Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                      Wayne WorkmanW 1 Reply Last reply Reply Quote 0
                      • Wayne WorkmanW
                        Wayne Workman @Tom Elliott
                        last edited by

                        @Tom-Elliott Confirmed fixed.

                        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
                        Daily Clean Installation Results:
                        https://fogtesting.fogproject.us/
                        FOG Reporting:
                        https://fog-external-reporting-results.fogproject.us/

                        1 Reply Last reply Reply Quote 1
                        • 1 / 1
                        • First post
                          Last post

                        154

                        Online

                        12.0k

                        Users

                        17.3k

                        Topics

                        155.2k

                        Posts
                        Copyright © 2012-2024 FOG Project