
    File size/hash mismatch - Only on one storage node replicating nonstop

    FOG Problems
      Demache

      Our deployment of FOG has gone fairly smoothly aside from one weird issue. Our 1st and 2nd remote nodes have no issues and only replicate when changes are made. On the 3rd node, the FOG replication service always reports every single file in all images as a file size mismatch, then deletes the remote file and copies a new one. It does that in an endless loop, which is a bit problematic over a WAN. The weird thing is that the images are copying correctly. After the FTP job finishes, I checked the md5sum of every file and they all match the master. If I disable replication, the images work correctly, so I know there is nothing wrong with them.
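The md5sum comparison described above can be scripted. This is a minimal sketch, assuming each image is a flat directory of files and that you can build one manifest on the master and one on the node. The function names are made up for illustration and are not part of FOG.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks so large images fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(image_dir):
    """Map each file name directly inside image_dir to its MD5 digest."""
    root = Path(image_dir)
    return {p.name: md5_of(p) for p in sorted(root.iterdir()) if p.is_file()}


def differing_files(master, node):
    """List names whose digest differs or that exist on only one side."""
    names = master.keys() | node.keys()
    return sorted(n for n in names if master.get(n) != node.get(n))
```

Run `manifest()` against the same image directory on both hosts and pass the two dictionaries to `differing_files()`. An empty list means the copies are byte-identical as far as MD5 can tell.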

      In the fogreplicator.log on the master I keep seeing something like this on all the files at that node:

      [01-23-20 10:33:19 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1.fixed_size_partitions: 9 !=
      [01-23-20 10:33:19 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1.fixed_size_partitions
      [01-23-20 10:33:20 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1.mbr: 1048576 !=
      [01-23-20 10:33:20 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1.mbr
      [01-23-20 10:33:20 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1.minimum.partitions: 793 !=
      [01-23-20 10:33:20 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1.minimum.partitions
      [01-23-20 10:33:20 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1.original.fstypes: 30 !=
      [01-23-20 10:33:20 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1.original.fstypes
      [01-23-20 10:33:21 am]   # Win10_1903_64bit_Nov2019: File hash mismatch - d1.original.swapuuids: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 !=
      [01-23-20 10:33:21 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1.original.swapuuids
      [01-23-20 10:33:21 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1.partitions: 793 !=
      [01-23-20 10:33:21 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1.partitions
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1p1.img: 421826977 !=
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1p1.img
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1p2.img: 13556397 !=
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1p2.img
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1p3.img: 254129 !=
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1p3.img
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: File size mismatch - d1p4.img: 9857003499 !=
      [01-23-20 10:33:22 am]   # Win10_1903_64bit_Nov2019: Deleting remote file d1p4.img
      
      

      Going by some of the other examples of failed hashes I have seen on here, there should be another value after the !=. I assume the missing value is the problem and is why it's failing.
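To confirm that the remote value really is missing on every line, the mismatch entries can be pulled apart with a regular expression. This is a rough sketch whose pattern is derived only from the log excerpt above. Other FOG versions may format these lines differently, and `parse_mismatch` is a hypothetical helper name.

```python
import re

# Pattern built from the excerpt above:
#   "# <image>: File <size|hash> mismatch - <file>: <local> != <remote>"
LINE_RE = re.compile(
    r"#\s*(?P<image>\S+): File (?P<kind>size|hash) mismatch - "
    r"(?P<file>\S+): (?P<local>\S+) !=\s*(?P<remote>\S*)\s*$"
)


def parse_mismatch(line):
    """Return the fields of a mismatch line, or None for any other line."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


def missing_remote(lines):
    """Return file names whose mismatch line has nothing after the '!='."""
    parsed = (parse_mismatch(line) for line in lines)
    return [p["file"] for p in parsed if p and p["remote"] == ""]
```

Feeding the lines of fogreplicator.log to `missing_remote()` lists the files for which the master received no size or hash from the node at all, as opposed to a value that differs.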

      Does someone have an idea where I should look to correct this? Thanks.

      Fog version: 1.5.7
      OS on all hosts: CentOS 7

        Sebastian Roth Moderator

        @Demache What’s the difference between the 1st/2nd storage nodes and the 3rd one? I suppose the WAN tunnel between headquarters and the 3rd location doesn’t allow the communication needed to query the information from the 3rd storage node. For both size and hash, the replication service first uses HTTP or HTTPS (depending on how you installed), and if it doesn’t receive a proper answer it also tries to retrieve the size and hash (only for files smaller than 10 MB) via FTP. So if all those protocols are blocked, replication cannot work.

        Now that I re-think what I wrote, this doesn’t make sense, because you said that it does replicate the files (using FTP!) properly. So we probably need to take a closer look.

        As I said, it checks size first which seems to fail in most cases considering the log you posted. BUT there is one file where it seems to be ok with size and then goes ahead to match the checksums and fails on that. Seems to be a bit random.
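The order of checks described in this post can be sketched as follows. This is not FOG's code (FOG is written in PHP) but an illustration under stated assumptions: the lookup callables are hypothetical stand-ins for the HTTP(S) and FTP queries, and the 10 MB remark is read as meaning that hashes are compared only for files below that size.

```python
import hashlib
from typing import Callable, Optional

# Assumption: hashes are only compared for files smaller than 10 MB.
HASH_LIMIT = 10 * 1024 * 1024

# A lookup returns the node's answer as text, or None when no proper answer came back.
Lookup = Callable[[str], Optional[str]]


def first_answer(name: str, http_lookup: Lookup, ftp_lookup: Lookup) -> Optional[str]:
    """Ask over HTTP(S) first and fall back to FTP when that yields nothing."""
    answer = http_lookup(name)
    if answer is None:
        answer = ftp_lookup(name)
    return answer


def check_file(name: str, local_size: int, local_hash: str,
               size_http: Lookup, size_ftp: Lookup,
               hash_http: Lookup, hash_ftp: Lookup) -> str:
    """Compare size first, then the hash for small files only."""
    remote_size = first_answer(name, size_http, size_ftp)
    if remote_size is None or not remote_size.strip().isdigit() \
            or int(remote_size) != local_size:
        return "size mismatch"
    if local_size >= HASH_LIMIT:
        return "in sync"
    if first_answer(name, hash_http, hash_ftp) != local_hash:
        return "hash mismatch"
    return "in sync"


# The only hash printed in the log is compared here with the digest of empty input.
LOGGED_HASH = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
logged_hash_is_empty_digest = hashlib.sha256(b"").hexdigest() == LOGGED_HASH
```

If `logged_hash_is_empty_digest` is true, d1.original.swapuuids is presumably a zero-byte file on the master, which could be why its size check behaved differently from the others. Nothing in this thread confirms that explanation.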

        Well, given that information, you might check whether the WAN tunnel imposes restrictions that could cause this.

        You also want to check the Apache access and error logs on the 3rd storage node to see if those requests ever actually reach that node.

        Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

        Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

          Demache @Sebastian Roth

          @Sebastian-Roth Aha! You were right, there was an issue with it communicating over HTTP. It turns out the 3rd node had the HTTP protocol set to HTTPS in .fogsettings for some reason, and that was causing the failure. I must have been following my own documentation blindly: the master node does have HTTPS enforced, and I wrote that documentation after the 1st and 2nd nodes were already set up. Whoops. I’m guessing that function doesn’t work if the remote storage node enforces HTTPS?

          Anyway, I set that back to HTTP in .fogsettings and reran the install script. I turned the service back on and now I get “no need to sync” on the 3rd node, as it should. Perfect!
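A quick way to spot this kind of drift between nodes is to read the protocol setting out of each node's .fogsettings and compare. This is a small sketch that assumes the file uses shell-style `key='value'` lines. The key name `httpproto` and the path `/opt/fog/.fogsettings` are assumptions not stated in this thread, so check them against your own install.

```python
import shlex


def read_settings(text):
    """Parse shell-style key='value' lines into a dict, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # strips the surrounding quotes
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def protocol_of(text, key="httpproto"):
    """Return the configured web protocol, or None if the (assumed) key is absent."""
    return read_settings(text).get(key)
```

For example, `protocol_of(open("/opt/fog/.fogsettings").read())` on each storage node would show at a glance which node was installed with a different protocol from the rest.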

          Thanks for pointing me in the right direction.

            Sebastian Roth Moderator

            @Demache Nice that we found this and you were able to fix it so quickly. When looking through the code I thought about HTTP/HTTPS possibly being an issue but dropped that idea. Looking at it again now, I think you have found a bug in the code! Just pushed a fix.

            Though I still really wonder why the backup logic of checking size via FTP didn’t work in your case either.


            Copyright © 2012-2024 FOG Project