FOG continuously syncs files that haven't changed to remote storage nodes



  • I have finally set up my perfect FOG setup for our company: one main storage node and dozens of remote storage nodes for fast imaging. I’ve used cron jobs to schedule the syncing and all of that is working. Here’s my problem:

    During the scheduled sync time I can see on the remote nodes that FOG is replicating images and snapins that haven’t changed since the last sync, over and over again. Why is FOG syncing files that are already there and haven’t changed?



  • Went back to FOG nodes instead of Windows and it’s working as expected now. You can resolve this.


  • Moderator

    @george1421 Confirmed these applications exist on my test Synology NAS (I’ve done bad things to this NAS so it might have the kitchen sink too).

    du is installed and available.
    awk is installed and available.
    sha512sum is installed and available.
    head is installed and available.
    tail is installed and available.

    They also exist on my production Synology NAS.


  • Moderator

    @tom-elliott Now that’s service!!

    I can say that for the Synology NAS, you have to install PHP as a package. I’ll confirm that the other utilities exist.


  • Senior Developer

    The code in PHP is:

         /**
          * Returns hash of passed file.
          *
          * @param string $file The file to get hash of.
          *
          * @return string
          */
         public static function getHash($file)
         {
             usleep(50000);
             $file = escapeshellarg($file);
             $filesize = self::getFilesize($file);
             if ($filesize <= 10485760) {
                 return trim(
                     shell_exec("sha512sum $file | awk '{print $1}'")
                 );
             }
             return trim(
                 shell_exec(
                     sprintf(
                         "(%s -c %d %s; %s -c %d %s) | sha512sum | awk '{print $1}'",
                         'head',
                         10486760,
                         $file,
                         'tail',
                         10486760,
                         $file
                     )
                 )
             );
         }
    

    The function doesn’t need to be in a class file. The URL that gets called is: http://{nodeip}/fog/status/gethash.php

    The passed data (the $file parameter) comes from the POST request ($_POST['file']).
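
    If you want to see what the replicator gets back, you can hit the endpoint by hand. This is just a rough sketch, not FOG code; the node IP and image path below are placeholders you’d swap for your own. It POSTs the base64-encoded path and prints whatever hash string comes back:

         <?php
         // Hand-rolled test of the gethash endpoint -- not FOG's replicator code.
         // Placeholders: adjust the node IP and the image path to your environment.
         $nodeip = '10.0.0.5';
         $file   = '/images/WinImage/d1p1.img';
         $ch = curl_init("http://$nodeip/fog/status/gethash.php");
         curl_setopt_array(
             $ch,
             array(
                 CURLOPT_POST           => true,
                 CURLOPT_POSTFIELDS     => http_build_query(
                     array('file' => base64_encode($file))
                 ),
                 CURLOPT_RETURNTRANSFER => true,
             )
         );
         // The response body is the sha512 (or partial-file sha512) hex string,
         // or blank if the file doesn't exist on the node.
         $hash = curl_exec($ch);
         curl_close($ch);
         echo $hash . PHP_EOL;

    If that prints the same hash the main server computes for its local copy, the replicator should leave the file alone.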

    If you can, create the directory structure on the storage node under the web root (/path/to/web/files):

    mkdir -p fog/status

    Then create a PHP file named gethash.php in that directory and add the lines:

    <?php
    // Delay for 50 milliseconds (same as the main FOG code).
    usleep(50000);
    // The file path is passed base64 encoded, decode it here.
    $debase_file = base64_decode($_POST['file']);
    // We can't trust user input, so escape the path before using it in shell commands.
    $file = escapeshellarg($debase_file);
    // If the file doesn't exist, send back an empty response.
    // Note: a standalone script must echo its result; return would send nothing back.
    if (!file_exists($debase_file)) {
        echo '';
        exit;
    }
    // Get the file size in bytes.
    $filesize = trim(
        shell_exec("du -b $file | awk '{print $1}'")
    );
    // If the file is 10 MB or less, return the hash of the full file.
    if ($filesize <= 10485760) {
        echo trim(
            shell_exec("sha512sum $file | awk '{print $1}'")
        );
        exit;
    }
    // Otherwise hash just the first and last ~10 MB -- hashing full files takes a long time.
    echo trim(
        shell_exec(
            sprintf(
                "(%s -c %d %s; %s -c %d %s) | sha512sum | awk '{print $1}'",
                'head',
                10486760,
                $file,
                'tail',
                10486760,
                $file
            )
        )
    );
    

    This of course assumes a few things (there’s a quick check script after the list).

    First:
    PHP is installed on the NAS (probably is)
    du is installed and available.
    awk is installed and available.
    sha512sum is installed and available.
    head is installed and available.
    tail is installed and available.
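
    A throwaway script along these lines (run it on the NAS as php checktools.php; the name is arbitrary) will confirm the command-line pieces in one shot:

         <?php
         // Quick sanity check for the utilities gethash.php shells out to.
         // shell_exec() runs /bin/sh, and "command -v" is POSIX sh.
         $tools = array('du', 'awk', 'sha512sum', 'head', 'tail');
         foreach ($tools as $tool) {
             $path = trim(shell_exec("command -v $tool 2>/dev/null"));
             printf("%-10s %s\n", $tool, $path !== '' ? $path : 'NOT FOUND');
         }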

    Hopefully this helps.


  • Moderator

    @tom-elliott I think that the new method is cleaner than before. Is that call in the FOG replicator code? The thought would be to reverse engineer it, or see if we could get some node.js code running on the NAS to respond to the replicator with the answer it’s looking for. I know quite a few people use a NAS as a storage node.


  • Senior Developer

    @george1421 It was changed because hashing files was using a lot of bandwidth when the main server did the work. Now we make a request to the node, the node then passes back the hash of the file it’s looking for, leaving only minimal bandwidth usage.
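
    To put that in concrete terms, here’s a rough sketch (mine, not the actual replicator code) of the decision the main server now makes per file; the node IP and path are placeholders:

         <?php
         // Conceptual sketch of the hash-comparison idea -- not the real FOG replicator.
         $nodeip = '10.0.0.5';                   // placeholder storage node
         $file   = '/images/WinImage/d1p1.img';  // placeholder image file

         // Hash the local copy (full-file case only, for brevity).
         $localHash = trim(
             shell_exec('sha512sum ' . escapeshellarg($file) . " | awk '{print $1}'")
         );

         // Ask the storage node for the hash of its copy of the same file.
         $ch = curl_init("http://$nodeip/fog/status/gethash.php");
         curl_setopt_array($ch, array(
             CURLOPT_POST           => true,
             CURLOPT_POSTFIELDS     => http_build_query(array('file' => base64_encode($file))),
             CURLOPT_RETURNTRANSFER => true,
         ));
         $remoteHash = trim(curl_exec($ch));
         curl_close($ch);

         // Only the short hash strings cross the wire; the file itself is re-sent
         // only when the remote copy is missing or the hashes differ.
         if ($remoteHash !== '' && $remoteHash === $localHash) {
             echo "Hashes match -- skip replication\n";
         } else {
             echo "Hashes differ or remote copy missing -- re-send the file\n";
         }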



  • @george1421 Setting up a webserver on 40+ sites is not really a viable solution. I may have to pitch some cheap boxes for nodes only and send them to the sites. Thanks anyway, guys. It may be worth noting on this link

    https://wiki.fogproject.org/wiki/index.php?title=Windows_Storage_Node

    that replication will continuously occur over and over again.


  • Moderator

    @kafluke Assuming it’s possible, you will need to install PHP on your Windows NAS. I started working on a PoC for using a Windows Server 2012 box as a NAS. You can either install WAMP (well, just WAP) on Windows or install a PHP plugin for IIS.



  • @tom-elliott Is there any way to get these webfiles transferred over to a Windows NAS? I was able to move the hidden Linux files that are found in the image directory. When FOG syncs, where does it look for these webfiles? I can mount them in an NFS directory if need be.


  • Moderator

    @tom-elliott said in FOG continuously syncs files that haven't changed to remote storage nodes:

    If the nodes don’t have the webfiles this would explain the continuous syncing. FOG can’t validate the remote files are the same, so it will continuously delete and resync all the files.

    Ugh, this is a game changer. When did this “feature” change? (Not intending to make this sound harsh.) This requirement will now break using a NAS as a storage node.


  • Senior Developer

    @kafluke So this is where things get interesting. Storage nodes use a common web file that tells them which files have which sizes and hashes. If the nodes don’t have the webfiles this would explain the continuous syncing. FOG can’t validate the remote files are the same, so it will continuously delete and resync all the files.



  • I’m using Windows-based machines for the storage nodes, running an NFS server. FOG is only running on the master node and the HDD is fine.


  • Senior Developer

    @kafluke This can happen for many reasons.

    Check the HDDs on the main node that pushes out the data. If the HDDs seem fine, check that there’s enough time for replication to occur between the server and the last node it sends data to.

    Check the version of FOG for all servers/nodes and make sure they’re all on the same version. There was a similar problem in the past with replication that has been fixed; I don’t remember the version this fell under, but it was probably in the 1.3 era.


 
