FOG storage node and data replication

  • Moderator

    Understand I’m not knocking the way it is today. (Actually, another recent thread gave me an idea of how to change this multi-site POC: do full installs at each site and then point them back at the master node for the database information.) The 1.x version of FOG that Tom revived has been great. For my deployment I see two issues/concerns.

    1. The current FOG replicator only copies the files under the image name directory. This is perfect for image replication, but my case is slightly different: I have a driver structure in the images directory that I need to replicate to all storage nodes. I need every storage node to have a copy of every file from the master node.

    2. On the surface I see some strange behavior with the FOG replicator consuming 100% CPU. This may be related to having that faux drivers image in the system, or something else going on. I also noted in the FOG replicator log that when I restarted the replicator service (while debugging the 100% CPU issue) it started copying files that already existed on the storage node, e.g.:

    [10-23-15 1:24:43 pm]  * nycstorage - SubProcess -> Mirroring directory `WIN7PROSP1X86B03'
    [10-23-15 1:24:43 pm]  * nycstorage - SubProcess -> Removing old file `WIN7PROSP1X86B03/d1p2.img'
    [10-23-15 1:24:43 pm]  * nycstorage - SubProcess -> Transferring file `WIN7PROSP1X86B03/d1p2.img'

    I will look into rsync to see what options it supports directly. It may be possible to stitch it into the current deployment by replacing the lftp calls (just a guess) with rsync calls. But you did give me a few ideas to check into and places to look for settings, thank you.
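
    Something along these lines is what I’m picturing as a stand-in for the lftp mirror call (a rough sketch only; it assumes the storage node is reachable over SSH, which is not how the current FTP-based replicator works, and nycstorage is just an example host):

    # Mirror the whole /images tree, including nested driver sub folders.
    # -a         recurse and preserve permissions, ownership, and timestamps
    # --delete   remove files on the node that no longer exist on the master
    # --partial  keep partially transferred files so a dropped WAN link can resume
    rsync -a --delete --partial /images/ nycstorage:/images/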

  • @george1421 I’ve only had the opportunity to mess with the fog image replicator a handful of times in remote sessions with others. I can’t say I have seen this behavior but I’m not denying it either.

    If you are able to configure rsync to properly follow all the settings set in the DB for replication and are able to document how to set it up, I wouldn’t be surprised if it was adopted.

    There are a lot of settings, by the way… @Tom-Elliott explains it best… but from what I gather:

    • The master node replicates to all nodes in its storage group.

    • An image can belong to several storage groups. When this is the case, the master node that has the image replicates it to the masters of the other groups. From there, the step above applies.

    • rsync must use the settings defined in /opt/fog/.fogsettings for the ftp credentials

    • rsync must use the replication bandwidth limitations set in the database

    • rsync must not re-compress images or change the files in any way (a rough sketch of how these requirements might map to rsync flags follows this list).
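
    A very rough sketch of what that could look like (the variable names pulled from /opt/fog/.fogsettings and the bandwidth figure are assumptions for illustration, and rsync over SSH would need key-based access since the stored FTP password does not apply to it directly):

    # Sketch only: adjust variable names to whatever .fogsettings actually defines.
    source /opt/fog/.fogsettings   # stored FOG credentials (username variable assumed)
    BWLIMIT_KBPS=125               # placeholder for the per-node limit kept in the DB
    # -a preserves the files as-is; no -z, so already-compressed images are not re-compressed
    rsync -a --partial --bwlimit="$BWLIMIT_KBPS" /images/ "${username}@nycstorage:/images/"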

  • Moderator

    Yes it can. The image files appear to be packed pretty well as they are.

    I need to check a bit more into the FOG replicator. I noticed that if I restart the replicator it goes through and starts replicating all of the files again, with 100% CPU on the master node, almost as if the replicator service sits in a tight loop until the file finishes copying over. This is not very kind behavior, and it is making me think I should just go the rsync route and disable the FOG replicator service altogether. Understand that at this point I don’t have documented evidence that the FOG replicator is at fault, only what I saw just before calling it a week.
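
    By contrast, rsync skips files whose size and modification time already match on the far side, which is easy to confirm with a dry run before trusting it (a sketch; nycstorage is just an example host):

    # -n (dry run) plus -i (itemize changes) lists only what rsync WOULD transfer;
    # files already identical on the storage node are not re-sent.
    rsync -ani /images/ nycstorage:/images/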

  • @george1421 can rsync be used without additional compression?

  • Moderator

    @Joseph-Hales said:

    As the images are already compressed I’m not sure if rsync with compression would be of any benefit or might actually make copy times worse.

    Testing shows that data compression “does work,” but the amount of compression does not add any real value for a WAN transfer. I took a 4.8GB captured image and gzip’d it; the resulting file was only about 50MB smaller than the source image (roughly 1%, not enough to matter). But the other benefits of rsync would still be worth looking into for my project.
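
    For anyone who wants to repeat that quick test, something like this is all it takes (a sketch; the image path is just an example from my setup):

    # Compress a copy of an already-captured image and compare the two sizes.
    # On an image FOG has already compressed, the savings land in the ~1% range.
    IMG=/images/WIN7PROSP1X86B03/d1p2.img
    gzip -c "$IMG" > /tmp/d1p2.img.gz
    ls -l "$IMG" /tmp/d1p2.img.gz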

  • Testers

    As the images are already compressed I’m not sure if rsync with compression would be of any benefit or might actually make copy times worse.

  • Moderator

    Following up on this in the AM, I now see all of the top-level files that were in /images/drivers on the master node on the storage node. The FOG replicator appears to copy only the files directly under the faux drivers image. What is still missing are the sub folders and the files inside them under /images/drivers (the actual driver files we push to the target just after imaging). So the idea of creating a fake image kind of worked, but not the way I needed it. As long as your files are located one level below the fake image folder, this is a workable solution.

    Being very self-centered, I would like to see FOG support something like rsync to sync/manage the images folder, especially when the storage node is located across a slow WAN connection, because these image files tend to be very large and we could benefit from the in-transit data compression and intelligent replication such a tool could provide.

  • Moderator

    My preference would be to not do something out of band if possible. It does appear that creating a fake image with its path set to /images/drivers is choking the FOG replicator because of the sub folders, so I’m going to back out that change; no replication is happening at all because of that error.

    I haven’t dug into the FOG replicator code yet, but I’m wondering if rsync wouldn’t be a better way to replicate the images from the master node to the other storage nodes. Rsync would give us a few more advanced options, like data compression and syncing only the files that have changed, rather than just a plain file copy.

  • @george1421 said:

    I suspect that the replicator was only designed to copy the image folder and one level of files below.

    That’s likely the case.

    You could just put them in the web directory and grab them via wget on the hosts…

  • Moderator

    It’s trunk build 5040.

    Looking at the drivers folder, I have a combination of files and sub folders. Depending on how smart the replicator is, it may not handle or traverse the sub folders.

    The structure of the drivers folder looks like this:
    /images/drivers/OptiPlex7010/audio/<many files and sub folders>
    /images/drivers/OptiPlex7010/video/<many files and sub folders>

    I suspect that the replicator was only designed to copy the image folder and one level of files below.
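
    A quick way to see how much sits more than one level down (a sketch, run on the master node):

    # Files directly under the faux image folder (what a one-level-deep copy would pick up)
    find /images/drivers -maxdepth 1 -type f | wc -l
    # Files buried in sub folders such as OptiPlex7010/audio (what would be missed)
    find /images/drivers -mindepth 2 -type f | wc -l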

  • Senior Developer

    You’re running FOG 1.2.0, or trunk?

  • Moderator

    Rebooting the storage node appears to have started the replication of /images/drivers, but so far only the first file has replicated.

    Looking at /opt/fog/logs/fogreplicator.log on the master node, I see this error:

    [10-22-15 8:19:52 pm] * shvstorage - SubProcess -> mirror: Fatal error: 500 OOPS: priv_sock_get_cmd
    [10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Mirroring directory `drivers'
    [10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Making directory `drivers'
    [10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Transferring file `drivers/'

    The zip file is the only thing in /images/drivers on the storage node.

  • Moderator

    Correction: that error was on line 12, not line 1. I missed the last character when I copied the error.

  • Moderator

    Well, that was interesting. I tried to force the replication by stopping and restarting the FOGImageReplicator service: on the master node it restarted without issue, but stopping and starting the same service on the storage node threw an error:

    PHP Fatal error: Call to undefined method ImageReplicator::getBanner() in /opt/fog/service/FOGImageReplicator/FOGImageReplicator on line 1

    Both the master node and storage nodes are running build 5040.

    @george1421 I think that setting is hard-coded… if there’s nothing to replicate, it will just check and then disconnect. It’s not resource intensive.

    Also, you could just turn off the FOGImageReplicator… Enable it on boot for safety, but if you don’t want it to check (which is odd), you could just turn it off after boot.
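
    For example, something like this after the services come up (the exact service name and init system vary by distro and FOG version, so treat this as a sketch):

    # Stop the replicator for now but leave it enabled at boot
    service FOGImageReplicator stop
    # or, on systemd-based distros:
    systemctl stop FOGImageReplicator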

  • Moderator

    OK that sounds like a plan. I’ll set that up right away.

    Do you know what the replication cycle interval is, or where to find the setting? Under “normal” production once a day is sufficient, but I can see that during development we might need to shorten it to just a few hours.

    I think the replication looks in the database to see what needs to be copied… Not positive on this, but some work has been done in this area recently.

    You could create a fake image with its path set to drivers to check whether this is the case.

    EDIT: Confirmed that the DB is consulted, and that creating a fake image with the path you want replicated would work.
