FOG storage node and data replication

george1421

I’m working on a multisite proof of concept configuration. I set up a FOG storage node and everything worked great. All of the images were copied over to the storage node. I currently have the target machine drivers in the /images/drivers folder, and this folder was not copied to the storage node, which was kind of expected since the folder is not standard to FOG. My question is: how can I add this directory to the list of directories that get replicated between the storage nodes? Or am I going about it the wrong way? I need the target machine’s drivers replicated to all storage nodes too.

Wayne Workman

I think the replication looks in the database to see what needs to be copied… Not positive on this, but some work has been done in this area recently.

You could create a fake image with its path set as drivers to check if this is the case.

EDIT: Confirmed that the DB is consulted, and that creating a fake image with the path you want replicated would work.

george1421

OK that sounds like a plan. I’ll set that up right away.

Do you know what the replication cycle interval is or where to find the setting? Under “normal” production once a day is sufficient, but I can see during development that we might need to shorten it to just a few hours.

Wayne Workman

@george1421 I think that setting is hard coded… if there’s nothing to replicate, it will just check and then disconnect. It’s not resource intensive.

Also, you could just turn off the FOGImageReplicator… Enable it on boot for safety, but if you don’t want it to check (which is odd), you could just turn it off after boot.

george1421

Well, that was interesting. I tried to force the replication by stopping and restarting the FOGImageReplicator service. On the master node it restarted without issue. Stopping and starting the same service on the storage node caused an error to be thrown:

PHP Fatal error: Call to undefined method ImageReplicator::getBanner() in /opt/fog/service/FOGImageReplicator/FOGImageReplicator on line 1

Both the master node and storage nodes are running build 5040.

george1421

Correction: that error was on line 12, not line 1. I missed the last character when I copied the error.

george1421

Rebooting the storage node appears to have started the replication of /images/drivers, but so far only the first file has replicated.

Looking at /opt/fog/logs/fogreplicator.log on the master node I see this error.

[10-22-15 8:19:52 pm] * shvstorage - SubProcess -> mirror: Fatal error: 500 OOPS: priv_sock_get_cmd
[10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Mirroring directory `drivers'
[10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Making directory `drivers'
[10-22-15 8:21:08 pm] * shvstorage - SubProcess -> Transferring file `drivers/DN2820FYK.zip'

The zip file is the only thing in /images/drivers on the storage node.

Tom Elliott

You’re running FOG 1.2.0, or trunk?

george1421

It’s a trunk build, 5040.

Looking at the drivers folder, I have a combination of files and sub folders. Depending on how smart the replicator is, it may not handle or traverse the sub folders.

The structure of the drivers folder is as follows:
/images/drivers/OptiPlex7010.zip
/images/drivers/OptiPlex7010/audio
/images/drivers/OptiPlex7010/audio/<many files and sub folders>
/images/drivers/OptiPlex7010/video/<many files and sub folders>
<…>

I suspect that the replicator was only designed to copy the image folder and one level of files below.

Wayne Workman

@george1421 said:

I suspect that the replicator was only designed to copy the image folder and one level of files below.

That’s likely the case.

You could just put them in the web directory and grab them via wget on the hosts…

george1421

My preference would be to not do something out of band if possible. It does appear that creating a fake image with its path set to /images/drivers is choking the FOG replicator because of the sub folders, and no replication is happening because of that error, so I’m going to back out that change.

I haven’t dug into the fog replicator code yet, but I’m wondering if rsync wouldn’t be a better method to replicate the images from the master node to the other storage nodes. Compared with a plain file copy, rsync would give us a few more advanced options, like data compression and only syncing files that have changed.

george1421

Following up on this in the AM, I now see all of the ‘files’ that were in /images/drivers on the master node on the storage node as well. The FOG replicator appears to copy only the files directly under the faux drivers image. What is missing now are the sub folders and the files under them (the actual driver files we push to the target just after imaging). So the idea to create a fake image kind of worked, but not the way I needed it. As long as your files are located one level below the fake image folder, this is a workable solution.
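One way to see exactly which files and sub folders did not make it across is to compare file lists from the two nodes. This is only a rough sketch: the function names `manifest` and `missing_files` are made up for illustration, and it assumes you can run `find` on each node and bring the two lists to one machine.

```shell
#!/bin/sh
# Print the relative path of every regular file under a directory, sorted.
# Run this once on the master and once on the storage node.
manifest() {
    (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Print the paths that appear in the master list but not in the node list.
# Both arguments are files previously produced by manifest.
missing_files() {
    master_list=$1
    node_list=$2
    LC_ALL=C comm -23 "$master_list" "$node_list"
}

# Example (the list file names are placeholders):
#   manifest /images/drivers > master.list     # on the master node
#   manifest /images/drivers > node.list       # on the storage node
#   missing_files master.list node.list
```

If only the top-level files replicated, everything printed should live inside a sub folder.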

Being very self centered, I would like to see FOG support something like rsync to sync/manage the images folder, especially if the storage node is located across a slow WAN connection. These image files tend to be very large, and we could benefit from the data compression in transit and the intelligent replication such a tool could provide.

Joseph Hales

As the images are already compressed I’m not sure if rsync with compression would be of any benefit or might actually make copy times worse.

george1421

@Joseph-Hales said:

As the images are already compressed I’m not sure if rsync with compression would be of any benefit or might actually make copy times worse.

Testing shows that data compression “does work”, but the actual amount of compression does not add any value in a WAN transfer. I took a 4.8GB captured image and gzip’d it. The resultant image was only 50MB smaller than the source image (not enough to really matter). But the other benefits of rsync would still be worth looking into for my project.
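For anyone who wants to repeat that kind of test on their own images, here is a minimal sketch. The function name `gzip_savings` is made up, and the image path in the usage comment is only an example. It streams the file through gzip and counts bytes, so the image itself is not modified.

```shell
#!/bin/sh
# Report how many bytes gzip would save on a file, without changing the file.
gzip_savings() {
    file=$1
    orig=$(wc -c < "$file")
    packed=$(gzip -c "$file" | wc -c)
    # Arithmetic expansion also strips any padding some wc versions add.
    echo "original=$((orig)) gzip=$((packed)) saved=$((orig - packed))"
}

# Example (path is a placeholder):
#   gzip_savings /images/WIN7PROSP1X86B03/d1p2.img
```

A small "saved" figure relative to "original" suggests that compression in transit would buy very little for that image.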

Wayne Workman

@george1421 can rsync be used without additional compression?

george1421

Yes it can. The image files appear to be packed pretty good as they are.

I need to check a bit more into the fog replicator. I noticed that if I restarted the fog replicator it goes through and starts replicating all of the files again, with 100% cpu on the master node. It is almost like the replicator service was in a tight do loop until the file finished copying over. This is not very kind behavior, which is making me think that I should just go the rsync route and disable the fog replicator service altogether. Understand that at this point I don’t have documented evidence that the fog replicator is at fault, only what I saw just before calling it a week.

Wayne Workman

@george1421 I’ve only had the opportunity to mess with the fog image replicator a handful of times in remote sessions with others. I can’t say I have seen this behavior but I’m not denying it either.

If you are able to configure rsync to properly follow all the settings set in the DB for replication and are able to document how to set it up, I wouldn’t be surprised if it was adopted.

There are a lot of settings, by the way… @Tom-Elliott explains it best… but from what I gather:

The master node replicates to all nodes in its storage group.
An image can belong to several storage groups. When this is the case, the master that has the image replicates to the other master. From there, the above step applies.
rsync must use the settings defined in /opt/fog/.fogsettings for the ftp credentials
rsync must use the replication bandwidth limitations set in the database
rsync must not re-compress images nor change files in any way.
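As a rough illustration of the last two points, a helper along these lines could build the rsync command. This is only a sketch under assumptions: `build_rsync_cmd` is a made-up name, the bandwidth value is passed in by hand rather than read from the FOG database, and nothing is read from /opt/fog/.fogsettings. Note also that rsync normally runs over SSH or its own daemon rather than FTP, so the FTP credentials would not carry over as they are.

```shell
#!/bin/sh
# Print an rsync command that copies a tree recursively with no compression.
# It only prints the command; nothing is transferred.
build_rsync_cmd() {
    src=$1     # e.g. /images/drivers/
    dest=$2    # e.g. fog@192.0.2.10:/images/drivers/ (placeholder address)
    kbps=$3    # bandwidth cap in KiB/s; 0 means unlimited
    # -a recurses into sub folders and preserves times and permissions.
    # --partial keeps a partly sent file if the link drops mid-transfer.
    # There is deliberately no -z, so images are sent as they are.
    echo "rsync -a --partial --bwlimit=$kbps $src $dest"
}

# Example:
#   build_rsync_cmd /images/drivers/ fog@192.0.2.10:/images/drivers/ 5000
```

Because rsync skips files whose size and modification time already match on the destination, a restart should not resend images that are already on the storage node.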

george1421

Understand I’m not knocking the way it is today. (Actually from another recent thread it gave me an idea how to change this multi site POC by doing full installs at each site then pointing them back to the master node for the database information). The 1.x version of FOG that Tom revived has been great. For my deployment I see two issues/concerns.

The current fog replicator only copies the files under the image name directory. This is perfect for image replication. But in my case I have something slightly different in that I have a driver structure in the images directory that I need to replicate to all storage nodes. I need all storage nodes to have all copies of all files from the master node.
On the surface I see some strange behavior with the fog replicator consuming 100% cpu. This may be related to having that faux drivers image in the system or to something else going on. I also noted in the fog replicator log that when I restarted the fog replicator service (when debugging the 100% cpu issue) it started copying files that already existed on the storage node. For example:

[10-23-15 1:24:43 pm]  * ncystorage - SubProcess -> Mirroring directory `WIN7PROSP1X86B03'
[10-23-15 1:24:43 pm]  * nycstorage - SubProcess -> Removing old file `WIN7PROSP1X86B03/d1p2.img'
[10-23-15 1:24:43 pm]  * nycstorage - SubProcess -> Transferring file `WIN7PROSP1X86B03/d1p2.img'

I will look into rsync to see what commands it supports directly. It may be possible to stitch it into the current deployment by replacing the lftp calls (just a guess) with rsync calls. But you did give me a few ideas to check into and places to look for settings, thank you.

Tom Elliott

@george1421 I’m very confused.

Currently the command we run for this is:

lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; [<bandwidth limits if any>] mirror -R --ignore-time [-i <image folders or files if group->group transfer>] -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first <what we are sending> <destination removing if existing>; exit' -u <username>,<password> <ip of node> 2>&1

Basically we are sending all files recursively. I wonder if it’s just timing out as it’s sending.

All of the options can be seen here:

LFTP Man Page
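To check by hand whether the sub folders would be traversed, the command above could be run with lftp’s `--dry-run` mirror option, which lists what would be transferred without sending anything. This is a sketch, not what FOG itself runs: `build_mirror_script` is a made-up helper, the bandwidth limit and `--delete-first` are left out, and the node address and credentials in the usage comment are placeholders.

```shell
#!/bin/sh
# Print an lftp "-e" script modelled on the command quoted above,
# with --dry-run added so that nothing is actually transferred.
# Paths are assumed to contain no spaces.
build_mirror_script() {
    src=$1     # local directory on the master, e.g. /images/drivers
    dest=$2    # directory on the storage node, e.g. /images/drivers
    printf '%s' "set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; "
    printf '%s' "mirror -R --ignore-time --dry-run -vvv "
    printf '%s' "--exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' "
    printf '%s\n' "$src $dest; exit"
}

# Example (address, user and password are placeholders):
#   lftp -e "$(build_mirror_script /images/drivers /images/drivers)" \
#        -u fog,PASSWORD 192.0.2.10
```

If the dry run lists the files under the sub folders, the recursion is fine and the problem is more likely a timeout or an FTP error during the real transfer.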

Wayne Workman

@george1421 said:

(Actually from another recent thread it gave me an idea how to change this multi site POC by doing full installs at each site then pointing them back to the master node for the database information).

It was originally Tom’s idea. And it’s proven to work. I’ve just been spreading the word.
