Replicating images to other FOG servers

Tom Elliott

Option 1

Yes. On verbiage: we don’t have any wording that says “This is how to share an image.” Hopefully you’ll see why soon.
Images are “shared” via storage groups. An image can be associated with more than one storage group. This is how FOG knows to replicate an image to other storage groups, or in other words, to the master nodes of those storage groups. It is also why we have a “Primary” master setting: it tells the replication service(s) which group’s master node ALL storage groups should be based on.
Yes and no. As long as replication is enabled on the image, this will indeed occur. You can disable replication on the image if you choose.
When replicating across groups, images are only sent to other storage group master nodes.
Yes. This could be somewhat automated, but FOG does not provide this automation.
No. The location plugin would not be necessary in this situation. Imaging only requires the “Master” node when capturing an image; when deploying, it uses a “load balancing” type setup. That said, it will avoid imaging over the WAN only if the layout is set up properly. You SHOULD use the location plugin if you need to keep network traffic from crossing the WAN. Each location, then, should have a storage group, and that storage group should limit the scope of where devices get their images from.
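To make the group/master relationship above concrete, here is a minimal Python sketch, assuming a simplified model rather than FOG’s actual replicator code: the primary group’s master is the source, only the masters of the image’s other storage groups receive it across groups, each master replicates down to its own nodes, and nothing is sent if replication is disabled on the image. The class, function, group, and host names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageGroup:
    name: str
    master: str                                     # the group's master node
    nodes: list[str] = field(default_factory=list)  # other (non-master) nodes


def replication_plan(groups: list[StorageGroup], primary: str,
                     replicate: bool = True) -> list[tuple[str, str]]:
    """Return the (source, destination) transfers for one image.

    `groups` are the storage groups the image is associated with and
    `primary` names the group whose master is the "Primary" master.
    """
    if not replicate:
        # Replication disabled on the image: nothing is sent anywhere.
        return []
    by_name = {g.name: g for g in groups}
    source = by_name[primary].master
    # Across groups, only the other groups' master nodes receive the image.
    plan = [(source, g.master) for g in groups if g.name != primary]
    # Within each group, the master replicates down to its own nodes.
    for g in groups:
        plan.extend((g.master, node) for node in g.nodes)
    return plan


hq = StorageGroup("HQ", "fog-hq", ["hq-node2"])
site_a = StorageGroup("SiteA", "fog-a")
site_b = StorageGroup("SiteB", "fog-b", ["b-node2"])
print(replication_plan([hq, site_a, site_b], primary="HQ"))
# [('fog-hq', 'fog-a'), ('fog-hq', 'fog-b'), ('fog-hq', 'hq-node2'), ('fog-b', 'b-node2')]
```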

Option 2

You can export image definitions by going to the Image Management page and clicking “Export Images”. You can import image definitions from a CSV in the same format by going to the Image Management page and clicking “Import Images”.
The “Primary Master” option only applies to storage groups. When an image is assigned to another storage group, all nodes within that group will automatically be replicated to.
When capturing images, FOG always uses a master node, because it’s the master node that replicates down to the other nodes within the group. What George is most likely referring to is that when you deploy an image, the location plugin helps tell hosts WHERE to get the image from.

The location plugin configuration items are as follows:

Name (Required) The name to give the location.
Storage Group (Required) The storage group this location will be associated with.
Storage Node (Optional) Defines which node hosts at this location must use.
Kernels and Inits from location… (Optional) If checked, tells the booting host to download its kernels and inits from the location being used rather than from the “PXE Server” providing the information.

What does this mean? If you don’t define a storage node in the location, hosts are limited to any node within the location’s associated storage group (a kind of load balancing). If you do define a node, deploys will only occur from that particular storage node.
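The node-selection rule just described can be sketched in a few lines of Python. This is a hypothetical model, not the plugin’s code: the `active_tasks` input (deploys currently running per node) is an assumption, and choosing the least busy node is only a stand-in for whatever load balancing FOG actually performs.

```python
from typing import Dict, Optional


def pick_deploy_node(active_tasks: Dict[str, int],
                     location_node: Optional[str] = None) -> str:
    """Pick the storage node a host at a location deploys from.

    active_tasks maps each node in the location's storage group to the
    number of deploys it is currently serving (hypothetical input).
    """
    if location_node is not None:
        # Storage Node set on the location: deploys use only that node.
        return location_node
    # No Storage Node set: any node in the group is eligible. Picking the
    # least busy node is a stand-in for FOG's own load balancing.
    return min(active_tasks, key=active_tasks.get)


group = {"site-master": 2, "site-node2": 0}
print(pick_deploy_node(group))                 # site-node2
print(pick_deploy_node(group, "site-master"))  # site-master
```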

Either Option 1 or Option 2 will work. Be aware, however, that Option 1 avoids the need to update/insert image definitions ONLY IF the other FOG servers point to the same database server. (See below for why I say this.)

I would recommend rethinking your wording here. Does “two storage FOG Servers”, as I’m gathering it, mean two storage nodes? “FOG Servers”, in my head, indicates they will have their own databases.

Option 2, from others’ descriptions and what I think has become the “norm” now, is what’s known as a “multi-master” setup. Option 1 is essentially the default operation.

aparker

I finally got a chance to put some more thought into this last week and consult with a few others involved. We’ve ultimately decided to go with the multi-master model. We’re basically going to just add the two “child” sites to the existing storage group of the “master” FOG server and handle the image definition import/export manually. If I’m absorbing all the previous info correctly, I think this will accomplish what we’re looking to do. I suspect this will be the easiest to implement without making major changes to our current FOG environment and this option most closely aligns with what our Helpdesk team is looking to accomplish with images.

That being the case, I’ve got a few questions specifically about replication:
1.) Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
2.) How do you check the status of a running replication job?
3.) If possible, how do you stop/pause/resume a running replication job?

Thanks again for all the help. Everyone’s responses and guidance have been extremely helpful!

george1421

@aparker said in Replicating images to other FOG servers:

That being the case, I’ve got a few questions specifically about replication:
1.) Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.

What you need to do is have a cron job to manage the start and stop of the FOGImageReplicator service. Start and stop it at your command.
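A minimal Python sketch that builds the root crontab entries for this, assuming the replicators run as systemd services named FOGImageReplicator and FOGSnapinReplicator (the second name comes from Tom’s reply below; confirm both with `systemctl status <name>` on your server). The Saturday 02:00 start and Sunday 22:00 stop times are only examples.

```python
# Cron day-of-week numbering: 0 = Sunday ... 6 = Saturday.
SERVICES = ["FOGImageReplicator", "FOGSnapinReplicator"]


def weekend_crontab(services, start=(6, 2, 0), stop=(0, 22, 0)):
    """Build root crontab lines that start services at `start` and stop
    them at `stop`; each is (day_of_week, hour, minute)."""
    lines = []
    for action, (dow, hour, minute) in (("start", start), ("stop", stop)):
        for svc in services:
            # Field order: minute hour day-of-month month day-of-week command
            lines.append(f"{minute} {hour} * * {dow} systemctl {action} {svc}")
    return lines


if __name__ == "__main__":
    print("\n".join(weekend_crontab(SERVICES)))
```

The printed lines (for example `0 2 * * 6 systemctl start FOGImageReplicator`) can be pasted into the root crontab via `sudo crontab -e`. Keep in mind that if the server reboots midweek, the services may start again at boot.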

2.) How do you check the status of a running replication job?

Via the replicator log files in /opt/fog/log

3.) If possible, how do you stop/pause/resume a running replication job?

This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could end up with a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. Because it’s replicated file by file, if a whole file didn’t get there last time, it will resend that file.
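The file-by-file behavior can be illustrated with a small Python model. It is only a sketch under one assumption: a file counts as complete when its size on the destination matches the source. FOG’s replicator may compare files differently. The point is that a file cut off mid-transfer, or never started, is sent again whole on the next cycle, while completed files are skipped. The file names are examples.

```python
def files_to_send(source: dict, dest: dict) -> list:
    """Files the next cycle would (re)send under a simplified model.

    source/dest map file name -> size in bytes on each node. A file is sent
    whole if it is missing on the destination or its size differs, e.g. a
    transfer that was cut off when replication was stopped.
    """
    return sorted(name for name, size in source.items() if dest.get(name) != size)


src = {"d1p1.img": 100, "d1p2.img": 200, "d1.mbr": 1}
dst = {"d1p1.img": 100, "d1p2.img": 50}  # d1p2 interrupted, d1.mbr never sent
print(files_to_send(src, dst))  # ['d1.mbr', 'd1p2.img']
```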

Tom Elliott

@aparker

Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
There are a couple of ways to handle this. The simplest method would be to set a cron job that starts the FOGImageReplicator and FOGSnapinReplicator services on, say, Saturday morning, and another that stops the services Sunday evening (or whenever you feel they should stop). The second method requires a bit more involvement, but should work without needing a cron job to handle things: set the “Timeout” to 604800 seconds (one week) for the Image and Snapin replication cycles. This method is bound to have some problems, but it will essentially make images replicate only once a week, depending on when the service is first started. So if you start the services on Monday, the next Monday is when they will re-check. This is why I suggest using cron, as you can ensure replication only occurs on Saturday/Sunday as needed. Keep in mind, however, that images and snapins are checked to see whether they need updating, so while there is a little network chatter during the checking phase, it uses much less bandwidth than a full replication.
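The arithmetic behind the one-week “Timeout” value, as a short Python sketch. The `next_check` helper is hypothetical and assumes, as Tom explains later in the thread, that the sleep timer begins when a cycle finishes. A cycle that ends on a Monday morning therefore re-checks the following Monday, and any time spent replicating pushes later checks back, which is one reason the cron approach is more predictable.

```python
from datetime import datetime, timedelta

# 7 days x 24 hours x 60 minutes x 60 seconds
WEEK_SECONDS = 7 * 24 * 60 * 60


def next_check(cycle_end: datetime, sleep_seconds: int = WEEK_SECONDS) -> datetime:
    """When the service checks again, assuming it sleeps `sleep_seconds`
    after a cycle finishes."""
    return cycle_end + timedelta(seconds=sleep_seconds)


print(WEEK_SECONDS)                                            # 604800
print(next_check(datetime(2017, 1, 2, 9, 0)))                  # 2017-01-09 09:00:00
print(next_check(datetime(2017, 1, 2, 9, 0)).strftime("%A"))   # Monday
```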
How do you check the status of a running replication job?
Replication is tracked by the relevant service. The replication logs will show you whether replication is still running (when it last started vs. the next check cycle), along with the PIDs of the replication processes. The service should not try to replicate anything if a task is already running when it checks.
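For a quick look at the most recent log entries from the OS side, a minimal Python sketch follows (from a shell, `tail -n 20 <file>` does the same job). The log file name `fogreplicator.log` is an assumption; list /opt/fog/log on your server to confirm it. No particular log wording is assumed: the script just prints the last lines.

```python
from collections import deque
from pathlib import Path

# Assumed file name; list /opt/fog/log on your server to confirm it.
LOG = Path("/opt/fog/log/fogreplicator.log")


def tail_lines(lines, n=20):
    """Return the last n lines of an iterable of text lines, newlines stripped."""
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def tail(path, n=20):
    """Return the last n lines of a log file, keeping only n lines in memory."""
    with open(path, "r", errors="replace") as f:
        return tail_lines(f, n)


if __name__ == "__main__" and LOG.exists():
    for line in tail(LOG):
        print(line)
```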
If possible, how do you stop/pause/resume a running replication job?
There really isn’t a way, unless you stop the process that’s actually performing the replication, and even then it isn’t guaranteed to work as you might expect. The processes do attempt to “recover” on their own, but given how replication works, it is often simpler to remove the remote file and transfer the whole file from scratch.

aparker

What you need to do is have a cron job to manage the start and stop of the FOGImageReplicator service. Start and stop it at your command.

Understood, I’ll explore this option. Any good references for creating/managing cron jobs in Ubuntu? While I’m aware of what cron is, I’ve never used it.

But just so I understand this process better conceptually, what is the default behavior? Does FOG call this FOGImageReplicator service on its own at a predefined interval? Is there anything I need to do to keep that from happening?

2.) How do you check the status of a running replication job?

Via the replicator log files in /opt/fog/log

Perfect, thanks.

3.) If possible, how do you stop/pause/resume a running replication job?
This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could end up with a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. Because it’s replicated file by file, if a whole file didn’t get there last time, it will resend that file.

This is about what I expected honestly, but figured it was worth asking.

aparker

@Tom-Elliott said in Replicating images to other FOG servers:

@aparker

Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
There are a couple of ways to handle this. The simplest method would be to set a cron job that starts the FOGImageReplicator and FOGSnapinReplicator services on, say, Saturday morning, and another that stops the services Sunday evening (or whenever you feel they should stop). The second method requires a bit more involvement, but should work without needing a cron job to handle things: set the “Timeout” to 604800 seconds (one week) for the Image and Snapin replication cycles. This method is bound to have some problems, but it will essentially make images replicate only once a week, depending on when the service is first started. So if you start the services on Monday, the next Monday is when they will re-check. This is why I suggest using cron, as you can ensure replication only occurs on Saturday/Sunday as needed. Keep in mind, however, that images and snapins are checked to see whether they need updating, so while there is a little network chatter during the checking phase, it uses much less bandwidth than a full replication.

Great, I think this actually answers part of my follow-up question above. After looking at the output of the “top” command, I see now that these services are always running. Starting/stopping them as needed via cron makes sense.

I am still curious about default behavior though just for reference. If we did absolutely nothing with regards to scheduling, how frequently does replication occur?

How do you check the status of a running replication job?
Replication is tracked by the relevant service. The replication logs will show you whether replication is still running (when it last started vs. the next check cycle), along with the PIDs of the replication processes. The service should not try to replicate anything if a task is already running when it checks.

Great info, thanks.

If possible, how do you stop/pause/resume a running replication job?
There really isn’t a way, unless you stop the process that’s actually performing the replication, and even then it isn’t guaranteed to work as you might expect. The processes do attempt to “recover” on their own, but given how replication works, it is often simpler to remove the remote file and transfer the whole file from scratch.

As I mentioned above, pretty much what I expected, but figured it couldn’t hurt to ask.

Tom Elliott

@aparker The default replication cycle for Image and Snapin replication is 600 seconds (10 Minutes).

Keep in mind that it checks all available files to see whether they should be replicated. The cycle timer starts once the last file has been replicated/checked. This setting is defined in:

FOG Configuration Page->FOG Settings->FOG Linux Service Sleep Times
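Because the timer only starts once the last file has been replicated/checked, the 600-second default is a gap between cycles rather than a fixed clock. A minimal Python sketch of that timing; the `cycle_starts` helper and the cycle durations (a 30-second check pass versus a one-hour replication) are hypothetical.

```python
def cycle_starts(durations, sleep=600, first_start=0):
    """Start times (seconds) of successive replication cycles.

    durations[i] is how long cycle i takes to check/replicate every file;
    the sleep timer only begins once that cycle has finished.
    """
    starts = []
    t = first_start
    for d in durations:
        starts.append(t)
        t += d + sleep  # run the cycle, then sleep for the configured time
    return starts


print(cycle_starts([30, 3600, 30]))  # [0, 630, 4830]
```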

aparker

@Tom-Elliott
Fantastic, thanks!

We’re scheduled to make these changes this weekend, so we’ll see how it goes. Might be back Monday with more questions.

Wayne Workman

@george1421 said in Replicating images to other FOG servers:

This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could end up with a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. Because it’s replicated file by file, if a whole file didn’t get there last time, it will resend that file.

@Tom-Elliott Does image replication send the image files to /images/dev first and, once completed, move them to /images?

Wayne Workman

@aparker said in Replicating images to other FOG servers:

2.) How do you check the status of a running replication job?

You can view the logs right in the web interface in addition to viewing them at the OS level: Web GUI -> FOG Configuration -> Log Viewer -> pick your log.

Tom Elliott

@Wayne-Workman No.

aparker

A few quick questions about actually adding the nodes:

1.) On the “Add Storage Node” page, is the “Interface” field asking for the interface of the storage node you are adding or the interface for the FOG server you are adding the node to?
2.) A similar question; I think I already know the answer to this one but want to clarify: Is the Management Username/Password for the node you are adding?

Tom Elliott

@aparker Interface = the interface used by the node being configured. (This is only used for the bandwidth page anyway, so it’s not really important.)
Yes, the management username/password is the Linux user and password that FOG creates for FTP use.
