Replicating images to other FOG servers
-
I finally got a chance to put some more thought into this last week and consult with a few others involved. We’ve ultimately decided to go with the multi-master model. We’re basically going to just add the two “child” sites to the existing storage group of the “master” FOG server and handle the image definition import/export manually. If I’m absorbing all the previous info correctly, I think this will accomplish what we’re looking to do. I suspect this will be the easiest to implement without making major changes to our current FOG environment and this option most closely aligns with what our Helpdesk team is looking to accomplish with images.
That being the case, I’ve got a few questions specifically about replication:
1.) Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
2.) How do you check the status of a running replication job?
3.) If possible, how do you stop/pause/resume a running replication job?
Thanks again for all the help. Everyone’s responses and guidance have been extremely helpful!
-
@aparker said in Replicating images to other FOG servers:
That being the case, I’ve got a few questions specifically about replication:
1.) Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
What you need to do is have a cron job to manage the start and stop of the FOGImageReplicator service. Start and stop it at your command.
2.) How do you check the status of a running replication job?
Via the replicator log files in /opt/fog/log
3.) If possible, how do you stop/pause/resume a running replication job?
This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could be left with a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. It’s replicated on a file-by-file basis. If the whole file didn’t get there last time, it will resend the file.
-
- Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
There’s a couple of ways to handle this. The simplest method would be to set a cron job that starts the FOGImageReplicator and FOGSnapinReplicator services, say, on Saturday morning, and another set that stops the services Sunday evening (or whenever you feel it should stop). The next method requires a bit more involvement, but should work without needing to worry about a cron job handling things. Set the “Timeout” to 604800 seconds for the Image and Snapin replication cycles. This method is bound to have some problems, but essentially will only make the images replicate once a week, dependent on when the service is first started. So if you start the services on Monday, next Monday will be the time they re-check. This is why I suggest using the cron cycle, as you can ensure it only occurs on Saturday/Sunday as needed. Mind you, however, images and snapins check if they need to be updated, so while there is a little bit of network chatter during the checking phase, it’s much less bandwidth usage than full-on replications.
- How do you check the status of a running replication job?
Replication is tracked by the relevant service. You can look in the replication logs and it will show you if the replication is still working (when it last started vs. the next check cycle). It will tell you the PIDs of the replication being performed. It should not try replicating anything if the tasking is already running during its check.
- If possible, how do you stop/pause/resume a running replication job?
There really isn’t a way, unless you stop a replication from the process that’s actually performing the work and even then this isn’t guaranteed to work as you might expect. The processes do attempt to “recover” on their own, but the way the replication elements work it’s simpler, often times, to actually remove the remote file and transfer the whole file from scratch.
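For the cron approach mentioned above, here is a rough sketch of what the schedule could look like. It is a small Python script that just prints crontab lines for root. The service names come from this thread; the use of systemctl, its path, and the 06:00/20:00 times are my assumptions, so adjust them to your setup. Keep in mind the warning about stopping mid-transfer: a stop on Sunday evening can still interrupt a file that is being copied.

```python
# Sketch only: builds root-crontab lines that start the replication services
# on Saturday morning and stop them on Sunday evening.
# Assumptions (not from this thread): the server uses systemd, systemctl lives
# at /usr/bin/systemctl, and the 06:00 / 20:00 times are placeholders.

SERVICES = ["FOGImageReplicator", "FOGSnapinReplicator"]
SYSTEMCTL = "/usr/bin/systemctl"  # assumed path; confirm with `which systemctl`


def cron_line(minute, hour, weekday, action, service):
    """Return one crontab entry: minute hour day-of-month month day-of-week command."""
    return f"{minute} {hour} * * {weekday} {SYSTEMCTL} {action} {service}"


def weekend_schedule(start_hour=6, stop_hour=20):
    """Start each service on Saturday (cron weekday 6), stop it on Sunday (0)."""
    lines = []
    for service in SERVICES:
        lines.append(cron_line(0, start_hour, 6, "start", service))
        lines.append(cron_line(0, stop_hour, 0, "stop", service))
    return lines


if __name__ == "__main__":
    print("\n".join(weekend_schedule()))
```

The printed lines can be pasted into root's crontab (`sudo crontab -e`). The five leading fields are minute, hour, day of month, month, and day of week.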
-
-
What you need to do is have a cron job to manage the start and stop of the FOGImageReplicator service. Start and stop it at your command.
Understood, I’ll explore this option. Any good references for creating/managing cron jobs in Ubuntu? While I’m aware of what cron is, I’ve never used it.
But just so I understand this process better conceptually, what is the default behavior? Does FOG call this FOGImageReplicator service on its own at a predefined interval? Is there anything I need to do to keep that from happening?
2.) How do you check the status of a running replication job?
Via the replicator log files in /opt/fog/log
Perfect, thanks.
3.) If possible, how do you stop/pause/resume a running replication job?
This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could be left with a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. It’s replicated on a file-by-file basis. If the whole file didn’t get there last time, it will resend the file.
This is about what I expected honestly, but figured it was worth asking.
-
@Tom-Elliott said in Replicating images to other FOG servers:
- Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
There’s a couple of ways to handle this. The simplest method would be to set a cron job that starts the FOGImageReplicator and FOGSnapinReplicator services, say, on Saturday morning, and another set that stops the services Sunday evening (or whenever you feel it should stop). The next method requires a bit more involvement, but should work without needing to worry about a cron job handling things. Set the “Timeout” to 604800 seconds for the Image and Snapin replication cycles. This method is bound to have some problems, but essentially will only make the images replicate once a week, dependent on when the service is first started. So if you start the services on Monday, next Monday will be the time they re-check. This is why I suggest using the cron cycle, as you can ensure it only occurs on Saturday/Sunday as needed. Mind you, however, images and snapins check if they need to be updated, so while there is a little bit of network chatter during the checking phase, it’s much less bandwidth usage than full-on replications.
Great, I think this mostly answers part of my follow-up question above actually. I see now after just looking at the output of a “top” command that these services are always running. The logic of starting/stopping them as needed via cron makes sense.
I am still curious about default behavior though just for reference. If we did absolutely nothing with regards to scheduling, how frequently does replication occur?
- How do you check the status of a running replication job?
Replication is tracked by the relevant service. You can look in the replication logs and it will show you if the replication is still working (when it last started vs. the next check cycle). It will tell you the PIDs of the replication being performed. It should not try replicating anything if the tasking is already running during its check.
Great info, thanks.
- If possible, how do you stop/pause/resume a running replication job?
There really isn’t a way, unless you stop a replication from the process that’s actually performing the work and even then this isn’t guaranteed to work as you might expect. The processes do attempt to “recover” on their own, but the way the replication elements work it’s simpler, often times, to actually remove the remote file and transfer the whole file from scratch.
As I mentioned above, pretty much what I expected, but figured it couldn’t hurt to ask.
-
@aparker The default replication cycle for Image and Snapin replication is 600 seconds (10 Minutes).
Mind you, it checks all available files to see whether they should be replicated. The cycle timer starts once the last file has been checked or replicated. This setting is defined in:
FOG Configuration Page->FOG Settings->FOG Linux Service Sleep Times
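To put the two sleep-time values from this thread side by side, here is a quick bit of arithmetic as a Python sketch. Only the 600 and 604800 figures come from the posts above; the function name is just for illustration.

```python
# Sketch: relate the sleep-time values discussed in this thread to
# human-readable units. Only 600 and 604800 come from the thread itself.

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * SECONDS_PER_MINUTE  # 86400

DEFAULT_CYCLE = 600                 # default replication cycle (10 minutes)
WEEKLY_CYCLE = 7 * SECONDS_PER_DAY  # one week expressed in seconds


def max_checks_per_week(cycle_seconds):
    """Upper bound on check cycles per week. The real number is lower, since
    the timer only restarts after the last file has been checked."""
    return WEEKLY_CYCLE // cycle_seconds


if __name__ == "__main__":
    print(DEFAULT_CYCLE // SECONDS_PER_MINUTE, "minutes per default cycle")
    print(WEEKLY_CYCLE, "seconds in a week")
    print(max_checks_per_week(DEFAULT_CYCLE), "checks per week at most")
```

So the weekly value is simply 7 days in seconds, and the default setting allows up to roughly a thousand check cycles in the same period.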
-
@Tom-Elliott
Fantastic, thanks!
We’re scheduled to make these changes this weekend, so we’ll see how it goes. Might be back Monday with more questions.
-
@george1421 said in Replicating images to other FOG servers:
This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could be left with a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. It’s replicated on a file-by-file basis. If the whole file didn’t get there last time, it will resend the file.
@Tom-Elliott Does image replication send the image files to /images/dev first and once completed, move them to /images ?
-
@aparker said in Replicating images to other FOG servers:
2.) How do you check the status of a running replication job?
You can view the logs right in the web interface in addition to viewing them at the OS level.
Web Gui -> FOG Configuration -> Log Viewer -> Pick your log
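If you would rather check from the OS side, something like the following Python sketch prints the last few lines of a replicator log. The /opt/fog/log directory is from this thread, but the file name `fogreplicator.log` is my assumption; list the directory to see what your install actually writes.

```python
# Sketch: print the last lines of a replication log from the command line.
# LOG_DIR comes from this thread; LOG_NAME is an assumed file name, so check
# the contents of /opt/fog/log on your own server first.

from collections import deque
from pathlib import Path

LOG_DIR = Path("/opt/fog/log")
LOG_NAME = "fogreplicator.log"  # assumption, not confirmed in this thread


def tail(path, count=20):
    """Return the last `count` lines of a text file without trailing newlines."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the most recent `count` lines in memory
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    for line in tail(LOG_DIR / LOG_NAME):
        print(line)
```

Reading the log may need root or membership in whatever group owns the files, depending on how permissions are set on your server.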
-
@Wayne-Workman No.
-
A few quick questions about actually adding the nodes:
1.) On the “Add Storage Node” page, is the “Interface” field asking for the interface of the storage node you are adding or the interface for the FOG server you are adding the node to?
2.) Similar question, but this one I think I already know the answer to, but want to clarify: Is the Management Username/Password asking for the node you are adding?
@aparker Interface = the interface used by the node that’s being configured. (This is only used for the bandwidth page anyway, so it’s not really important.)
Yes, the management username/password is the Linux user and password FOG creates for FTP use.