Replicating images to other FOG servers
-
1.) All three existing FOG servers are “normal” installs as opposed to “storage” installs. Can I make this work as-is? Or do they need to be converted in some way to a storage install?
Yes - but there will be some manual intervention required. This setup you are inquiring about is called a multi-master setup (search the forums for that); I try to discourage it because of the level of care and awareness it needs. In a multi-master setup, there are many databases instead of just one. You will need to manually add the ‘slave’ boxes to the master box as storage nodes. To make the images that the ‘master’ box replicates out usable at the other sites, an image definition needs to be created on those other boxes with identical settings to the original ones on the original ‘master’. After all the servers are added to the ‘master’ as nodes, you could put all servers in one big group - I would not advise this for your setup. What I would advise is making a storage group for each server, making all of them masters of their own groups - and then just share the image you want replicated with the other groups. With this setup, individual images can be chosen for sharing instead of forcing all images from one box to the others. Naturally, each site is going to have that one-off box that it needs an image for. Having all servers in their own groups as masters enables each site to still have its own images uploaded to its own server - while also selectively replicating specific images. Image sharing is done per-image inside image management of the ‘master’.
2.) What’s the storage node setup look like? I haven’t been able to find much documentation on what exactly needs to be done here. It appears that each FOG server has a “default” storage group and a “DefaultMember” storage node. Would I just add the two downstream FOG servers as storage nodes?
See what I said above. The names don’t really matter; I’d recommend you name the groups Group-blah and the nodes Node-blah, where blah is the site name.
3.) We want to make sure that no capture or image traffic goes across the WAN. This is indeed possible with the location plug-in, correct?
No, the location plugin does not alter the behavior of captures. All image captures always get uploaded to the master in the storage group that the image was made in. The way to not have capture traffic go across the WAN is by having a storage group per-site, and making each server a master of its group, and then sharing images as needed. On this topic though, image capture traffic is nowhere near as demanding as image deployment traffic. Capture traffic is a bunch of little bursts of transfer, whereas deployment is this huge continuous stream.
We’re currently running FOG 1.3.4 on all FOG servers (and I’m not opposed to upgrading if there is a compelling reason to do so).
I’d strongly recommend 1.4.1 - you’ll be able to take advantage of ZSTD compression - and numerous bug fixes. See the announcements area for all the bug fixes since 1.3.4 here: https://forums.fogproject.org/category/22/announcements
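For reference, a minimal sketch of what an in-place upgrade might look like, assuming a standard install with internet access - the repository URL comes from the FOG Project GitHub, and the release tag name is an assumption, so double-check against the wiki instructions for your distro:

```bash
# Hedged sketch: upgrading an existing FOG server in place.
# Back up the database and /images before running this.
cd /opt
sudo git clone https://github.com/FOGProject/fogproject.git fogproject-1.4.1
cd fogproject-1.4.1
sudo git checkout 1.4.1   # release tag name is an assumption; verify with `git tag`
cd bin
sudo ./installfog.sh      # re-running the installer performs the upgrade
```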
I understand you’re asking for help - and I know I’ve used a lot of terms and setups loosely here without any details on how to do any of it. If you need clarification or help in whatever direction you choose, we can help with this easily. All the things you’ve asked here are configuration questions that any of our @Moderators or @Developers can answer pretty quickly - so do ask if you have questions.
-
@aparker said in Replicating images to other FOG servers:
1.) All three existing FOG servers are “normal” installs as opposed to “storage” installs. Can I make this work as-is? Or do they need to be converted in some way to a storage install?
We have our environment set up similarly to this. We have a dev environment with a full FOG server and then a production environment with a full FOG server and also storage nodes. The dev FOG server replicates to the production FOG server much like a full FOG server replicates to a storage node. The difference in this dev -> production setup is that each full FOG server has its own SQL database, whereas a full FOG server and storage node setup only has one database, on the full FOG server.
The manual bit that Wayne talked about is that you need to manually export the image definitions from your source FOG server and import them into your destination FOG server. As long as you don’t add new images to your root FOG server, you only need to do this once. You can update the image files and they will replicate; as long as they use the same image definitions, you don’t need to touch anything. A future release of FOG may automate this process, but for today it’s manual.
2.) What’s the storage node setup look like? I haven’t been able to find much documentation on what exactly needs to be done here. It appears that each FOG server has a “default” storage group and a “DefaultMember” storage node. Would I just add the two downstream FOG servers as storage nodes?
Short answer, yes. In this storage group (collection of fog servers) you will have one master node (root) and all other traditional FOG servers as slaves.
3.) We want to make sure that no capture or image traffic goes across the WAN. This is indeed possible with the location plug-in, correct?
Correct, you will create locations, assign FOG servers to the location, as well as assign the target computers to that location. During registration you will select the location that the target will talk to for deployment. Captures always happen to the full FOG server, never storage nodes. This is normal. BUT in your situation you have 3 standalone FOG servers. The slaves as well as the root FOG server really don’t know about each other - the replicator does. So in your setup nothing special needs to be done. Each site will have its own DHCP server pointing to its local FOG server, so no change here.
We’re currently running FOG 1.3.4. on all FOG servers (and I’m not opposed to upgrading if there is a compelling reason to do so).
Upgrading would be advised, as always. There were a few annoying bugs in 1.3.4 (more so in 1.3.5) that were addressed in 1.4.0. In regards to your setup, 1.3.4 will work fine for this image replication part.
-
Thanks Wayne and George, both extremely helpful responses. I’m largely following both of you, but I may still be a bit confused by the ramifications of captures in each scenario, as the information about captures seems contradictory (or I’m missing a key point/distinction). I’ll come back to that in a minute, but let me see if I can summarize my options based on what you’ve both suggested:
Option 1
Add each FOG server to its own storage group, each becoming the master of its own storage group. Images would be shared with other groups as needed. Captures would not cross the WAN since captures are always uploaded to the master in the storage group.
Questions:
1.) Isn’t this essentially what you end up with when you have three independent FOG installs? By default a FOG server is added as a node to its own storage group, correct? I could see needing to modify the names, but isn’t the infrastructure essentially already in place?
2.) Via what mechanism is an image shared? You mentioned it was done per-image inside image management on the master, but I honestly don’t recall seeing any verbiage related to image sharing.
3.) Does the act of sharing an image to a storage group cause that image to be replicated to that/those group(s)?
4.) In this scenario, would replication be happening in multiple directions, depending on where (or what storage group) the image was shared from?
5.) This method would not require manually creating image definitions, correct?
6.) Is the location plug-in even required for this particular setup? If I’m understanding correctly, since FOG always looks to the master node for both capturing and imaging, no FOG traffic should traverse the WAN.
Option 2
Add the two slave FOG servers as storage nodes to a single storage group (the existing storage group on the master FOG server). This would result in a single storage group containing all three FOG servers. Image definitions would need to be exported from the master server and imported to the slave servers manually (and whenever a new image is added to the master FOG server). The location plug-in would be required to ensure capturing and imaging occur from the correct FOG server.
Questions:
1.) What does the image definition export/import process look like? How is that done?
2.) How does FOG know which server is the “master” since these are all normal installs and all FOG servers are in the same storage group?
3.) Wayne, you mentioned that the location plug-in doesn’t alter the behavior of captures. George indicated that the location plug-in would help determine which FOG server would be used. Am I missing something? I will say the location plug-in documentation does seem to indicate that it changes the behavior of captures. Under the Captures -> With Location Plugin section it indicates that it will find the location’s group master, then use it to perform the capture (but maybe not store it in the right location?). I have a feeling this isn’t actually contradictory and I’m missing some important distinction.
All told, I think I’m leaning towards Option 1. I like that it doesn’t appear to require manually updating image definitions. The flexibility also seems nice.
Lastly, I’ll throw out one other possible option–I could probably swing building my environment again from scratch if that is truly the best option. Would there be any advantage to truly starting over and moving to a model with one true master FOG server with two storage FOG servers?
-
1.) Isn’t this essentially what you end up with when you have three independent FOG installs? By default a FOG server is added as a node to its own storage group, correct? I could see needing to modify the names, but isn’t the infrastructure essentially already in place?
The difference between independent setups and a standard fog setup is multiple databases vs 1 database. With a multi-master setup, the location plugin isn’t important because there’s division by design. If you only have one real master, then you will need the location plugin for your goals.
2.) Via what mechanism is an image shared? You mentioned it was done per-image inside image management on the master, but I honestly don’t recall seeing any verbiage related to image sharing.
FTP. Specifically, the FOGImageReplicator service. There are settings for this. You can control bandwidth usage in storage management, and can control frequency in FOG settings.
3.) Does the act of sharing an image to a storage group cause that image to be replicated to that/those group(s)?
Yes, on the next replication cycle. Once replication is done, the FOGImageReplicator only checks to see if things have changed. If an image has changed, it replicates that image again.
4.) In this scenario, would replication be happening in multiple directions, depending on where (or what storage group) the image was shared from?
Correct, this is more flexible.
5.) This method would not require manually creating image definitions, correct?
Depends on if you have a multi-master setup or not. Even if you did go with a multi-master setup, you can export/import the definitions via the web gui.
6.) Is the location plug-in even required for this particular setup? If I’m understanding correctly, since FOG always looks to the master node for both capturing and imaging, no FOG traffic should traverse the WAN.
Depends on if you go with multi-master or not. If you do, no the location plugin isn’t needed. If you go with a standard setup, yes the location plugin is needed for your goals.
-
@aparker said in Replicating images to other FOG servers:
3.) Wayne, you mentioned that the location plug-in doesn’t alter the behavior of captures. George indicated that the location plug-in would help determine which FOG server would be used. Am I missing something? I will say the location plug-in documentation does seem to indicate that it changes the behavior of captures. Under the Captures -> With Location Plugin section it indicates that it will find the location’s group master, then use it to perform the capture (but maybe not store it in the right location?). I have a feeling this isn’t actually contradictory and I’m missing some important distinction.
Group-to-group image sharing affects how captures work a bit; the location plugin has no impact on captures at all. If an image is shared from group A to group B, and then someone over at site B re-captures the image to the B server, A will just overwrite B because A is set as the ‘primary’ group at this point. Also, I wrote that documentation.
@aparker said in Replicating images to other FOG servers:
Lastly, I’ll throw out one other possible option–I could probably swing building my environment again from scratch if that is truly the best option. Would there be any advantage to truly starting over and moving to a model with one true master FOG server with two storage FOG servers?
The most important thing is host registrations, IMO. You can export all of those, and import them into one server.
You may also find this relevant: https://wiki.fogproject.org/wiki/index.php?title=Migrate_FOG
-
@aparker Answers:
Option 1
- Yes. Verbiage -> We don’t have direct wording saying “This is how to share an image”. You’ll see why soon hopefully.
- Images are “shared” via Storage groups. An image can be associated to more than one storage group. This is how FOG knows to replicate an image to other Storage Groups or in other words the master nodes of the storage group. This is also the reason we have a “Primary” master setting. This tells the replication service(s) which Group’s Master node is the one you want ALL Storage groups to be based off of.
- Yes, and no. As long as Replication is enabled on the image this will indeed occur. You can disable this element if you so choose.
- When replicating across groups, images are only sent to other storage group master nodes.
- Yes. This could be somewhat automated, but FOG does not provide this automation.
- No. The location plugin would not be necessary in this situation. Imaging only requires the “Master” node when capturing an image. It will use a “load balancing” type setup when deploy is in use. That said, only if the layout is proper will it not try imaging over the WAN. You SHOULD use the location plugin if you need to keep the network traffic from going across the wan. Each location, then, should have a storage group and that storage group should limit the scope of where the devices will get their images from.
Option 2
- You can export image definitions by going to the Image management page and clicking “Export Images”. You can import image definitions by using the same formatted CSV: go to the Image management page and click “Import Images”.
- The “Primary Master” option. This only applies to “Storage Groups”. When an image is associated with another storage group, all nodes within that group will automatically be replicated to.
- When capturing images, FOG always uses a master node, because it’s the master node that replicates down to the other nodes within the group. What George is referring to, most likely, is that when you deploy an image, the location plugin helps tell the hosts WHERE to get the image from.
The location plugin configuration items are as follows:
- Name (Required) The name to give the location.
- Storage Group (Required) this will be associated with.
- Storage Node (Optional) this will define what node the host must use.
- Kernels and Inits from location… (Optional) If checked, it will tell the booting host to download its kernels and inits from the location being used rather than from the “PXE Server” giving the information.
What does this mean? Well, if you don’t define a storage node in the location, it will limit the host to using any node within the associated storage group for that location (load balancing, kind of?). If you do define the node, it will only allow deploys to occur from that particular storage node.
Option 1 or 2 will work. Be aware, however, that Option 1 will ONLY not require updating/inserting images IF the other FOG servers are set to look at the same database server. (See below why I say this.)
I would recommend rethinking your own wording here. “Two storage FOG servers”, as I’m gathering it, means two storage nodes? “FOG servers”, in my head, seems to indicate they will have their own databases.
Option 2, from others’ descriptions and what I think has become the “norm” now, is what’s known as a “multi-master” setup. Option 1 is essentially default operation.
-
I finally got a chance to put some more thought into this last week and consult with a few others involved. We’ve ultimately decided to go with the multi-master model. We’re basically going to just add the two “child” sites to the existing storage group of the “master” FOG server and handle the image definition import/export manually. If I’m absorbing all the previous info correctly, I think this will accomplish what we’re looking to do. I suspect this will be the easiest to implement without making major changes to our current FOG environment and this option most closely aligns with what our Helpdesk team is looking to accomplish with images.
That being the case, I’ve got a few questions specifically about replication:
1.) Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
2.) How do you check the status of a running replication job?
3.) If possible, how do you stop/pause/resume a running replication job?
Thanks again for all the help. Everyone’s responses and guidance have been extremely helpful!
-
@aparker said in Replicating images to other FOG servers:
That being the case, I’ve got a few questions specifically about replication:
1.) Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
What you need to do is have a cron job to manage the start and stop of the FOGImageReplicator service. Start and stop it at your command.
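For example, something like this on the master halts and resumes replication by hand (a rough sketch - the FOGImageReplicator service name is assumed to match your install; older non-systemd installs would use `service FOGImageReplicator stop|start` instead):

```bash
# Hedged sketch: controlling the replicator service manually on a systemd install.
sudo systemctl stop FOGImageReplicator     # halt replication checks/transfers
sudo systemctl start FOGImageReplicator    # resume; a new check cycle starts shortly after
sudo systemctl status FOGImageReplicator   # confirm the service state
```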
2.) How do you check the status of a running replication job?
Via the replicator log files in /opt/fog/log
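Something along these lines, for example (the exact log file name is an assumption and may differ slightly by version - just look at what’s sitting in /opt/fog/log):

```bash
# Hedged example: follow the image replicator log on the master.
tail -f /opt/fog/log/fogreplicator.log
```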
3.) If possible, how do you stop/pause/resume a running replication job?
This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could have a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. It’s replicated on a file-by-file basis; if the whole file didn’t get there last time, it will resend the file.
-
- Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
There are a couple of ways to handle this. The simplest method would be to set a cron job that starts the FOGImageReplicator and FOGSnapinReplicator services, say, on Saturday morning, and another that stops the services Sunday evening (or whenever you feel it should stop). The next method requires a bit more involvement, but should work without needing to worry about a cron job handling things: set the “Timeout” to 604800 seconds for the Image and Snapin replication cycles. This method is bound to have some problems, but essentially will only make the images replicate once a week, dependent on when the service is first started. So if you start the services on Monday, next Monday will be the time they re-check. This is why I suggest using the cron cycle, as you can ensure it only occurs on Saturday/Sunday as needed. Mind you, however, images and snapins check if they need to be updated, so while there is a little bit of network chatter during the checking phase, it’s much less bandwidth usage than full-on replications.
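A rough sketch of what that crontab on the master might look like (edited with `sudo crontab -e`; the service names and times here are assumptions you would adjust to your own window):

```bash
# Hedged sketch: start replication Saturday 06:00, stop it Sunday 22:00.
0 6 * * 6 systemctl start FOGImageReplicator FOGSnapinReplicator
0 22 * * 0 systemctl stop FOGImageReplicator FOGSnapinReplicator
```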
- How do you check the status of a running replication job?
Replication is tracked by the relevant service. You can look in the replication logs and it will show you whether the replication is still working (when it last started vs. the next check cycle). It will tell you the PIDs of the replication being performed. It should not try replicating anything if the tasking is already running during its check.
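For example, a quick way to see whether transfers are still in flight on the master (a hedged sketch - recent versions use lftp for the actual mirroring, but verify the process names against what your log reports):

```bash
# Hedged check: the service itself plus any transfer processes it spawned.
pgrep -a -f FOGImageReplicator
pgrep -a lftp
```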
- If possible, how do you stop/pause/resume a running replication job?
There really isn’t a way, unless you stop a replication from the process that’s actually performing the work, and even then this isn’t guaranteed to work as you might expect. The processes do attempt to “recover” on their own, but the way the replication elements work, it’s often simpler to just remove the remote file and transfer the whole file from scratch.
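If you do interrupt a replication, here is a hedged sketch of sanity-checking an image before trusting it for deploys - the image name, node IP, and ssh access are all placeholders/assumptions:

```bash
# Hedged sketch: compare an image directory on the master against a node.
cd /images/Win10 && md5sum * | sort -k 2 > /tmp/master.md5
ssh root@192.168.2.10 'cd /images/Win10 && md5sum * | sort -k 2' > /tmp/node.md5
diff /tmp/master.md5 /tmp/node.md5 && echo "match" || echo "mismatch - let it re-replicate"
```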
-
What you need to do is have a cron job to manage the start and stop of the FOGImageReplicator service. Start and stop it at your command.
Understood, I’ll explore this option. Any good references for creating/managing cron jobs in Ubuntu? While I’m aware of what cron is, I’ve never used it.
But just so I understand this process better conceptually, what is the default behavior? Does FOG call this FOGImageReplicator service on its own at a predefined interval? Is there anything I need to do to keep that from happening?
2.) How do you check the status of a running replication job?
Via the replicator log files in /opt/fog/log
Perfect, thanks.
3.) If possible, how do you stop/pause/resume a running replication job?
This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could have a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. It’s replicated on a file-by-file basis; if the whole file didn’t get there last time, it will resend the file.
This is about what I expected honestly, but figured it was worth asking.
-
@Tom-Elliott said in Replicating images to other FOG servers:
- Is it possible to set the frequency and start time of replication? Ideally we’d likely want it to only run once a week, preferably over the weekend.
There are a couple of ways to handle this. The simplest method would be to set a cron job that starts the FOGImageReplicator and FOGSnapinReplicator services, say, on Saturday morning, and another that stops the services Sunday evening (or whenever you feel it should stop). The next method requires a bit more involvement, but should work without needing to worry about a cron job handling things: set the “Timeout” to 604800 seconds for the Image and Snapin replication cycles. This method is bound to have some problems, but essentially will only make the images replicate once a week, dependent on when the service is first started. So if you start the services on Monday, next Monday will be the time they re-check. This is why I suggest using the cron cycle, as you can ensure it only occurs on Saturday/Sunday as needed. Mind you, however, images and snapins check if they need to be updated, so while there is a little bit of network chatter during the checking phase, it’s much less bandwidth usage than full-on replications.
Great, I think this mostly answers part of my follow-up question above actually. I see now after just looking at the output of a “top” command that these services are always running. The logic of starting/stopping them as needed via cron makes sense.
I am still curious about default behavior though just for reference. If we did absolutely nothing with regards to scheduling, how frequently does replication occur?
- How do you check the status of a running replication job?
Replication is tracked by the relevant service. You can look in the replication logs and it will show you if the replication is still working (when it started it last vs the next check cycle). It will tell you the PIDs of the replication being performed. It should not try replicating anything if the tasking is already running during its check.
Great info, thanks.
- If possible, how do you stop/pause/resume a running replication job?
There really isn’t a way, unless you stop a replication from the process that’s actually performing the work and even then this isn’t guaranteed to work as you might expect. The processes do attempt to “recover” on their own, but the way the replication elements work it’s simpler, often times, to actually remove the remote file and transfer the whole file from scratch.
As I mentioned above, pretty much what I expected, but figured it couldn’t hurt to ask.
-
@aparker The default replication cycle for Image and Snapin replication is 600 seconds (10 Minutes).
Mind you, it checks all available files to see if they should be replicated. When the last file replicates/checks is when the cycle time starts. This setting is defined in:
FOG Configuration Page->FOG Settings->FOG Linux Service Sleep Times
-
@Tom-Elliott
Fantastic, thanks!
We’re scheduled to make these changes this weekend, so we’ll see how it goes. Might be back Monday with more questions.
-
@george1421 said in Replicating images to other FOG servers:
This is a problem. Yes, you can stop a replication session. The replicator will exit, but you could have a partially replicated (damaged) image, since replication happens on a file-by-file basis. You won’t really know it’s broken until you try to deploy. No, you can’t pause/resume. It’s replicated on a file-by-file basis; if the whole file didn’t get there last time, it will resend the file.
@Tom-Elliott Does image replication send the image files to /images/dev first and once completed, move them to /images ?
-
@aparker said in Replicating images to other FOG servers:
2.) How do you check the status of a running replication job?
You can view the logs right in the web interface in addition to viewing them at the OS level.
Web Gui -> FOG Configuration -> Log Viewer -> Pick your log
-
@Wayne-Workman No.
-
A few quick questions about actually adding the nodes:
1.) On the “Add Storage Node” page, is the “Interface” field asking for the interface of the storage node you are adding or the interface for the FOG server you are adding the node to?
2.) Similar question, but this one I think I already know the answer to, but want to clarify: is the Management Username/Password asking for the node you are adding?
-
@aparker Interface = the interface that the node being configured uses. (This is only used for the bandwidth page anyway, so not really important.)
Yes, the management username/password is the Linux user and password FOG creates for FTP use.
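If you want to sanity-check those credentials before replication runs, something like this from the master works (a hedged example - the username, password, and node IP are placeholders; the default FOG service account is typically fogproject, but use whatever your node’s install created):

```bash
# Hedged check: list the node's /images share over FTP with the management credentials.
curl --silent --list-only --user fogproject:MyFtpPassword ftp://192.168.2.10/images/
```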