FOG storage node and data replication
-
Here is the command I’m working with so far.
rsync -arvPh --bwlimit=5000 /images fog@192.168.1.56:/images/
Where:
-a == archive
-r == recursive (already implied by -a, but harmless)
-v == verbose (gives file names as it transfers, for my benefit)
-P == show progress stats (for my interactive benefit)
-h == display numbers in a human-readable format (for my interactive benefit)
--bwlimit == bandwidth limit in KB/s
I’m going to let it run overnight to see where I end up.
One interesting thing I found with rsync is that it runs in restartable mode. I stopped the transfer of an image mid-stream, and when I restarted the command it thought for a bit and then resumed the transfer right where I broke the connection (presumably thanks to -P, which implies --partial and keeps partially transferred files).
From Tom’s post, it looks like I may need to include the --exclude switch to keep certain files from being copied over.
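For illustration only (the exact patterns depend on what Tom’s post actually lists; the ones below are placeholders), the excludes bolt onto the same command like this:
rsync -avPh --bwlimit=5000 --exclude 'dev/' --exclude '.mntcheck' /images fog@192.168.1.56:/images/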
Some caveats so far:
It appears that passing the password inline isn’t possible. I may be able to get around that with SSH keys (see the sketch below).
Rsync must be installed on both the master and storage nodes for it to work correctly.
You can use an SSH tunnel to encrypt the data in motion between the master node and the storage node if you need additional data protection. This may be of little value if you are transferring data inside your organization.
ref: http://www.tecmint.com/rsync-local-remote-file-synchronization-commands/
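For the password caveat, here is a minimal sketch of the SSH-key approach, assuming the fog account already exists on the storage node (run these on the master node):
ssh-keygen -t rsa                # accept the defaults; leave the passphrase empty for unattended runs
ssh-copy-id fog@192.168.1.56     # install the public key on the storage node
rsync -avPh --bwlimit=5000 /images fog@192.168.1.56:/images/   # now runs without a password prompt
Worth noting: since rsync rides over ssh here anyway (as opposed to talking to an rsync daemon on port 873), the data in motion is already encrypted without setting up a separate tunnel.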
-
@Wayne-Workman said:
@george1421 said:
(Actually, another recent thread gave me an idea for changing this multi-site POC: do full installs at each site, then point them back to the master node for the database information.)
It was originally Tom’s idea. And it’s proven to work. I’ve just been spreading the word.
Well, what I’m looking at is a multi-site setup where each site would have a local storage node and a central master FOG server at HQ. The idea is to start the deploy from the master node but have the clients image from their local storage node. But looking into the storage node a bit more, it doesn’t look like the PXE environment is set up, or the install isn’t complete (I’ve only done a few quick checks). Still, the idea of doing a full install at the remote sites while having them reference the master node’s database is brilliant. That way I have a full FOG install at each site but only one database where everything exists.
If I can get the replication bits to work like I need, I think I’ll have a solid solution.
-
Well I guess I just need to set it up and go away for the weekend.
I just ran out of disk space on the storage node. Looking to see where the space went, I checked the drivers folder, and the subfolders and driver files were all there. So this circles back to Wayne’s first comment about creating a faux drivers image: given enough time, the system as-is will replicate the drivers folder and all of its subfolders and files over to the storage node. That still doesn’t explain the 100% CPU usage of the FOG replicator service, but the system does work as-is.
Do I think the rsync method is better than FTP? Yes. Do I think I can set up this POC system as-is without much hassle? Yes.
-
@george1421 I’m curious how you’re making the clients get said drivers from the storage nodes? It’s exported as read-only via NFS, and the other available option without any changes is FTP.
You could use a secured Samba share for this… There is a script that will do it for you on Fedora/CentOS here: https://forums.fogproject.org/topic/5145/script-to-install-samba-with-settings-for-fog
-
@Wayne-Workman said:
@george1421 I’m curious how you’re making the clients get said drivers from the storage nodes? It’s exported as read-only via NFS, and the other available option without any changes is FTP.
Well, that’s the bit I haven’t worked out yet. I needed to get the drivers onto the storage node first. On the master node today, I’m running a post-install script to copy the correct drivers to the target computer. It’s possible that I don’t fully understand the concept of the storage node just yet; I may have to rethink my position. Without the files there, I can only guess.
But if I run a full install at the remote site that may address the driver deployment issue.
-
After about 12 hours of running, the FOG Replicator service is still at 100% CPU utilization. It appears to be working as it should, moving files from the master node to the storage node. So it IS working, just with high CPU usage. I tried to poke around in the code a bit and add 20-second sleep statements to see if I could pin down where it’s looping uncontrolled (just a guess). I suspect it’s somewhere after lftp is launched, where the code enters a task-wait function that should wait until the lftp file copy is done. But from there I lost the trace (and btw, I’m not a programmer, only a good guesser).
I think I’ll need to leave this to the developers to take a peek at.
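To be clear, this is not FOG’s actual code, just the shape of the problem I’m guessing at, written in shell terms: a wait loop that polls the child process without ever sleeping will peg a CPU core, while the same loop with a short sleep costs almost nothing.
lftp -f /tmp/transfer.lftp &       # placeholder invocation; the real arguments come from FOG
LFTP_PID=$!
while kill -0 "$LFTP_PID" 2>/dev/null; do
    :                              # spinning here as fast as possible is what eats 100% of a core
done
# versus
while kill -0 "$LFTP_PID" 2>/dev/null; do
    sleep 5                        # checking every few seconds drops the CPU cost to almost nothing
done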
-
@george1421 Tom made some changes to replication. Update and try again.
-
I was able to update the system to SVN 4070 this AM.
The FOG Replicator service is behaving much better now. Its CPU utilization is down to roughly the same level as the lftp process itself, so well done, Tom.
The image files are still syncing between the servers. One thing I did notice about the FOG Replicator service is that if you stop and restart the replicator several times, multiple lftp processes end up running. Based on this, I assume that when the replicator is stopped, it doesn’t kill off the lftp process it launched. Not an issue under normal conditions, just an observation.
-
@george1421 said:
One thing I did notice about the FOG Replicator service is that if you stop and restart the replicator several times, multiple lftp processes end up running. Based on this, I assume that when the replicator is stopped, it doesn’t kill off the lftp process it launched. Not an issue under normal conditions, just an observation.
I suppose if you were to manually stop and start the FOGImageReplicator, you’d also have to manually stop the lftp instances.
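Something like this on the master node should show the stragglers and clear them out:
pgrep -l lftp      # list any lftp processes still hanging around
pkill lftp         # kill them off before restarting the replicator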
-
@Wayne-Workman I don’t know of a good way to handle the processes. If I could, I would prefer to use methods similar to what the multicast task does. However, I don’t think the multicast task closes its processes either.
-
While I can’t comment on the FOG code, a lot of systems will launch a process and then keep track of it via a handle until it stops. In the destructor, they kill off the task using the handle that was created when the process was launched, in case the application instance dies before the launched process does. I think the intent of the replicator was to have only one instance of the lftp process running at a time, so it wouldn’t be too difficult to keep track of that one process handle (as opposed to several hundred processes).
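Just to illustrate the idea in shell terms (an analogy only, not FOG’s code): hold on to the child’s PID when it’s launched and use it as the handle for cleanup.
lftp -f /tmp/transfer.lftp &                # launch the copy; $! is the handle
LFTP_PID=$!
trap 'kill "$LFTP_PID" 2>/dev/null' EXIT    # destructor equivalent: kill the child if this script dies first
wait "$LFTP_PID"                            # normal case: block until the copy finishes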
With the current design you normally wouldn’t have to start and stop the replicator multiple times, so having multiple instances of the lftp process running should never happen. I’m not seeing the value in putting energy into fixing a one-off issue.
-
While this thread has run on a bit, I think a lot of great info has been covered.
I’m thinking I need to rebuild my whole POC setup, because I think the function of the storage node is just storage (just a guess at this time), and my master node is my current production instance (blindly upgrading the SVN version has broken my production instance in the past). But that’s part of running on the bleeding edge.
What I need for this project is two (or more, depending on scope creep) functioning FOG instances managed from a single master node. These may be connected across a VPN link or an MPLS link. At this time I’m not sure if FOG is the right tool. I’m not saying FOG is bad, just that I’m trying to do something with it that it wasn’t really designed to do. The idea of linking the master and secondary nodes via the database connection might prove problematic over a WAN connection with latency issues. Again, this is just me thinking about things I’ve seen in the past on other projects. I do have the CPU capacity to spin up two new FOG instances to set up the POC in a controlled manner without breaking our production system (which works really well).
-
@george1421 I don’t think you’ve actually explained what you’re trying to do yet… If you did that, we could probably give you several configuration options.
The VPN-over-WAN issue is easily solved by the location plugin, for instance… You’re just poking here and there trying to make things work, but we don’t really know what you’re trying to do.
-
Sorry, I got derailed by the replication issues and didn’t fully describe my issue.
Here is a made up scenario (names changed to protect the innocent) based on a project that I’m working on.
Let’s say we have two sites, in New York [NY] and Georgia [GA]. The NY site is the HQ for the company. The Windows image design team is at NY, as are the package developers. The NY site has what I will call the master FOG server. The GA site only has basic IT support staff. All images and snapins will be created by the NY IT teams. With plans to expand in the near future to Kansas [KS] and Los Angeles [LA], they want to set up a central deployment console (one way of doing things) that the remote sites can use to deploy images at their local site. Each site is connected to NY over a VPN link on the internet. While they have sufficient bandwidth on these VPN links, they don’t want to deploy images across them.
Now as I see it I could do the following.
-
Set up a fully independent FOG server at each site. This server would be used to deploy images. Set up something like rsync to copy the master images and plugins from NY to each site. This would be the easiest to set up, but management would take a bit more effort because we would have to manually create settings on the remote systems as new images and snapins were created and replicated (a rough cron sketch follows these options).
-
Set up a full FOG server at each site, but link the remote servers to the master server in NY using something like a master-slave setup. Since each site would have a full FOG server (sans database), they would have the TFTP and PXE boot services there (something I feel is missing on a storage node). The remote site admins could use the master node to deploy images via their local FOG server, or some corporate weenie could deploy a new image to all lab computers in GA with the push of a button. I like this concept since all systems from all sites would be recorded in this single master node, and we could write reports against this single node for documentation purposes. There are a number of issues I can see with this setup, from database latency, to the master node attempting to ping all devices to check whether they are up, to the FOG clients mistakenly contacting the master FOG node instead of their local FOG deployment server.
As I said, I don’t know if FOG is capable of doing this, since what I want was never part of its core design. On the surface it appears to have the right bits in place to make this setup work. Looking back on my original post, I should have created a new one, because the full picture is a bit broader than just a storage node syncing data with the master node.
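For option 1, the “something like rsync” piece could be as simple as a cron entry on the NY master; the host name, schedule, and log path below are made up, and it relies on the SSH keys mentioned earlier in the thread.
# /etc/cron.d/fog-image-sync -- push images to the GA server at 1 AM, capped at roughly 5 MB/s
0 1 * * * root rsync -a --bwlimit=5000 /images/ fog@ga-fog-server:/images/ >> /var/log/fog-image-sync.log 2>&1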
-
Hi,
I have a similar scenario, with images being built in one place, then transferred out to other independent FOG servers (NOT secondary sites). I do it rather manually, using rsync and mysqldump on images (something like mysqldump fog images --where 'imageId IN (xx,xx,xx)' with the corresponding images to pull down).
I’ve thought about integrating them into fog, but sometimes, we need to pull back images from other servers, and not all sites have the same kind of connectivity, so the transfers have to happen at different times of day… So well, I use my brain, and we do sync using rsync/mysqldump as required. I will probably end up with some file synchronization system… but rsync works nicely for this. And the “one way image pull” is easy to script.
One way (meaning you don’t care what’s on the slave, as far as images go):
on master: mysqldump fog images > /images/images.sql
on “slave”: rsync -avP fog@master:/images/ /images && mysql fog < /images/images.sql
I think it’s not too appropriate to use the storage node feature for this, unless we can indeed define a bandwidth limit and a time for the sync to happen, which can be done using a script & crontab without touching FOG. And if you want independent servers (I do), that won’t do; you need a full master on each site.
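For what it’s worth, the scripted version of that pull might look roughly like this on the slave (the host name and paths are placeholders, and it assumes the master has already dumped images.sql into /images):
#!/bin/bash
# One-way image pull: fetch the image files, then load the matching image definitions.
set -e                                   # bail out if the rsync fails, so we don't import a stale dump
rsync -avP fog@master:/images/ /images
mysql fog < /images/images.sql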
EDIT: ah, I just saw Tom’s news https://news.fogproject.org/imagesnapin-replication/
Well. That could do it… I guess we need to be able to flag a “big master”, or maybe have the ability to set an image as shared wherever it comes from on “masters”.
Cheers,
Gilles
-
Very nice feedback. I’m glad I’m not the only one trying to do this setup. From the testing I’ve done, I can say the FOG replicator does work; it copied all files except the ones called out in the lftp command Tom posted. My (personal) preference would be to use the built-in tools if available. I can say the bandwidth restrictions do work as defined. And I think it would not be hard (looking from the outside) to add a time-of-day function to the replicator so that it would only replicate images during a specific window. While I haven’t looked, I assume the FOG Project has some kind of feature request system where I can ask for this function.
While an image pull-back function would be nice, I would assume a do-not-deploy flag could be added to the image on the slave server. The one-way push of the image would be mandatory; that way we would know all nodes in the FOG deployment cloud are an exact copy of the master node.
Instead of copying the databases all over the place, some kind of in-process communication would be very light on the WAN links. I also thought about configuring MySQL to replicate the FOG database out to all of the slave nodes, but I think that would make a mess of things, since FOG wasn’t designed to use globally unique record numbers. Your mysqldump solution may be the only way to keep things in sync without breaking the whole environment.
Thanks for the link to Tom’s news. I’ll have to read up on this too.
-
@george1421 said:
I assume the FOG Project has some kind of feature request system where I can ask for this function.
There is a feature request area in the forums.