FOG storage node and data replication
-
After about 12 hours of running, the FOG Replicator service is still at 100% CPU utilization. It appears to be working as it should, moving files from the master node to the storage node. So it IS working, just with high CPU usage. I tried to poke around in the code a bit and add 20-second sleep statements to see if I could hit on where it’s looping uncontrolled (just a guess). I suspect it’s somewhere after lftp is launched, when it enters a task wait function that should wait until the lftp file copy is done. But from there I lost the trace (and btw I’m not a programmer, only a good guesser).
I think I’ll need to leave this to the developers to take a peek at.
-
@george1421 Tom made some changes to replication. update and try again.
-
I was able to update the system to SVN 4070 this AM.
The FOG Replicator service is behaving much better now. Its CPU utilization is down to about the same level as the lftp process, so well done Tom.
The image files are still syncing between the servers. One thing I did notice about the FOG replicator service is if you stop and restart the replicator several times, multiple lftp services are running. Based on this I assume that when the replicator is stopped, it doesn’t kill off the running lftp process. Not an issue under normal conditions, just an observation.
-
@george1421 said:
One thing I did notice about the FOG replicator service is if you stop and restart the replicator several times, multiple lftp services are running. Based on this I assume that when the replicator is stopped, it doesn’t kill off the running lftp process. Not an issue under normal conditions, just an observation.
I suppose if you were to manually stop and start the FOGImageReplicator, you’d also MANUALLY stop the lftp instances as well.
-
@Wayne-Workman I don’t know of a good way to handle the processes. If I could, I would prefer to use methods similar to what the multicast task does. However, I don’t think the multicast task closes the processes.
-
While I can’t comment on the FOG code, a lot of systems will launch a process and then keep track of that process via a handle until it stops. In the destructor for the instance, they kill off the task using the handle that was created when the process was launched, in case the application instance dies before the launched process does. I think the intent of the replicator was to have only one instance of the lftp process running at a time, so it wouldn’t be too difficult to keep track of the process handle (as opposed to several hundred processes).
With the current design you normally wouldn’t have to start and stop the replicator multiple times, so having multiple instances of the lftp process running should never happen. I’m not seeing the value in putting energy into fixing a one-off issue.
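To illustrate the handle-tracking idea, here is a minimal shell sketch. It is not taken from the FOG code; the names start_child, cleanup, and child_pid are made up for the example, and sleep stands in for the real lftp command.

```sh
#!/bin/sh
# Sketch only: remember the PID of the one child we launch and
# terminate it when this script exits, so no orphan is left behind.
child_pid=""

cleanup() {
    # Act only if we launched a child and it is still alive.
    if [ -n "$child_pid" ] && kill -0 "$child_pid" 2>/dev/null; then
        kill "$child_pid" 2>/dev/null || true
        # Reap the child so its PID is really gone before we return.
        wait "$child_pid" 2>/dev/null || true
    fi
}

start_child() {
    # Run the given command in the background and keep its PID (the "handle").
    "$@" &
    child_pid=$!
}

# Run cleanup whenever this script exits.
trap cleanup EXIT
```

Usage would be something like `start_child sleep 300` (replace sleep with the actual transfer command) followed by `wait "$child_pid"`. With only one child at a time, a single variable is enough to hold the handle.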
-
While this thread has run on a bit I think a lot of great info has been covered.
I’m thinking I need to rebuild my whole POC setup because I think the function of the storage node is just for storage (just a guess at this time) and the master node is my current production instance (blindly upgrading svn version has disabled my production instance in the past). But that’s part of running on the bleeding edge.
What I need for this project is two (or more, depending on scope creep) functioning FOG instances managed from a single master node. These may be connected across a VPN link or an MPLS link. At this time I’m not sure if FOG is the right tool. I’m not saying FOG is bad, just that I’m trying to do something with it that it wasn’t really designed to do. Linking the master and secondary nodes via the database connection might prove problematic over a WAN connection with latency issues. Again, this is just me thinking about things I’ve seen in the past on other projects. I do have the CPU capacity to spin up two new FOG instances to set up the POC in a controlled manner without breaking our production system (which works really well).
-
@george1421 I don’t think you’ve really explained what you’re trying to do yet… If you did that, we could probably give you several configuration options.
The VPN over WAN issue is easily solved by the location plugin, for instance… You’re just poking here and there trying to make things work, but we don’t really know what you’re trying to do.
-
Sorry, I got derailed because of the replication issues and didn’t fully describe my issue.
Here is a made-up scenario (names changed to protect the innocent) based on a project that I’m working on.
Let’s say we have 2 sites, in New York [NY] and Georgia [GA]. The NY site is the HQ for the company. The Windows image design team is at NY, as are the package developers. The NY site has what I will call the master FOG server. The GA site only has basic IT support staff. All images and snapins will be created by the NY IT teams. With plans to expand in the near future to Kansas [KS] and Los Angeles [LA], they want to set up a central deployment console (one way of doing things) that the remote sites can use to deploy images at their local site. Each site is connected to NY over a VPN link on the internet. While they have sufficient bandwidth on these VPN links, they don’t want to deploy images across the VPN link.
Now as I see it I could do the following.
1. Set up a fully independent FOG server at each site. This server would be used to deploy images. Set up something like rsync to copy the master images and plugins from NY to each site. This would be the easiest to set up, but it would take a bit more management because we would have to manually create settings in the remote systems as new images and snapins were created and replicated.
2. Set up a full FOG server at each site, but link the remote servers to the master server in NY using something like a master - slave setup. Since each site would have a full FOG server (sans database), they would have the tftp and pxe boot services there (something I feel is missing on a storage node). They (remote site admins) could use the master node to deploy images via their local FOG server, or some corporate weenie could deploy a new image to all lab computers in GA with a push of a button. I like this concept since all systems from all sites would be recorded in this single master node. We could write reports against this single node for documentation purposes. There are a number of issues I can see with this setup, from database latency, to the master node attempting to ping all devices to check whether they are up, to the FOG clients mistakenly trying to contact the master FOG node instead of their local FOG deployment server.
As I said, I don’t know if FOG is capable of doing this, since what I want was never in its core design. On the surface it appears to have the right bits in place to make this setup work. Looking back on my original post, I should have created a new one because the full picture is a bit broader than just a storage node syncing data with the master node.
-
Hi,
I have a similar scenario, with images being built in one place, then transferred out to other independent FOG servers (NOT secondary sites). I do it rather manually, using rsync and mysqldump on the images table (something like mysqldump fog images --where 'imageId IN (xx,xx,xx)' with the IDs of the images to pull down).
I’ve thought about integrating them into fog, but sometimes, we need to pull back images from other servers, and not all sites have the same kind of connectivity, so the transfers have to happen at different times of day… So well, I use my brain, and we do sync using rsync/mysqldump as required. I will probably end up with some file synchronization system… but rsync works nicely for this. And the “one way image pull” is easy to script.
One way (meaning you don’t care what’s on the slave, as far as images go):
on master: mysqldump fog images > /images/images.sql
on “slave”: rsync -avP fog@master:/images/ /images && mysql fog < /images/images.sql
I think it’s not too appropriate to use the storage node feature for this, unless we can indeed define a bandwidth limit and time for the sync to happen, which can be done using a script & crontab without touching FOG. And if you want independent servers (I do), that won’t do, you need a full master on each site.
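As a rough idea of the script & crontab approach, here is a sketch of the one-way pull wrapped in a script. MASTER, IMAGE_DIR, BWLIMIT_KBPS, DRY_RUN, the function names, the 5000 limit, and the script path in the crontab comment are all assumptions for illustration, not FOG settings. It expects the master to have already run mysqldump fog images > /images/images.sql.

```sh
#!/bin/sh
# Sketch only: one-way image pull, to be run on the "slave" server.
MASTER="${MASTER:-fog@master}"          # ssh login on the master (assumed)
IMAGE_DIR="${IMAGE_DIR:-/images}"       # same path on both servers
BWLIMIT_KBPS="${BWLIMIT_KBPS:-5000}"    # arbitrary example rate for rsync --bwlimit

# With DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pull_images() {
    # Copy the image files first; load the dumped table only if the copy succeeded.
    run rsync -avP --bwlimit="$BWLIMIT_KBPS" "$MASTER:$IMAGE_DIR/" "$IMAGE_DIR" &&
        run sh -c "mysql fog < '$IMAGE_DIR/images.sql'"
}

# Example crontab entry (hypothetical script path), every night at 01:00:
# 0 1 * * * /usr/local/bin/fog-pull-images.sh
```

Calling pull_images at the end of the script does the actual transfer. Running it once with DRY_RUN=1 shows the two commands without touching anything, and each site can pick its own time of day in its crontab.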
EDIT: ah, I just saw Tom’s news https://news.fogproject.org/imagesnapin-replication/
Well. That could do it… I guess we need to be able to flag a “big master”, or maybe have the ability to set an image as shared wherever it comes from on “masters”.
Cheers,
Gilles
-
Very nice feedback. I’m glad I’m not the only one trying to do this setup. From the testing that I’ve done I can say the FOG replicator does work, and it copied all files except those listed in the lftp command Tom posted. My (personal) preference would be to use the built-in tools if available. I can say the bandwidth restrictions do work as defined. And I think it would not be hard (looking from the outside) to add a time-of-day function to the replicator so that it would only replicate images during a specific window. While I haven’t looked, I assume that the FOG Project has some kind of feature request system to request this function.
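Just to show what I mean by a time-of-day window, here is a small shell sketch. This is not how the replicator works today; in_window and the 22:00 to 06:00 window are made up for the example. The only tricky part is a window that wraps past midnight.

```sh
#!/bin/sh
# Sketch only: in_window HOUR START END
# Succeeds if HOUR (0-23) lies in the window [START, END).
in_window() {
    hour=${1#0}          # "08" from `date +%H` becomes "8"
    hour=${hour:-0}      # a bare "0" was stripped to empty, restore it
    start=$2
    end=$3
    if [ "$start" -le "$end" ]; then
        # Ordinary daytime window, e.g. 8 -> 17.
        [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
    else
        # Window wraps past midnight, e.g. 22 -> 6.
        [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
    fi
}
```

A wrapper could then do something like `if in_window "$(date +%H)" 22 6; then ...; fi` and only start a transfer when the check passes.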
While an image pull-back function would be nice, I would assume a do-not-deploy feature could be added as a flag to the image on the slave server. The one-way push of the image would be mandatory; that way we would know all nodes in the FOG deployment cloud are an exact copy of the master node.
Instead of copying the databases all over the place, some kind of in-process communication would be very light on the WAN links. I also thought about configuring MySQL to replicate the FOG database to all of the slave nodes, but I think that would make a mess of things since FOG wasn’t designed to use unique record numbers. Your mysqldump solution may be the only way to keep things in sync without breaking the whole environment.
Thanks for the link to Tom’s news. I’ll have to read up on this too.
-
@george1421 said:
I assume that the FOG Project has some kind of feature requests system to request this function.
There is a feature request area in the forums.