Load Balancing and FOG
I wanted to make a post about my theoretical foray into possibly setting up load balancing for both HTTP and DB connections for my fog setup due to the number of clients (and to play around too!). This is all inspired by an old post by @Wayne-Workman who I hope can help me if he has time. I am in the research phase right now but basically here are my thoughts:
A cluster of 3 FOG servers behind HAProxy for the HTTP connections, backed by a Galera MariaDB cluster for the database.
I am reading up on the viability of this solution. Anyone else have any thoughts?
Wayne Workman last edited by Wayne Workman
My next article will be on load-balancing FOG’s web frontend and database backend… I just haven’t written it yet… Don’t wait on me, it might be a while.
@george1421 I have not had time to do this because of some medical issues keeping me from work. Hopefully it is all sorted out soon and I can test this!
@fry_p OK, now I think it is time for some hocus-pocus magic without breaking much.
From your production server export the image definitions and import them into your dev environment.
From your production server export and then import the host definitions into the test setup. This will create a clone of your production environment without the image files.
Temporarily move the production FOG server’s IP address (i.e. just swing it out of the way) and put the dev FOG client server on the production FOG server’s IP address (understand this may impact PXE booting if you have clients boot through FOG). The goal here is to apply real-world load to the FOG client server. We should watch the CPU usage on the FOG client server (we may need to increase the number of php-fpm workers to satisfy all of the requests) as well as the mariadb service on the dev imaging server. We are looking for high CPU usage on both systems. This might tell us if we need to spin off the SQL server onto its own server. We should also gauge the web UI performance on the imaging server, because that was a pain point when you had the check-in time set to the default.
I’m not sure I’d leave the check in time at 30 seconds in a production environment, but it would be interesting to see the load.
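As a rough way to reason about that load before testing, here is a minimal Python sketch that estimates the average request rate from the host count and check-in interval. It uses the roughly 1500 hosts mentioned elsewhere in this thread and assumes one HTTP request per check-in, evenly spread over the interval. Both are simplifications: the real FOG client may issue several requests per check-in, and the function name and parameters are made up for illustration.

```python
def checkin_requests_per_second(hosts: int,
                                checkin_interval_s: float,
                                requests_per_checkin: int = 1) -> float:
    """Average request rate if check-ins are spread evenly over the interval.

    requests_per_checkin is an assumption; measure the real value on your
    own server before trusting the estimate.
    """
    if checkin_interval_s <= 0:
        raise ValueError("checkin_interval_s must be positive")
    return hosts * requests_per_checkin / checkin_interval_s


if __name__ == "__main__":
    # Compare a few candidate check-in intervals for ~1500 hosts.
    for interval in (30, 60, 300):
        rate = checkin_requests_per_second(1500, interval)
        print(f"{interval:>4}s interval -> {rate:.1f} requests/s")
```

Under those assumptions, a 30-second interval works out to about 50 requests per second on average, which gives a starting point for sizing php-fpm workers during the test.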
When you are done testing then swing the dev FOG client server out of the way and put the production fog server back in place.
@george1421 Here are my results:
Spun up one FOG Image server and FOG client server, both connect to db on Image server.
Uploaded/deployed image no problem to two laptops.
Installed client and pinned client server as the address in the installation on both laptops (changed checkin time to 30s for testing purposes).
Both clients authenticated successfully. Deployed simple MSI snapin from client server to both laptops. Clients checked in no problem and pulled the files from the imaging server, and ran them without trouble.
I am starting to think this proof of concept is a success. Is there anything else I should try with this test environment I have set up?
@george1421 I have had bits of time here and there to do pieces of this. I will make a big reply when all of what you suggested is tested, but so far so good. Two servers talking to the same DB. Just need to get clients on the two test laptops talking to the client server. Just ran out of time yesterday. Will be doing this in the AM today.
@george1421 I had time to spin up two new fog servers in my test area, one for imaging and one for client interaction. The client interaction one is set to connect to the DB of the imaging one. I also imported the certs from my production machine to both.
Unfortunately I ran out of time before I could actually test anything. Holding pattern until Monday I suppose.
In your opinion, is 1500 hosts enough to justify this?
I don’t have enough experience with a large campus (> 300 hosts) to give you an opinion. As an experiment I might spin up 2 full FOG servers. Configure the second FOG server to use the first FOG server’s database. The first FOG server would be the imaging server and the second FOG server would be for client interaction. You would want to copy the FOG certificates from your production instance to both of these test servers. Register 2 existing clients in this new test sandbox. Then change these two target computers so they start chatting with the 2nd sandbox FOG server. Does it work? Can you image from FOG server 1 and have the clients talk to FOG server 2? What you would really be doing is using apache and php-fpm on the second FOG server to interact with the FOG client, which would then talk to the SQL database on the sandbox 1 FOG server.
Don’t worry about any load balancing at the moment. The first question would be: does it work? If yes, then do snapins deploy as you need them? They of course would come from sandbox FOG 1, because that is your imaging server. If this whole thing proves out then we can discuss how to slide it into production without much pain.
Things you would want to watch in production are the SQL server load on fog01, which may give you an idea of whether you need to move that function to a dedicated server, and the apache/php-fpm load on fog02, to decide if you need to add a load balancer and a second client management server.
Again, these are only things off the top of my head without thinking too hard about the problem.
All the pieces are free. You can use NGINX or apache2 as a load balancer (there are other free purpose-specific options as well), the FOG front end is obviously already free, as are MySQL / MariaDB and memcached. JAMF isn’t free; I’m just using their configuration as an example of how this would be done. They operate in much the same way FOG does, with clients checking into a frontend that’s backed by a DB.
@george1421 I couldn’t resist… I remoted in and took a quick look. It seems to be roughly 1500 hosts. I am starting to realize that this situation may be more about me wanting to try this than actual necessity.
That being said, in the past, I did have to extend time between client checkins because of it absolutely consuming my fog server’s resources.
In your opinion, is 1500 hosts enough to justify this? I still want to do this regardless as I have caught the improvement bug already, but I would like to hear what you think.
@george1421 I will look tomorrow as I am off today. I know it is in the 1000’s. I like your layout there though. This seems quite doable.
@fry_p How many fog clients (computers with the fog clients installed) do you have?
I could see having 1 fog imaging server with admin web ui and 2 or more with load balancing in front for fog client management. That might work. With all 3 fog servers using a single standalone (or on the imaging server) database.
@george1421 I definitely did mean one database. I don’t quite know why I said cluster! I am in the use-case creation stage so this may not even take off, but I will wait for @Wayne-Workman to chime in since it seems he has done this at one time. The post I was referring to was his post in this: https://forums.fogproject.org/topic/12993/use-an-external-database
I am really willing to experiment here if it gets off the ground.
I realize Wayne’s visits are fewer now that he is in a different setting and I am willing to be super patient.
EDIT: @astrugatch I will explore this too but is that a free solution?
Might be a good idea to look at how JAMF does this. It’s a similar setup, with an HTTP load balancer -> multiple front ends (Tomcat in their case) -> a single SQL DB + a memcache server.
Galera MariaDB cluster
Just talking off the top of my head;
I wonder if this is really necessary. It adds quite a bit of overhead to set up; instead you could just create a standalone SQL server and have all three FOG servers hit the same database.
The issue I can see might be the FOG server caching of database entries. It’s possible that one FOG server updates the database and the others don’t see the change, then step on the update made by the first server. If the FOG servers hit the database every time, then it shouldn’t be quite as difficult.
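To make that concern concrete, here is a toy Python model of the lost-update scenario. It is only a sketch: the classes, the `host42` row, and the whole-row write-back are invented for illustration, and nothing here claims that FOG actually caches rows this way.

```python
class SharedDatabase:
    """Stand-in for the single SQL database shared by all servers."""

    def __init__(self):
        self.rows = {"host42": {"image": "win10", "task": "none"}}


class CachingServer:
    """Hypothetical server that caches rows and writes whole rows back."""

    def __init__(self, name, db):
        self.name = name
        self.db = db
        self.cache = {}

    def load(self, key):
        # Keep a private copy of the row as it looked at load time.
        self.cache[key] = dict(self.db.rows[key])

    def update(self, key, field, value):
        # Change the cached copy, then overwrite the entire row in the DB.
        self.cache[key][field] = value
        self.db.rows[key] = dict(self.cache[key])


def demo():
    db = SharedDatabase()
    fog01 = CachingServer("FOG01", db)
    fog02 = CachingServer("FOG02", db)
    fog01.load("host42")
    fog02.load("host42")
    fog01.update("host42", "task", "deploy")  # FOG01 schedules a task
    fog02.update("host42", "image", "win11")  # FOG02 writes its stale copy
    return db.rows["host42"]


if __name__ == "__main__":
    print(demo())  # the task set by FOG01 has been overwritten
```

In this model the second write restores `task` to its old value because FOG02 never re-read the row. A server that reads from the database on every request would not hold the stale copy in the first place.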
The HAProxy instance sitting in front of your FOG cluster needs to maintain communication affinity. Consider what happens if imaging is started on FOG01 and then, during the imaging process, FOG01 is busy so the communication is shunted to FOG03, but FOG03 doesn’t know imaging was already started.
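One common way to get that kind of affinity is to hash the client’s source address so the same client always lands on the same backend (HAProxy’s `balance source` mode works along these lines). The Python sketch below only illustrates the idea; the backend names follow the FOG01..FOG03 naming above, and the function and sample IP addresses are made up.

```python
import hashlib

BACKENDS = ["FOG01", "FOG02", "FOG03"]


def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Map a client IP to a backend deterministically (source-hash affinity)."""
    # sha256 gives the same digest on every run, unlike Python's built-in
    # hash(), which is randomized per process for strings.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    for ip in ("10.0.0.5", "10.0.0.6", "10.0.0.5"):
        print(ip, "->", pick_backend(ip))
```

The trade-off of plain modulo hashing is that adding or removing a backend remaps many clients, so an imaging session in progress during such a change could still move to a different server.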
What would be handy though, is if the FOG Clients would speak on one tcp port number vs imaging on a second tcp port number. The communications would be all http over tcp, but you could then direct client communication to FOG02 and FOG03 with imaging going to FOG01 always.
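If the client and imaging traffic were split across two ports like that, the routing rule at the load balancer would be simple. The Python sketch below is purely hypothetical: the port numbers are invented, and as the post says this is a wished-for feature rather than something FOG is known to offer.

```python
from itertools import cycle

IMAGING_PORT = 8080  # hypothetical port for imaging traffic
CLIENT_PORT = 8081   # hypothetical port for FOG client check-ins


class PortRouter:
    """Send imaging traffic to FOG01 and spread client traffic over the rest."""

    def __init__(self):
        # Simple round-robin over the client management servers.
        self._client_pool = cycle(["FOG02", "FOG03"])

    def route(self, dest_port: int) -> str:
        if dest_port == IMAGING_PORT:
            return "FOG01"  # imaging always stays on one server
        if dest_port == CLIENT_PORT:
            return next(self._client_pool)
        raise ValueError(f"no backend configured for port {dest_port}")


if __name__ == "__main__":
    router = PortRouter()
    for port in (IMAGING_PORT, CLIENT_PORT, CLIENT_PORT, IMAGING_PORT):
        print(port, "->", router.route(port))
```

Because imaging never leaves FOG01 in this layout, the affinity problem described above only has to be solved for client check-ins, which are short, independent requests.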