Automatically Sync Images & update DB across multiple servers
-
@brooksbrown How does a computer plugged into bench A get an IP address from bench A's server if all the benches are on the same subnet? (I'm driving toward a solution, I just want to make sure it's in line with how you are currently using the system.)
Edit: Not fair deleting your posts as I answer
-
@george1421 See attached network layout image. They are able to talk to each other.
-
@brooksbrown I’ll ask it a different way then, how are you keeping the PCs connected to bench 2 from getting IP addresses from bench 1?
BTW, your diagram just headed off 10 or so of my questions. Good job.
-
@george1421 To add a little color to my question: PXE-booting clients issue a DHCP broadcast packet, so the first DHCP server to respond is the one that assigns the client its IP address. If you want something like the design in your diagram, you will need each bench on an isolated VLAN so you can limit the scope of the DHCP broadcast.
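To make that concrete, here is a rough sketch of what per-bench DHCP scoping could look like in an ISC `dhcpd.conf`, assuming each bench sits on its own VLAN with its own subnet. The subnets, router addresses, and FOG server IPs below are made up for illustration; `undionly.kpxe` is the usual iPXE binary FOG serves to BIOS clients.

```conf
# Hypothetical layout: one /24 per bench, each bench with its own FOG server.
# next-server / filename point PXE clients at that bench's FOG server.

# Bench 1 - VLAN 10
subnet 192.168.10.0 netmask 255.255.255.0 {
  range 192.168.10.100 192.168.10.200;
  option routers 192.168.10.1;
  next-server 192.168.10.5;        # FOG Node 1 (placeholder address)
  filename "undionly.kpxe";        # iPXE boot file served over TFTP
}

# Bench 2 - VLAN 20
subnet 192.168.20.0 netmask 255.255.255.0 {
  range 192.168.20.100 192.168.20.200;
  option routers 192.168.20.1;
  next-server 192.168.20.5;        # FOG Node 2 (placeholder address)
  filename "undionly.kpxe";
}
```

Because each VLAN is its own broadcast domain, only the DHCP scope for that bench ever sees the client's discover, which removes the race between servers entirely.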
-
@george1421 Each bench's client machines get DHCP from the FOG server on that bench. Maybe I've gone about setting this up the wrong way, but the theory behind it is: if the Node1 machine dies or whatever, we still have the ability to image machines on the other two benches.
And again, ultimately we will need to sync these images/DB from our corporate facility to other facilities across the country using the Location Plugin.
Currently, we use Ghost… I'm not even going to get into that. We've been using it for all 12+ years of my employment. But our images are around 60 GB, and over Ghost on our network as it stands, imaging a machine takes anywhere from 45 minutes to over 2 hours.
FOG is MUCH MUCH MUCH faster! And I want to utilize it to replace Ghost.
-
@brooksbrown Right now I see a networking issue, not even related to FOG.
As I mentioned, you have a flat /21 network with 3 DHCP servers on it (each, I gather, a FOG server). DHCP works on broadcast messages, so any DHCP server in the broadcast domain (the /21 subnet) will answer a client's DHCP discover request.
-
@george1421 DHCP comes from the server with the lowest hop count. Because each Node is on the same switch as its hosts, a client always gets its IP from the Node on that same switch.
-
@brooksbrown The reasoning behind this is that if Node 1, 2, or 3 were to go down for whatever reason, the machines on that bench would still be able to get an IP from one of the other servers and then PXE boot from a server at another bench. It's a way to build failover into our benches in the event that a machine fails.
-
Here’s a large scale picture of what we’re going to end up doing in the long-run with Location Plugin.
As you can see, NJ-HQ is the largest imaging office we have, but there are 5 other facilities that image these same rental machines when they arrive after a rental or a transfer request. So the Hosts section of the DB always has to stay in sync between them all, even if that means a once-every-24-hours sync at 10pm, because the same machines will float through the different offices over time. With the Location plugin, we know each Host will have to be edited for its imaging location when it arrives at another office and is put on the bench, so it doesn't pull from the last office location it was in.
And yes, I know when we do this we will need a /20 subnet to handle it.
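As an aside, the nightly cross-site DB sync described above could be sketched very roughly with a cron job, since FOG keeps its data in a MySQL/MariaDB database (named `fog` by default). This is not what the Location plugin does for you; it is only an illustration of a 10pm dump-and-ship, and the hostnames, credentials, and paths below are all placeholders:

```shell
# Hypothetical crontab entry on the NJ-HQ master: every night at 22:00,
# dump the FOG database and push the dump to a remote site for import there.
# "fogmaster", the password, "remote-site.example.com", and the paths are
# placeholders, not real values from this setup.
0 22 * * * mysqldump -u fogmaster -p'secret' fog > /opt/fog-sync/fog.sql \
  && rsync -az /opt/fog-sync/fog.sql backup@remote-site.example.com:/opt/fog-sync/
```

A blind nightly overwrite loses any edits made at the receiving site that day, which is exactly the kind of conflict the single-master-plus-storage-nodes design avoids.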
-
@brooksbrown I think the terms are getting mixed up here, so let me try to set this straight. When installing FOG there are two different modes you can choose from: master node OR storage node (there is nothing called a "master storage node"!). From your description it sounds as if you installed all your FOG servers as master nodes, and therefore you have separate DBs; hosts registered on one node won't show up on another unless you register each and every client with every FOG server you have. With the usual setup, on the other hand, you'd have ONE master node (where the DB and web UI live) and several storage nodes.
The "failover setup" you intend to build is not as easy to handle as it sounds, I'm afraid. There is a lot involved in PXE booting a client: (1) a first DHCP handshake from the PXE ROM, (2) TFTP to load the iPXE binary, (3) a second DHCP handshake from the iPXE binary, (4) TFTP and then an HTTP request to load the iPXE config, (5) HTTP requests to load the kernel and initrd, and (6) a third DHCP handshake from the Linux kernel. It might sound straightforward, but if the FOG server for a particular bench fails (or is just too busy), the client will get its first DHCP answer from another FOG server. For example, say the first DHCP handshake is answered by NODE 2 (because the client is on bench 2); the client will then download the iPXE binary from NODE 2 as well. But the second DHCP handshake might be answered by NODE 3 (because NODE 2 is not fast enough this time). That is still fine as long as they all share the same DB (which would live on the one single master node) and the client gets a consistent iPXE config (e.g. the FOG menu or a task).
In theory this all works, but the servers more or less need to be kept in sync: TFTP files, kernels/initrds, the FOG web UI. If you alter the kernel on one server, you might see clients from a different bench booting that kernel at random.
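If one did insist on keeping several master nodes interchangeable, the TFTP tree and the kernels/initrds would have to be mirrored between them. A rough sketch of what that could look like, where the hostnames are placeholders and the paths are typical FOG defaults (`/tftpboot` for the TFTP tree, `/var/www/fog/service/ipxe` for kernels on many installs, so verify against your actual layout):

```shell
# Hypothetical mirror job pushed from NODE 1 to NODE 2 after any change.
# --delete keeps the replica exact; -a preserves permissions and timestamps.
rsync -az --delete /tftpboot/ root@node2:/tftpboot/
rsync -az --delete /var/www/fog/service/ipxe/ root@node2:/var/www/fog/service/ipxe/
```

Even then this only narrows the window of inconsistency; it doesn't eliminate the random cross-bench boot behavior described above.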
That all said, there is another reason why I think this setup is not great: for every DHCP broadcast a client sends, it gets up to eight answers, one from each DHCP server. Finding an issue and keeping this whole setup consistent will be a nightmare, I suspect. What if there is just one single setting different on NODE 6? Some clients will boot properly, but others will fail at random because of it.
I reckon one could be keen enough to set this all up in one broadcast domain using two or three servers at the most, but definitely not eight. This will cause you so much headache, I suspect. Just don't do it unless you and the rest of your team are real network wizards who love using tcpdump to analyse packet captures and figure out what's going on.
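For anyone who does go down this road, watching the DHCP traffic on the wire is the quickest way to see which server is answering whom. Something along these lines, where the interface name is an assumption:

```shell
# Capture DHCP traffic on eth0 (substitute your interface): every OFFER
# you see is one server racing to answer a client's DISCOVER broadcast.
# Requires root; -n suppresses name resolution so addresses stay raw.
tcpdump -i eth0 -n 'port 67 or port 68'
```

On a flat /21 with eight DHCP servers, a single client boot would show a burst of OFFERs, which is exactly the race condition being described.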
If you intend to use the fog-client, the whole idea of failover is buried alive anyway. Sure, fog-clients not reaching their particular server is not as problematic as the other issues can be, but failover is just not possible there.
I'd bet you're better off taking some more time to think through the network setup now and having far fewer issues later on…