FOG Server failover
I am running FOG 1.2.0 on CentOS 6.6, and I also have a Storage Node on CentOS 6.6.
Is it possible to switch the Storage Node to Normal mode if the current FOG server crashes? Is the MySQL database replicated on the Storage Node? What is the best way to set up the system for failover if the main FOG server crashes?
I can see that images are replicated from my FOG server to the Storage Node and everything works well.
I still have a lot of testing and configuration to do, but as soon as my documentation is done I will post it here. I am having issues with NIC bonding right now. It works great on the FOG servers (CentOS 6.6), but when I applied the same settings to the Red Hat 6.6 servers, the connection keeps dropping. The fun of IT!!
[quote=“Wayne Workman, post: 47066, member: 28155”]because you’re making documentation anyways, right?[/quote]
Wait, I thought documentation was like commenting your code or backing up your files: something you expect other people to do, but you don’t do yourself.
Do share your FOG failover documentation, it’d make a great wiki article!
I always recommend donating unique documentation. If you (or your organization) can’t donate hard cash, documentation is the next best way to contribute, because you’re making documentation anyways, right?
Thanks again for all the replies!!
It’s up now. I did a dump of the FOG database, and I just needed to change the IP address in FOG Settings for everything to work.
In FOG Settings:
General Settings> FOG_WOL_HOST: <ip address of server>
TFTP Server> FOG_TFTP_HOST: <ip address of server>
Web Server> FOG_WEB_HOST: <ip address of server>
This is for a large project that we are building. Rack 1: MasterAA, MasterAB, FOGServer; rack 2: MasterBA, MasterBB, FOGStorage. The Masters are running Red Hat 6.6, and the FOG servers are running CentOS. There are also 70 consoles (PCs running Win7). I will have several images on FOG and also daily database backups. They do a switchover every 3 months. The FOG servers will not be switched at that time, but I needed to make sure it can be done. I may have to use DHCP on my FOG servers because this will be a closed network. All the servers and consoles are using bonded NICs on separate networks. I need to prove to the project manager that FOG can handle the project with no problems. Fun Fun!!
You’ll also need to change the IPs stored in the FOG configuration page and the default.ipxe file.
And to more finely tune a switch-over, you might want to just have your storage node configured as an actual FOG server, and then just disable dnsmasq… BUT, you still need the FOGImageReplicator running as if it were a storage node…
So, when you switch over, all you have to do is just start that, and then everything starts working…
But, you still need some way to keep the DB updated, too… That will be key.
Did you edit the dnsmasq configuration on the storage node? You can literally take a copy of your primary FOG server’s ltsp.conf file and use “Replace All…” in a text editor to replace 220.127.116.11 with 18.104.22.168. Using “Replace All…” instead of editing by hand will minimize your risk of typos and mistakes. You must also modify the answer file to have the correct IP addresses. If you’re using names inside your ltsp.conf file, I’d recommend changing them to IPs instead… An IP is more reliable than a name.
Did you make sure TFTP is running? Are permissions set correctly on the /tftpboot folder?
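For anyone who would rather script the “Replace All…” step than do it in an editor, here is a minimal Python sketch. The function name `replace_ip`, the sample ltsp.conf lines, and the 192.0.2.x addresses are placeholders I made up for illustration; they are not taken from this thread. The only thing it adds over a plain find-and-replace is a guard so that an address like 192.0.2.10 is not rewritten inside 192.0.2.100.

```python
import re


def replace_ip(text: str, old_ip: str, new_ip: str) -> str:
    """Replace every whole occurrence of old_ip in text with new_ip."""
    # The lookbehind rejects a match preceded by a digit or dot; the
    # lookaheads reject a match followed by a digit or by ".<digit>",
    # so a longer address that merely contains old_ip is left alone.
    pattern = r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)(?!\.\d)"
    # A function replacement avoids any backslash processing of new_ip.
    return re.sub(pattern, lambda _match: new_ip, text)


# Illustrative ltsp.conf fragment (placeholder addresses, not real ones).
sample = (
    "dhcp-boot=undionly.kpxe,,192.0.2.10\n"
    "dhcp-range=192.0.2.10,proxy\n"
    "# unrelated host 192.0.2.100\n"
)

print(replace_ip(sample, "192.0.2.10", "192.0.2.20"))
```

The same function could be pointed at the contents of default.ipxe or the answer file, since those need the same IP swap.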
See the wiki article “Troubleshoot TFTP”
With your old FOG server being [B]OFF[/B], and assuming you’ve got the “fail over” configured right, [B]hosts booting to the network should have no knowledge of 22.214.171.124 at all.[/B] So, this is why I’m asking you to double check your dnsmasq config… you probably forgot a line or something…
Are you sure DHCP isn’t handing out options 066 and 067?
I am using dnsmasq so that is not an issue.
I dove in head first and made the switch to see if it is possible.
I did a mysql dump of the database on the FOG server and saved it to the images folder. The images folder is replicated on the storage node, so I have it in both places. I powered off the FOG server so that it would not answer DHCP requests using dnsmasq.
On the Storage Node I installed dnsmasq, edited the .fogsettings file so that it would become a FOG server, and ran installfog.sh. I then restored the database from the original FOG server backup. I can access all the images and data. The only problem I am having is with PXE booting to the new FOG server. I have included an image of the screen. The original FOG server is 126.96.36.199 and the Storage Node that is now a FOG server is 188.8.131.52. I am sure there is a simple change needed somewhere, but I am not sure where yet.
VMs would be nice but are not possible for this project. I don’t need an automatic switchover, so scripting will not be necessary. I did think about writing a simple script to replace .fogsettings with .fogsettings.master/.fogsettings.storage, start or stop dnsmasq, and restore the database from a backup, but a manual switchover will work.
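For what it’s worth, the simple script described above could start out as something like this rough Python sketch. It only builds and prints the list of commands rather than running them, so it can be reviewed first. The /opt/fog/.fogsettings location, the `fog` database name, the dump path, and the function name `switchover_plan` are all assumptions on my part, so check them against your own install. It also leaves out re-running installfog.sh and any MySQL credentials.

```python
from typing import List

# Assumed location of the installer answer file; verify on your own server.
FOG_SETTINGS = "/opt/fog/.fogsettings"


def switchover_plan(role: str, db_dump: str = "/images/fog_backup.sql") -> List[str]:
    """Return the shell commands for switching this box to the given role.

    role is "master" (act as the FOG server) or "storage" (act as a node).
    db_dump is a hypothetical path to the most recent database dump.
    """
    if role == "master":
        return [
            # Put the FOG-server answer file in place.
            f"cp {FOG_SETTINGS}.master {FOG_SETTINGS}",
            # Start answering PXE requests from this box.
            "service dnsmasq start",
            # Restore the database (assumes the database is named "fog").
            f"mysql fog < {db_dump}",
        ]
    if role == "storage":
        return [
            # Put the storage-node answer file back.
            f"cp {FOG_SETTINGS}.storage {FOG_SETTINGS}",
            # Stop answering PXE requests from this box.
            "service dnsmasq stop",
        ]
    raise ValueError(f"unknown role: {role!r}")


for command in switchover_plan("master"):
    print(command)
```

Printing the plan instead of executing it keeps the switchover manual, which matches what is wanted here, while still giving a checklist that cannot be mistyped.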
Thanks for the quick replies, I will continue testing in the morning.
VM Replication is your friend.
I use Hyper-V for FOG.
Replication is turned on for my virtualized FOG server.
If the FOG server (or the underlying server) becomes unresponsive, an exact replica of it (hdd, RAM contents and all) fires up immediately (and automatically) on another server.
My understanding is the DB is not replicated, just the image groups you have configured.
There are a few threads on this, one kinda went off the deep end… But one of them talks about a FOG backup / export script. It’s a script that exports the DB and images… I suppose one could rig something so that this just runs regularly… like every hour or something… but you’d have to check to see if the images are already there or not, because you don’t want to copy all your images over every hour… You’d then need to take the DB data and stick it into the other server’s DB. And you’d have to ENSURE your image permissions are right…
I suppose it’s possible to configure one server as a node, but have the other primary server’s installfog.sh answer file in the correct place (with IP properly configured)… and I’d suppose one could rig some system to detect if the primary becomes unreachable for something like 60 seconds… then execute the installfog.sh script automatically with answers in place… heck, maybe even edit the install script so that it doesn’t ask for confirmation? Then, after that install script gets run… everything should be running, I think…
Given that you’re using dnsmasq, of course… because I have no idea how you’d swap out Windows Server DHCP options 066 and 067 like that… I’m sure it could be done, but it just makes more sense to have two different ltsp.conf files (one on each FOG server) configured for said server…
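To make the “check whether the images are already there” part concrete, here is a small Python sketch. It is not the export script from those threads, just an illustration. It assumes each image is a directory under an images root and compares total sizes as a cheap stand-in for a real integrity check; the names `image_sizes` and `images_to_copy` are made up.

```python
import os
from typing import Dict, List


def image_sizes(images_root: str) -> Dict[str, int]:
    """Map each image directory under images_root to its total size in bytes."""
    sizes: Dict[str, int] = {}
    for entry in os.scandir(images_root):
        if entry.is_dir():
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    total += os.path.getsize(os.path.join(dirpath, name))
            sizes[entry.name] = total
    return sizes


def images_to_copy(source: Dict[str, int], dest: Dict[str, int]) -> List[str]:
    """Return the image names that are missing on dest or differ in size."""
    return sorted(name for name, size in source.items() if dest.get(name) != size)


# Example with hand-written size maps (hypothetical image names).
primary = {"win7-console": 1200, "rhel-master": 800}
standby = {"win7-console": 1200}
print(images_to_copy(primary, standby))
```

An hourly job would then copy only the names this returns, and would still need to fix up permissions on whatever it copied.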
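The “unreachable for something like 60 seconds” detection could look roughly like this in Python. It is only a sketch of the idea: `wait_for_outage` and `tcp_probe` are names I invented, the port to probe is up to you, and what happens after it returns (for example kicking off the installer) is deliberately left out.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_outage(
    is_up: Callable[[], bool],
    grace_seconds: float = 60.0,
    poll_seconds: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Block until is_up() has been False continuously for grace_seconds.

    Returns the length of the outage observed, in seconds. A single
    successful probe resets the timer, so brief blips do not trigger it.
    """
    down_since = None
    while True:
        now = clock()
        if is_up():
            down_since = None
        else:
            if down_since is None:
                down_since = now
            if now - down_since >= grace_seconds:
                return now - down_since
        sleep(poll_seconds)
```

Usage would be something like `wait_for_outage(lambda: tcp_probe("192.0.2.10", 80))`, where the address is a placeholder for the primary server; the clock and sleep parameters exist only so the loop can be tested without waiting a real minute.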
I’m just thinking out loud here, I have no idea if it’s easily doable or not. You’d [U]DEFINITELY[/U] need scripting skills… and maybe programming skills.