Node status - Online/Offline and better offline handling
It’s summertime and I work for a school district. Construction is common this time of year, and so are “planned” power outages that we are never told about. Technicians have to unplug equipment and small servers in a hurry so custodians can wax floors, the network team makes changes to the network, and the list goes on. As a result, our ever-growing FOG system at work has suffered from offline nodes this week. As the system gets bigger (ultimately 14 or 15 servers), this problem will only happen more often.
When a node goes down (for whatever reason), the bandwidth chart simply disappears. On the version we’re currently on, the FOG Configuration area completely stops working. Also, when one goes down, I have to fight to get into Storage Management to get a listing of IP addresses, ping each one, figure out which one is down, and then disable that node and its bandwidth chart.
What I’m asking for is an Online/Offline status on the dashboard for all storage nodes defined in the FOG system. I’d like to have the same “green dots” that are used to show whether a host is online, but placed right on the dashboard and dynamically generated from the nodes defined in Storage Management. But one difference - when they are off-line, make the dots
I’m also asking for better handling of an offline node that hasn’t been disabled or had its chart disabled. This way, there’s less impact on the web interface, and hopefully it won’t interfere with ongoing FOG-related work.
Wayne Workman last edited by Wayne Workman
Since @Tom-Elliott doesn’t think it’d be a good thing to do additional polling (which I can understand), I’ve put together something which I feel is significantly lighter on the servers.
I’ve written a BASH script that will query the FOG DB for all storage nodes, and then send out a single lonesome regular-old ping to each one.
If the ping succeeds within the specified WaitTime, the node is enabled in the DB, and its bandwidth chart is enabled.
If the ping fails to return within the specified WaitTime, the node is disabled in the DB, and its bandwidth chart is disabled.
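The logic described above can be sketched roughly as follows. This is a Python illustration, not the Bash script itself. The table and column names (`nfsGroupMembers`, `ngmID`, `ngmIsEnabled`, `ngmGraphEnabled`) are assumptions about the FOG schema, and the function names are made up for this example. The ping flags are the Linux iputils ones.

```python
import subprocess

# Assumed FOG schema names (not confirmed in this thread): table nfsGroupMembers
# with columns ngmID, ngmIsEnabled and ngmGraphEnabled.
NODE_TABLE = "nfsGroupMembers"


def ping_once(host, wait_seconds=2):
    """Send a single ping; return True if a reply arrives within wait_seconds.

    Uses Linux iputils flags: -c 1 (one packet), -W (reply timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(wait_seconds), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def build_update(node_id, is_up):
    """Build the SQL that enables or disables a node and its bandwidth chart."""
    flag = "1" if is_up else "0"
    return (
        f"UPDATE {NODE_TABLE} SET ngmIsEnabled = '{flag}', "
        f"ngmGraphEnabled = '{flag}' WHERE ngmID = {int(node_id)};"
    )


def check_nodes(nodes, ping=ping_once, wait_seconds=2):
    """nodes: iterable of (node_id, hostname) pairs.

    Returns (statuses, statements): a hostname -> bool dict and one UPDATE
    statement per node. Nothing is executed against the database here.
    """
    statuses = {}
    statements = []
    for node_id, host in nodes:
        is_up = ping(host, wait_seconds)
        statuses[host] = is_up
        statements.append(build_update(node_id, is_up))
    return statuses, statements
```

The sketch only builds the SQL. Running it against the FOG database (for example through the `mysql` client) and scheduling the check are left out, since those details depend on the installation.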
You can clone the project from here; it includes more instructions.
I’ll be implementing this at work on Tuesday with our 15-server FOG System, which is currently being plagued by power outages due to construction, maintenance, storms, and server moves.
Additionally, the script writes a simple html document that you can view by going to x.x.x.x/nodestatus.html
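For anyone curious how a status page like that could be produced, here is a minimal Python sketch. It is not what the script actually does internally. The function name `render_status_page` and the page layout are hypothetical. It takes a mapping of node name to up/down state and returns a small HTML document.

```python
from datetime import datetime
from html import escape


def render_status_page(statuses, checked_at=None):
    """Render a simple HTML status table.

    statuses: dict of node name -> bool (True means the node answered the ping).
    checked_at: optional datetime shown on the page; defaults to now.
    """
    checked_at = checked_at or datetime.now()
    rows = []
    for name, is_up in sorted(statuses.items()):
        label = "online" if is_up else "offline"
        # escape() keeps odd characters in node names from breaking the markup
        rows.append(f"<tr><td>{escape(name)}</td><td>{label}</td></tr>")
    return (
        "<html><body><h1>Storage node status</h1>"
        f"<p>Last checked: {checked_at:%Y-%m-%d %H:%M:%S}</p>"
        "<table>" + "".join(rows) + "</table></body></html>"
    )
```

Writing the returned string to a file named `nodestatus.html` in the web server's document root would make it reachable at the address above. The exact directory depends on the installation.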
Here are sample outputs of what the HTML document looks like:
@MRCUR I am unfortunately not on the network team at my work.
While I agree having an up/down status light could prove “useful” I feel there’s already a TON of polling of servers.
Maybe something of this sort could be implemented in FOG-TOO where we plan to have socketed connections between things which ultimately means more real time stat possibilities AND less agonizing on the server(s).
This is only somewhat related, but why in the world are you not monitoring the Fog servers with whatever network monitoring software is in place?
More updates on this topic. We had a lot of storms here over the weekend.
This morning, I came into another instance of the bandwidth chart not working at all and storage not reporting in the dashboard.
I had to open PuTTY and ping every single storage node.
3 of them were down due to the storms.
After disabling both the graph and the “enabled” setting for those three, everything started working again.
I’d really really like some green dots to show next to storage node names in Storage Management for online/offline status. Pretty please :-D
Bumping this. How about green “on-line” dots in the Storage Management listing of storage nodes?