GUI crashing every 30 minutes
-
During work hours, our FOG GUI crashes about every 30 minutes. We use it daily with a fairly heavy client load, with maybe 3 to 6 machines imaging at a time. Imaging still seems to work when this happens. Stopping and starting MySQL (“service mysql stop” and “service mysql start”) temporarily fixes it. The server is an old Lenovo S30, but it is a workhorse. The master is running CentOS 7 (kernel 3.10.0-1160.76.1.el7). The nodes are running on VMs on desktops at each site, about 20 sites.
What we have tried/checked:
We have updated the main server and all the nodes.
We found something online about updating the kernels. We did that, and it seemed better yesterday.
(This item has been ongoing for a while; it is a nuisance we keep hoping to work on as time allows, but it may be related.) I also suspect we might have some data corruption (I think we might have a duplicate MAC address situation). I can’t list all the hosts on the hosts page; we can only search. I was able to export from the Configurations page, but I can’t export hosts from the hosts page. The CSV is hard to look through just because of the sheer volume of clients.
We checked memory usage, it seems good.
We have run the database clean-up commands. They come back pretty clean.
We have checked passwords on the server and nodes.
We have looked through the logs and checked any errors that seem like they could be the cause.
Any suggestions on where we should start looking?
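Because the host list is too large to browse, a short script can scan an exported host CSV for repeated MAC addresses, which would confirm or rule out the duplicate MAC suspicion. This is a minimal sketch that assumes the MAC address sits in the first column of the export; that column position (`mac_column`) is an assumption about the file layout, so adjust it to match your export. MACs are normalized (lowercased, separators removed) before comparing so that `AA:BB:...` and `aa-bb-...` count as the same address.

```python
import csv
import io
import re
import sys
from collections import defaultdict


def normalize_mac(raw: str) -> str:
    """Lowercase a MAC and drop ':', '-', '.' so different formats compare equal."""
    return re.sub(r"[^0-9a-f]", "", raw.strip().lower())


def find_duplicate_macs(csv_text: str, mac_column: int = 0) -> dict[str, list[int]]:
    """Return {normalized_mac: [row numbers]} for MACs seen more than once.

    mac_column is an ASSUMPTION about the export layout; change it if needed.
    Row numbers are 1-based, matching what a spreadsheet shows.
    """
    seen: dict[str, list[int]] = defaultdict(list)
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        if len(row) <= mac_column:
            continue  # skip blank or short rows
        mac = normalize_mac(row[mac_column])
        if len(mac) == 12:  # only count values that look like a full MAC
            seen[mac].append(row_number)
    return {mac: rows for mac, rows in seen.items() if len(rows) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_dup_macs.py hosts_export.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as fh:
        dupes = find_duplicate_macs(fh.read())
    for mac, rows in sorted(dupes.items()):
        print(mac, "rows", rows)
```

Any MAC it prints appears on more than one row and is worth checking by hand in the GUI search.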
-
@lpetelik For a bit of clarity I have a few more questions.
- Is the SQL server process crashing (no longer in memory), or is it just not responding to requests?
- What version of FOG are you running?
- It sounds like you have a FOG master node and multiple storage nodes running on VMs. Is that accurate? How many storage nodes do you have in this configuration?
- How many target computers in your environment have the FOG client installed?
- When inspecting syslog/messages/SQL server logs, do you see any error messages relating to the SQL server, like too many connections?
- What is the FOG server host operating system?
Just for clarity in definition: updating “the kernel” refers to the FOS Linux kernel (bzImage) that gets transferred to the target computer during imaging. It has nothing to do with the FOG server host OS kernel.
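One way to answer the log question without reading every line is to count matching error lines. This is a minimal sketch; the log paths (`/var/log/mariadb/mariadb.log`, `/var/log/messages`) are typical CentOS/Rocky locations but are assumptions, and the patterns are only examples (“Too many connections” is the text of MySQL error 1040). Files that are missing or unreadable are skipped rather than aborting the scan.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Example patterns only; extend with whatever errors you are chasing.
PATTERNS = {
    "too_many_connections": re.compile(r"Too many connections", re.IGNORECASE),
    "out_of_memory": re.compile(r"out of memory", re.IGNORECASE),
}

# ASSUMED default log locations on CentOS/Rocky; adjust for your install.
DEFAULT_LOGS = ["/var/log/mariadb/mariadb.log", "/var/log/messages"]


def count_matches(lines) -> Counter:
    """Count how many lines match each named pattern."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_files(paths) -> Counter:
    """Sum pattern counts across several log files, skipping unreadable ones."""
    total: Counter = Counter()
    for path in map(Path, paths):
        try:
            with path.open(errors="replace") as fh:
                total += count_matches(fh)
        except OSError as exc:  # missing file, permission denied, etc.
            print(f"skipping {path}: {exc}")
    return total


if __name__ == "__main__":
    # Usage: python3 scan_logs.py [logfile ...]   (run as root to read /var/log)
    print(scan_files(sys.argv[1:] or DEFAULT_LOGS))
```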
-
Checking on the other answers; it might be Monday before I can reply.
2. 1.5.9
3. Master on regular hardware, nodes on VMs. 24 nodes.
4. I will get a more exact answer. Should be near 2500-3000.
I should also mention that the GUI dropping just started about a week ago.
-
@lpetelik said in GUI crashing every 30 minutes:
3 Master on regular hardware, nodes on VM. 24 nodes
4 I will get a more exact answer. Should be near 2500-3000
I’m thinking MySQL is complaining about max open connections, and with 2500+ FOG workstations you might be running into a MyISAM database performance issue. Both issues can be resolved. The MyISAM/InnoDB issue would show up by running top and then sorting by processor (press P). If mysql is showing high CPU with many php-fpm workers, then we need to switch over the database engine.
I should also mention that the GUI dropping just started about a week ago.
Could be an update issue, or you just got lucky. A broken or slowed-down SQL server would make the web UI unresponsive.
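The check described above (is mysqld busy while many php-fpm workers are running?) can also be summarized as text instead of a screenshot. This minimal sketch sums %CPU per command name from `ps -eo comm,pcpu --no-headers`; it assumes procps-ng `ps` (standard on CentOS and Rocky), and the names you would look for (`mysqld` or `mariadbd`, `php-fpm`) may differ on your system. Note that `ps` reports %CPU averaged over each process's lifetime, not the instantaneous value `top` shows, so treat it as a rough picture.

```python
import subprocess
from collections import defaultdict


def summarize_cpu(ps_output: str) -> dict[str, tuple[int, float]]:
    """Map command name -> (process count, total %CPU) from 'comm pcpu' lines."""
    summary: dict[str, list] = defaultdict(lambda: [0, 0.0])
    for line in ps_output.splitlines():
        parts = line.rsplit(None, 1)  # comm may contain spaces; %CPU is last
        if len(parts) != 2:
            continue
        name, pcpu = parts[0].strip(), parts[1]
        try:
            cpu = float(pcpu)
        except ValueError:
            continue  # skip a header line or anything unexpected
        summary[name][0] += 1
        summary[name][1] += cpu
    return {name: (count, round(total, 1)) for name, (count, total) in summary.items()}


def snapshot() -> dict[str, tuple[int, float]]:
    """Run procps-ng ps once (one line per process, no header) and summarize it."""
    out = subprocess.run(["ps", "-eo", "comm,pcpu", "--no-headers"],
                         capture_output=True, text=True, check=True).stdout
    return summarize_cpu(out)


def print_top(summary: dict[str, tuple[int, float]], limit: int = 10) -> None:
    """Print the commands using the most total CPU, highest first."""
    rows = sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True)
    for name, (count, total) in rows[:limit]:
        print(f"{name:20} procs={count:4}  cpu%={total:6.1f}")
```

On the FOG server, `print_top(snapshot())` would show whether mysqld dominates while a large number of php-fpm workers are present.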
-
@george1421
Sorry for the late reply. I will need to check on this.
- 1.5.9 current stable on all nodes and master.
- Master is on regular hardware. One node is also. The rest of the nodes are on VM. 24 nodes
- A bit over 4000.
- No “too many connections” errors. I am seeing a memory error and will try to address that today.
- A mix of CentOS and Rocky. We are trying to move all of the nodes to Rocky.
-
@lpetelik Will you submit a screenshot of top sorted by processor utilization (press P)? This should be captured under normal load, imaging or not, but with as many clients as possible contacting the FOG server.
-
Here is the top screenshot. It does look like a lot of processes are running and using quite a bit of CPU.
-
@lpetelik I thought I had a tutorial on fixing this. The root of the issue is that your FOG database is still using the MyISAM storage engine and not the InnoDB engine.
I kind of outlined the steps here: https://forums.fogproject.org/topic/15856/web-gui-error-after-clicking-list-hosts
Let me write it up a bit clearer tonight, but all of the steps are in that thread. Just read the thread from the bottom up to make it understandable.
-
@lpetelik Well, blind I must be. Here is the tutorial I thought I created but could not find earlier today: https://forums.fogproject.org/topic/16099/configure-fog-database-to-use-innodb-engine
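The linked tutorial is the reference for the actual conversion. As a rough illustration of what such a conversion involves (not necessarily the tutorial's exact steps), this sketch turns a list of tables and their engines into `ALTER TABLE ... ENGINE=InnoDB` statements for any table still on MyISAM. The schema name `fog` is assumed to be the FOG database name, and the input is assumed to be the tab-separated output of `mysql -N -B -e "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA='fog'"`. The script only prints SQL; back up the database (for example with mysqldump) before running anything it produces.

```python
import sys


def myisam_tables(tsv: str) -> list[str]:
    """Return table names whose engine column is MyISAM (tab-separated input)."""
    tables = []
    for line in tsv.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip().upper() == "MYISAM":
            tables.append(parts[0].strip())
    return tables


def alter_statements(tables: list[str], schema: str = "fog") -> list[str]:
    """Build one ALTER TABLE per table; schema 'fog' is an ASSUMED default."""
    # Backticks guard against table names that clash with SQL keywords.
    return [f"ALTER TABLE `{schema}`.`{t}` ENGINE=InnoDB;" for t in tables]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 to_innodb.py engines.tsv > convert.sql
    with open(sys.argv[1], encoding="utf-8") as fh:
        for stmt in alter_statements(myisam_tables(fh.read())):
            print(stmt)
```

Re-running the information_schema query afterwards should show no MyISAM tables left in the schema.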
-
@george1421 We just ran the engine update. Thanks for those links. We’ll watch it for a bit to see how it looks. The GUI is a bit slow at the moment, hoping that clears up.
On a side note, while working on this issue we ran installfog.sh, and during the install it picked a new STORAGENODE MYSQLPASS password that had a closing bracket in it. That broke the nodes’ connections to the master, even with the password updated everywhere it should be. Once we created a new SQL password without special characters (on a whim, to test), updated it in .fogsettings, and reran installfog.sh, the nodes started talking to the master again.
I think we are in much better shape. We need to fix our node password issue and start imaging again to know for sure.
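Since the closing bracket in the generated password appears to be what broke the node connections, one option is to generate the storage node MySQL password yourself using only letters and digits. This is a minimal sketch using Python's `secrets` module; the length of 20 is an arbitrary choice, and restricting to alphanumerics is a workaround based on what was observed in this thread, not a documented FOG requirement. The second helper flags an existing password that contains anything other than letters and digits.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def make_password(length: int = 20) -> str:
    """Random password using only A-Z, a-z and 0-9 (no shell/SQL-special characters)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def has_special_chars(password: str) -> bool:
    """True if the password contains anything outside A-Z, a-z, 0-9."""
    return any(ch not in ALPHABET for ch in password)


if __name__ == "__main__":
    print(make_password())
```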