MariaDB crashing due to too many connections
-
In reference to some previous posts I’ve made, this issue is probably related to the size of the environment where my FOG install resides.
The exact errors are as described in this link:
https://mariadb.com/kb/en/aborted-connections/
What I have seen is that after some time in production, between many simultaneous images being deployed from around 40 storage nodes, as well as thousands of clients connecting to the main FOG server with db…
Pages and images start to fail with a ‘database connection failed!’ blank HTML page (with a URL pointing to schema) or ‘valid database connection could not be made!’
The mariadb/MySQL logs start out fine at startup, followed by a few errors that say the data is inconsistent, then slowly get flooded with the aborted-connections error until it persists 5-10 times per second until the service is restarted.
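For reference, this is roughly how I’ve been watching the flood (just a sketch; where the error log actually lands depends on how logging is configured, so the file path below is an assumption):
journalctl -u mariadb --since "1 hour ago" | grep -ci 'aborted connection'
tail -f /var/log/mysql/error.log | grep -i 'aborted connection'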
This has shown up occasionally in the past, however it was usually resolved by a reboot and wouldn’t come back for several weeks. This time, however, we’re seeing it 2-3x an hour.
A quick netstat -plant shows what looks to be a lot (if not all?) of the storage nodes connected and established over 3306, a LOT of TIME_WAIT to all sorts of hosts on 127.0.0.1:9000, and quite a few on 80 as well.
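To put numbers on it rather than eyeballing the output, something like this gives a count per connection state on 3306 (just a sketch):
netstat -plant | grep ':3306' | awk '{print $6}' | sort | uniq -c | sort -rn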
This issue causes all tasks to fail, from imaging to client check in.
Because I see the mariadb KB pointing to .net, I was wondering if this is related to the FOG client?
I currently have the agent check-in time at 8 minutes and have migrated to InnoDB.
-
@p4cm4n When you look at top, are there many php-fpm processes, and if you sort by processor, is mysql the top process with about 40% CPU utilization? How many computers with the fog client installed are hitting this FOG server?
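Something along these lines will give a quick count and ranking (a sketch; the php-fpm process name can vary slightly by distro/PHP version):
pgrep -c php-fpm
top -b -n 1 -o %CPU | head -n 20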
-
@george1421
mariadb is between 30-50% roughly.
currently have given the box 32 cores, 32GB RAM. RAM util is only hovering around 1.25GB though.
so far, approximately 6000. around 7500 in the host DB. most of those are on during the business day.
-
@p4cm4n OK let me look up the script I want you to run on your fog server.
-
@george1421 from the multicast not starting thread?
https://forums.fogproject.org/topic/15960/fog-multicast-not-starting-anymore
-
@p4cm4n Well I guess I did write the tutorial: https://forums.fogproject.org/topic/16099/configure-fog-database-to-use-innodb-engine
The exact command I want you to run is this:
SELECT TABLE_NAME,ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'fog';
Are the table engines MyISAM or InnoDB? If they are MyISAM then let's get them migrated to release some of the back pressure on the database.
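If any tables still show MyISAM, one rough way to do the conversion (a sketch, not necessarily the exact steps from the tutorial; take a database backup first) is to generate the ALTER statements and feed them back in:
mysql -N -B -e "SELECT CONCAT('ALTER TABLE ', TABLE_NAME, ' ENGINE=InnoDB;') FROM information_schema.TABLES WHERE TABLE_SCHEMA='fog' AND ENGINE='MyISAM';" > convert_to_innodb.sql
mysql fog < convert_to_innodb.sql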
-
@p4cm4n said in MariaDB crashing due to too many connections:
currently have given the box 32 cores, 32GB RAM. RAM util is only hovering around 1.25GB though.
Adding 96 cores won’t solve this problem. The MyISAM engine does table-level locking on an update vs row-level locking with InnoDB. This creates a long backlog of connections and eventually connection timeouts.
-
@george1421 yeah man, used that tutorial (from you).
already on innodb.
TBH, it seemed that when this issue happened earlier, it happened when we were using MyISAM. after migration, the errors went away, until recently.
(we’re adding 200-250 hosts a day, however)
-
unsure if this is a permanent fix, or if this is specific to my environment
-
further troubleshooting of this environment showed that a specific storage node was behind NAT (and as such, mysql_error was showing a connection failing from a gateway IP, versus the storage node IP)
after shutting this storage node off toward the end of business, the error seemingly cleared until the next day, when the bulk of imaging/client check-ins began.
couldn’t tell if it’s because lots of client check-ins happened at the start of business as a lot of machines turned on, OR something else. the error returned around 10AM.
i found a link (https://www.thegeekdiary.com/mysql-error-too-many-connections-and-how-to-resolve-it/)
which pointed me in the direction of temporarily raising max connections for SQL.
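for reference, this is roughly how i watched the live connection count against the cap (just a sketch):
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections';"
mysql -e "SHOW VARIABLES LIKE 'max_connections';"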
raising the limit has so far resolved the issue, even with the errant storage node. i was at 151-152 connections just prior to this bump, now i’m hovering around 160-165. i might have been hitting this ceiling organically.
-
the below change, opening up max connections for sql, was indeed a fix.
default is 151, and i have now been at a solid 200 connections for a week with no issues.
in debian 11, i did the following:
sudo su -
mysql -D fog
SET GLOBAL max_connections = 512;
To make this a permanent solution, refer to the link in the previous post.
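if it helps anyone, the permanent version is the same setting in the server config followed by a restart, roughly like this (the file path below is an assumption for a default debian 11 mariadb install; adjust for your setup):
# /etc/mysql/mariadb.conf.d/50-server.cnf
[mysqld]
max_connections = 512
then:
systemctl restart mariadb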