Following up on some previous posts of mine: this problem is probably related to the size of the environment my FOG install runs in.
The exact errors are the ones described at this link:
https://mariadb.com/kb/en/aborted-connections/
What I have seen is that after some time in production, with many simultaneous images being deployed from around 40 storage nodes, as well as thousands of clients connecting to the main FOG server with db…
Pages and images start to fail with a blank HTML page reading ‘database connection failed!’ (with a URL pointing to the schema) or ‘valid database connection could not be made!’
The MariaDB/MySQL logs look fine at startup, then show a few errors saying the data is inconsistent, and then slowly get flooded with the aborted connections error. Eventually that error repeats 5-10 times per second, constantly, until the service is restarted.
This has shown up occasionally in the past, but it was usually resolved by a reboot and wouldn’t come back for several weeks. This time, however, we’re seeing it 2-3x an hour.
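In case it helps with narrowing this down, here is a rough sketch of how the flood of warnings could be broken down by client host and reason. It assumes the warnings follow the usual MariaDB error-log shape (`[Warning] Aborted connection N to db: '...' user: '...' host: '...' (reason)`); the function name `count_aborted` is made up for illustration, and the exact log format on a given install may differ.

```python
import re
from collections import Counter

# Assumed shape of a MariaDB "Aborted connection" warning line.
ABORTED = re.compile(
    r"\[Warning\] Aborted connection \d+ to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>.*)\)"
)


def count_aborted(log_lines):
    """Count aborted-connection warnings per (host, reason) pair."""
    counts = Counter()
    for line in log_lines:
        match = ABORTED.search(line)
        if match:
            counts[(match["host"], match["reason"])] += 1
    return counts
```

Feeding it the lines of the error log (for example `count_aborted(open(path))`, where `path` is wherever the log lives on that server) and looking at `most_common(10)` should show whether the aborts come mostly from storage nodes, from clients, or from the server itself.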
A quick `netstat -plant` shows what looks to be a lot (if not all?) of the storage nodes connected over 3306 in the ESTABLISHED state, plus a LOT of TIME_WAIT entries to all sorts of hosts on 127.0.0.1:9000 and quite a few on port 80 as well.
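To put numbers on that instead of eyeballing it, something like the following could tally the netstat output per port and state. This is only a sketch: it assumes the standard Linux `netstat -plant` column layout (proto, recv-q, send-q, local address, foreign address, state), and `tally_connections` is a hypothetical helper name.

```python
from collections import Counter


def tally_connections(netstat_output, ports=(80, 3306, 9000)):
    """Count TCP connections per (port, state) for the ports of interest."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP data row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        # Match on either end of the connection, counting each row once.
        for addr in (local, foreign):
            port_text = addr.rsplit(":", 1)[-1]
            if port_text.isdigit() and int(port_text) in ports:
                counts[(int(port_text), state)] += 1
                break
    return counts
```

Running it on the captured output a few times between restarts (for example on the text returned by `subprocess.run(["netstat", "-plant"], capture_output=True, text=True).stdout`) would show whether the ESTABLISHED count on 3306 or the TIME_WAIT count on 9000 keeps growing before the failures start.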
This issue causes all tasks to fail, from imaging to client check in.
Because the MariaDB KB page mentions .NET, I was wondering whether this is related to the FOG client?
I currently have the agent check-in time set to 8 minutes and have migrated to InnoDB.