MYSQL/HTTPD resource issues in 5020
-
Just updated to svn 5020 and we are having a system resource overload. MySQL doesn’t seem to be dropping any connections now, and Apache has 120+ children spawning when the computers check in with the client. From the time I started this topic to right now, the SQL connections have jumped from 300 to 1100…
mysql is running around 30-35% CPU and the httpd processes are chewing up the rest of the CPU.
This only started happening when we updated to 5020 from 4960 (but svn shows revision 4189?)
-
If you know FOG version 4960 was a “good point”, then maybe you could use SVN revisions 4159 through 4189. Here 4159 corresponds to 4960 (git commit svn 3510 + the number of commits since that tag was released) and 4189 corresponds to 5020, which means 30 svn revisions have been pushed.
You could go through them, moving back and forth and cutting the remaining range in half each time.
For example, if 4159 is known good and 4189 is known bad, split the difference in the middle. This means testing svn revision 4174. If this is good, then we know 4159 through 4174 is good. If 4174 is bad, we know the problem came in at or before this point.
In either case you split it by half again from the last good/bad state.
For example, if 4174 is bad, we go back 7.5 revisions (just make it an even 7). If 4174 is good, go up 7.5 revisions (just make it an even 8). It will be much faster to determine exactly when the problem occurred.
Hopefully this makes sense. Don’t think of things in terms of the version numbers. They basically get incremented by 2 each commit because of how we sync svn to git. We have to get the updates and push them, then merge the svn to the dev branch. So the svn branch should match exactly with svn IRL.
If you have to do the splits, going backward use floor to get the value (round down to the nearest integer). Going forward, use ceiling to get the value (round up to the nearest integer).
This way, you only have to go through maybe 6 - 7 revisions to know where the bad revision was introduced, rather than going through 30 revisions one by one hoping to find it.
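The halving rule above can be sketched as a small shell function. This is only an illustration: the name next_revision is made up here, and it assumes the first argument is a known-good revision and the second a known-bad one. Going up from the good revision by half the distance rounded up lands on the same revision as going back from the bad one by half the distance rounded down, so one formula covers both directions.

```shell
#!/bin/sh
# next_revision GOOD BAD
# Hypothetical helper (not part of FOG or svn): prints the next revision to test.
# It moves up from GOOD by half the distance to BAD, rounded up, which is the
# same revision as moving back from BAD by half the distance, rounded down.
next_revision() {
    echo $(( $1 + ($2 - $1 + 1) / 2 ))
}

next_revision 4159 4189   # prints 4174
next_revision 4159 4174   # 4174 was bad: prints 4167 (back 7)
next_revision 4174 4189   # 4174 was good: prints 4182 (up 8)
```

Repeat with the updated good/bad pair until the two revisions are adjacent; the bad one is then the first revision that shows the problem.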
-
Apologies.
The @Developers are working on it. FOG has recently moved to InnoDB, and with that come many challenges and necessary tweaks.
SVN version 4103 is pretty solid. But keep in mind FOG does not support downgrading.
Please provide any relevant logs. How many hosts do you have in your environment?
-
Tom is suggesting you figure out where it went wrong… somewhere between 4960 and 5020 obviously.
You can checkout a specific revision like this:
svn -r xxxx up
or
svn -r 4174 up
Of course, install it afterwards:
./installfog.sh -y
It really helps a lot if FOG Trunk is virtualized… because again I’d stress fog does not support downgrading… but if Tom says it’s OK to do for this small range of SVN revisions, then it’s OK to do.
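As a rough sketch, the two commands above can be wrapped in one function so that each step of the search is a single call. The name test_revision and the DRY_RUN switch are assumptions for illustration, and the function assumes it is run from a directory where both svn up and ./installfog.sh work as shown above.

```shell
#!/bin/sh
# test_revision REV
# Hypothetical wrapper: update the working copy to REV, then reinstall.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
test_revision() {
    run=${DRY_RUN:+echo}
    $run svn -r "$1" up || return 1
    $run ./installfog.sh -y
}

# Show what would be executed for revision 4174 without touching anything.
DRY_RUN=1 test_revision 4174
```

Printing the commands first is a cheap way to double-check the revision number before changing a production server.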
-
I understand what you are saying and I am happy to help try and figure out where it went south. My only question is the DB schema backtracking…
Do you know how far I can go back in revisions before I hit a change that will cause a problem? I know FOG can’t downgrade, so I want to make sure I don’t blow up the DB.
Adam
-
While it is never a good idea to revert, none of the recent db changes are going to affect anything. If anything, they will simply be ignored in revisions where they were not in use. Where you would run into problems is if you try to backtrack, say, all the way back to version 1.2.0 on your current database. This is because I removed bits that 1.2 expects but that are not necessary in the current version. Just don’t downgrade directly to one such revision and you will be fine.
-
@Adam-Taylor You can export the DB before each change.
FOG Configuration -> Configuration Save -> Export
That exports the entire DB. Everything. I add the revision number to each export’s name after downloading it.
So all my exports are named “filename_rxxxx_month_year”
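A minimal sketch of that naming scheme, in case it helps to script the renaming. The function name export_name is invented for this example, and using a numeric month and four-digit year is an assumption; the post only says “month_year”.

```shell
#!/bin/sh
# export_name BASE REV
# Hypothetical helper: builds a name of the form BASE_rREV_month_year,
# using today's date (numeric month and 4-digit year are an assumption).
export_name() {
    echo "${1}_r${2}_$(date +%m_%Y)"
}

# Example: name for an export taken while running revision 5020.
export_name fog_export 5020
```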
-
I forgot to add.
We have 1892 hosts currently.
As for the logs…the only logs I am getting are:
[Tue Oct 20 16:09:34 2015] [error] [client 10.69.15.60] PHP Warning: mysqli::real_connect(): (08004/1040): Too many connections in /var/www/html/fog/lib/db/MySQL.class.php on line 29
This is when the client hits the server to see if it has any tasks.
If I set max_connections in MySQL to something huge (2k plus), in about 45 minutes I will start to get that message. The connections just keep racking up and up…never go down…
I also tried setting “wait_timeout” and “interactive_timeout” to something like 10 seconds to try and make the connections go down…nada…connections keep rising.
I’m about to get off work, but first thing in the morning I am going to start bouncing around the svn versions to see if I can find which one might be causing this.
Thanks!
Adam
-
@Tom-Elliott For me too, CPU is running at 100% due to apache2 and mysqld processes after the svn 5020 upgrade.
In the web interface I’ve got these values: Load Average 96.24, 101.58, 106.29
Not the first time I’ve had this issue after upgrading svn.
-
It seems OK with the latest svn (git 5040)
Load Average 1.16, 2.65, 2.48
-
@Matthieu-Jacquart et al, I believe this was fixed last night. Something didn’t seem right, and while chatting about what I thought the issue was, it came to me. Thank you @Uncle-Frank and everybody for reporting and for your patience. Basically, my suspicion is that the GUI kept on making connections to the db. Any time a new page was loaded or clients were checking in, a new db connection was being created, even if you were using persistent connections. I looked over the code and it certainly made sense based on the logic of the code. I should’ve thought of it sooner, I know, but I’m glad it’s now working more appropriately.
-
Sorry, I haven’t gotten back lately…had a sick child.
I have tried 5040 and 5042 this morning and I am still seeing almost complete CPU usage, to the point that the web GUI will take 20-30 seconds to respond, or not respond at all, or not render most of the page.
I will try to go back to older ones to see where the problem was introduced.
-
Actually I’d recommend doing a full server reboot. I think what you’re seeing now is it catching up from all the bad stuff before. Of course it may be the case that something else is weird. Can you watch your Apache error log? Often when I see loading delays like this, something is writing to the log files rapidly and in large volume.
-
Rebooted…1 hour later still seeing the same thing. I’m thinking there is something going on with Apache. While I write this there are 259 httpd processes chewing up everything.
In the Apache log, I get:
[Thu Oct 22 16:54:22 2015] [error] server reached MaxClients setting, consider raising the MaxClients setting
I just don’t know what would be causing this all of a sudden… Was something changed in the apache v/host settings in the past few weeks with the installer?
Adam
-
And I forgot.
The message only shows up every once in a while in the error_log file.
-
@Adam-Taylor No, nothing has been changed on the vhost generation. Maybe you have a lot more computers running and creating new connections?
-
@Adam-Taylor Do you have a lot of clients? Are you using FOG_HOST_LOOKUP? (On hosts do you see the green/red dots when on list/search?)
-
@Adam-Taylor Please install the lsof package on your FOG server and run
lsof -i :80
to see who keeps all those Apache processes busy.
-
Right now I have 33525 mysql connections when I run “show status like ‘Con%’;” in the mysql CLI. And no…that was not a mistype. The connections are just not going away.
Total registered hosts are 1892 of which about 1800 have the fog client.
Running lsof, I am at around 20-30 established connections at any given time, with about the same number listening. All while the CPU is around 100%. Every once in a while it might spike to around 180 or so, but then drop back down within 30 seconds or so.
FOG_HOST_LOOKUP was checked and I just unchecked it to see if that helps. So far I am not seeing a difference after a total system reboot.
I had to restart the mysqld process so that I could get to that setting, as the web GUI would not do anything. In the 2 minutes since a reboot, I am back up to 297 mysql connections.
BTW, I am running this on RedHat Enterprise 6.
Thanks!
Adam
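One thing worth checking when reading those numbers: a pattern like ‘Con%’ matches the Connections counter, which MySQL documents as the number of connection attempts since the server started, so it only ever grows. The number of connections open right now is reported as Threads_connected. A minimal sketch for pulling out both, assuming the mysql command-line client can log in without prompting (for example via ~/.my.cnf); the function names are made up for illustration.

```shell
#!/bin/sh
# status_value NAME
# Reads "name<TAB>value" rows on stdin (the shape produced by
# mysql -N -e "SHOW GLOBAL STATUS ...") and prints the value for NAME.
status_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# connection_counters
# Prints the connections open now and the cumulative attempt count.
# Assumption: the mysql client can authenticate non-interactively.
connection_counters() {
    rows=$(mysql -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_connected', 'Connections')") || return 1
    echo "open now: $(printf '%s\n' "$rows" | status_value Threads_connected)"
    echo "attempts since start: $(printf '%s\n' "$rows" | status_value Connections)"
}
```

Calling connection_counters a few times a minute apart would show whether open connections are really piling up, or whether they are being opened and closed very quickly instead.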
-
What does
lsof -i :3306
show? Is it all localhost connections? And what about
lsof -i :80
? All connections from different clients?
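To answer both questions at a glance, the lsof output can be tallied per remote address. This is a sketch, assuming lsof prints each connection as local->remote followed by (ESTABLISHED) in the last two columns, which is its usual layout for TCP sockets; peers_by_count is a made-up name.

```shell
#!/bin/sh
# peers_by_count
# Reads lsof -i output on stdin and prints "count remote-address" for every
# established connection, busiest remote address first.
peers_by_count() {
    awk '$NF == "(ESTABLISHED)" {
             split($(NF - 1), ends, "->")   # ends[1] = local, ends[2] = remote
             peer = ends[2]
             sub(/:[^:]*$/, "", peer)       # drop the trailing :port
             count[peer]++
         }
         END { for (p in count) print count[p], p }' | sort -rn
}

# Usage (-n and -P keep addresses and ports numeric):
#   lsof -nP -i :3306 | peers_by_count
#   lsof -nP -i :80   | peers_by_count
```

If the port 3306 tally shows only the server’s own address, the MySQL load is coming from the web application itself; the port 80 tally shows which clients are holding Apache workers.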