High MySQL CPU Usage Bogging Down Server
-
@uwpviolator The very last line is interesting, as it’s basically saying the MAC being passed is invalid (#!im = invalid MAC). This should be handled if it’s a new client getting this message. That’s not to say there isn’t a problem; I’m just trying to understand why it’s saying that in the first place. But I highly doubt that’s what is causing the high CPU, as it’s the only such message. The more interesting message is the MaxRequestWorkers one. Typically I see this when a bunch of clients are trying to check in at the same time, and my guess is that this is the case here. They’re stuck in an authentication loop, which resetting encryption data will hopefully help fix. (Initially it won’t help, as all clients would still be trying to check in to get their information, but once it stabilizes you should see a significant difference.)
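A rough way to see why a burst of simultaneous check-ins runs into MaxRequestWorkers is Little's law: average busy workers ≈ request arrival rate × time each request holds a worker. The sketch below is a back-of-envelope estimate only; the host count, requests per check-in, 5-second service time, and the limit of 150 are all hypothetical numbers, not measurements from this server.

```python
import math


def concurrent_workers(requests: int, window_s: float, service_s: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system.

    `requests` are assumed to arrive evenly over `window_s` seconds, and each
    one is assumed to occupy an Apache worker for `service_s` seconds.
    """
    arrival_rate = requests / window_s  # requests per second
    return arrival_rate * service_s


def fits(requests: int, window_s: float, service_s: float, max_request_workers: int) -> bool:
    """True if the estimated concurrency stays within MaxRequestWorkers."""
    return math.ceil(concurrent_workers(requests, window_s, service_s)) <= max_request_workers


if __name__ == "__main__":
    reqs = 600 * 4  # hypothetical: 600 hosts x 4 requests per check-in
    for window in (20 * 60, 60):  # spread over a 20-min cycle vs. one bad minute
        busy = concurrent_workers(reqs, window, service_s=5.0)
        print(f"window={window}s busy~{busy:.0f} fits(150)={fits(reqs, window, 5.0, 150)}")
```

With these made-up numbers, spreading the requests over a full 20-minute cycle keeps about 10 workers busy, while squeezing them into one minute needs about 200, more than a 150-worker limit.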
What’s interesting to me is the client check-in time. Our defaults are set to reauthenticate data every 30 minutes, and your check-in cycle time is set to 20 minutes, from what I can gather. Maybe the time cycles are “too much in sync” with one another? I’m just speculating at this point. The time itself shouldn’t be a major problem, but if you have a bunch of hosts authenticating all at the same time, I could see this being a potential bottleneck.
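To make the “too much in sync” idea concrete: if both timers start at the same moment, they fire together every least common multiple of the two intervals, and hosts that all start their cycle together (say, after an outage) produce one large spike instead of a steady trickle. A minimal simulation, assuming fixed intervals and either no offset or a random per-host offset; the 600-host figure is hypothetical, and this is not a claim about how the FOG client actually schedules itself.

```python
import math
import random
from collections import Counter


def coincide_every(checkin_min: int, reauth_min: int) -> int:
    """Minutes between moments when both timers fire together (their LCM)."""
    return checkin_min * reauth_min // math.gcd(checkin_min, reauth_min)


def peak_per_minute(hosts: int, checkin_min: int, horizon_min: int,
                    jitter: bool, seed: int = 0) -> int:
    """Busiest minute of check-ins over `horizon_min` minutes.

    jitter=False: every host checks in on the same minute (worst case).
    jitter=True: each host gets a random offset within its cycle.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(hosts):
        offset = rng.randrange(checkin_min) if jitter else 0
        for minute in range(offset, horizon_min, checkin_min):
            counts[minute] += 1
    return max(counts.values())


if __name__ == "__main__":
    print(coincide_every(20, 30))  # 60: the 20- and 30-minute timers align hourly
    print(coincide_every(15, 30))  # 30: with 15 minutes, every second check-in
    print(peak_per_minute(600, 20, 120, jitter=False))  # 600
    print(peak_per_minute(600, 20, 120, jitter=True))   # far lower
```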
-
@tom-elliott said in High MySQL CPU Usage Bogging Down Server:
Our defaults are set to reset encryption data every 30 minutes
Since when was that a thing? First I’m hearing of it.
-
@wayne-workman This has always been the case. The reset is not the same as what you see done with Reset Encryption Data in the GUI, however. It is more a way to ensure a shared token cannot be stolen and used maliciously. I’ve updated my prior post: the “reset encryption” is more properly phrased as “reauthenticate every 30 minutes”.
-
@tom-elliott @Wayne-Workman So far so good. Set the check-in time back down to 15 min.
-
@UWPVIOLATOR Can we mark this solved?
-
@sebastian-roth I guess. We keep seeing it spike, but we are going to switch from Ubuntu to CentOS and then upgrade FOG to 1.5.0 in the next few weeks.
-
OK, so I have more info on this. Today around 12:05-12:10 our clients lost encryption, and now they are crashing the server until they check in again.
See this log file. I have seen this same thing on multiple machines around the same time.
```
------------------------------------------------------------------------------
----------------------------------UserTracker---------------------------------
------------------------------------------------------------------------------
3/7/2018 11:47 AM Client-Info Client Version: 0.11.12
3/7/2018 11:47 AM Client-Info Client OS: Windows
3/7/2018 11:47 AM Client-Info Server Version: 1.4.4
3/7/2018 11:47 AM Middleware::Response Success
------------------------------------------------------------------------------
3/7/2018 11:47 AM Service Sleeping for 1161 seconds
3/7/2018 12:06 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&configure&newService&json
3/7/2018 12:06 PM Middleware::Communication ERROR: Could not contact FOG server
3/7/2018 12:06 PM Middleware::Communication ERROR: Unable to connect to the remote server
3/7/2018 12:06 PM Middleware::Response Success
3/7/2018 12:06 PM Service ERROR: Invalid promptTime, using default
3/7/2018 12:06 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&mac=10:60:4B:85:D2:35&newService&json
3/7/2018 12:07 PM Middleware::Communication ERROR: Could not contact FOG server
3/7/2018 12:07 PM Middleware::Communication ERROR: Unable to connect to the remote server
3/7/2018 12:07 PM Middleware::Response Success
3/7/2018 12:07 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?clientver&newService&json
3/7/2018 12:07 PM Service ERROR: Unable to get cycle data
3/7/2018 12:07 PM Service ERROR: Unable to connect to the remote server
3/7/2018 12:07 PM Middleware::Response Success
------------------------------------------------------------------------------
---------------------------------ClientUpdater--------------------------------
------------------------------------------------------------------------------
3/7/2018 12:07 PM Client-Info Client Version: 0.11.12
3/7/2018 12:07 PM Client-Info Client OS: Windows
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4
3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
----------------------------------TaskReboot----------------------------------
------------------------------------------------------------------------------
3/7/2018 12:07 PM Client-Info Client Version: 0.11.12
3/7/2018 12:07 PM Client-Info Client OS: Windows
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4
3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
--------------------------------HostnameChanger-------------------------------
------------------------------------------------------------------------------
3/7/2018 12:07 PM Client-Info Client Version: 0.11.12
3/7/2018 12:07 PM Client-Info Client OS: Windows
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4
3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
---------------------------------SnapinClient---------------------------------
------------------------------------------------------------------------------
3/7/2018 12:07 PM Client-Info Client Version: 0.11.12
3/7/2018 12:07 PM Client-Info Client OS: Windows
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4
3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
--------------------------------PrinterManager--------------------------------
------------------------------------------------------------------------------
3/7/2018 12:07 PM Client-Info Client Version: 0.11.12
3/7/2018 12:07 PM Client-Info Client OS: Windows
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4
3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
--------------------------------PowerManagement-------------------------------
------------------------------------------------------------------------------
3/7/2018 12:07 PM Client-Info Client Version: 0.11.12
3/7/2018 12:07 PM Client-Info Client OS: Windows
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4
3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
----------------------------------UserTracker---------------------------------
------------------------------------------------------------------------------
3/7/2018 12:07 PM Client-Info Client Version: 0.11.12
3/7/2018 12:07 PM Client-Info Client OS: Windows
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4
3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
3/7/2018 12:07 PM Service Sleeping for 60 seconds
3/7/2018 12:08 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&configure&newService&json
3/7/2018 12:08 PM Middleware::Communication ERROR: Could not contact FOG server
3/7/2018 12:08 PM Middleware::Communication ERROR: Unable to connect to the remote server
3/7/2018 12:08 PM Middleware::Response Success
3/7/2018 12:08 PM Service ERROR: Invalid promptTime, using default
3/7/2018 12:08 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&mac=10:60:4B:85:D2:35&newService&json
3/7/2018 12:09 PM Middleware::Communication ERROR: Could not contact FOG server
3/7/2018 12:09 PM Middleware::Communication ERROR: Unable to connect to the remote server
3/7/2018 12:09 PM Middleware::Response Success
3/7/2018 12:09 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?clientver&newService&json
3/7/2018 12:09 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?newService&json
3/7/2018 12:09 PM Service Creating user agent cache
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
---------------------------------ClientUpdater--------------------------------
------------------------------------------------------------------------------
3/7/2018 12:09 PM Client-Info Client Version: 0.11.12
3/7/2018 12:09 PM Client-Info Client OS: Windows
3/7/2018 12:09 PM Client-Info Server Version: 1.4.4
3/7/2018 12:09 PM Middleware::Response Success
------------------------------------------------------------------------------
------------------------------------------------------------------------------
----------------------------------TaskReboot----------------------------------
------------------------------------------------------------------------------
3/7/2018 12:09 PM Client-Info Client Version: 0.11.12
3/7/2018 12:09 PM Client-Info Client OS: Windows
3/7/2018 12:09 PM Client-Info Server Version: 1.4.4
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
--------------------------------HostnameChanger-------------------------------
------------------------------------------------------------------------------
3/7/2018 12:09 PM Client-Info Client Version: 0.11.12
3/7/2018 12:09 PM Client-Info Client OS: Windows
3/7/2018 12:09 PM Client-Info Server Version: 1.4.4
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
---------------------------------SnapinClient---------------------------------
------------------------------------------------------------------------------
3/7/2018 12:09 PM Client-Info Client Version: 0.11.12
3/7/2018 12:09 PM Client-Info Client OS: Windows
3/7/2018 12:09 PM Client-Info Server Version: 1.4.4
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
--------------------------------PrinterManager--------------------------------
------------------------------------------------------------------------------
3/7/2018 12:09 PM Client-Info Client Version: 0.11.12
3/7/2018 12:09 PM Client-Info Client OS: Windows
3/7/2018 12:09 PM Client-Info Server Version: 1.4.4
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
--------------------------------PowerManagement-------------------------------
------------------------------------------------------------------------------
3/7/2018 12:09 PM Client-Info Client Version: 0.11.12
3/7/2018 12:09 PM Client-Info Client OS: Windows
3/7/2018 12:09 PM Client-Info Server Version: 1.4.4
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
------------------------------------------------------------------------------
----------------------------------UserTracker---------------------------------
------------------------------------------------------------------------------
3/7/2018 12:09 PM Client-Info Client Version: 0.11.12
3/7/2018 12:09 PM Client-Info Client OS: Windows
3/7/2018 12:09 PM Client-Info Server Version: 1.4.4
3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection
3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object.
3/7/2018 12:09 PM Service Sleeping for 60 seconds
3/7/2018 12:10 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&configure&newService&json
3/7/2018 12:11 PM Middleware::Response Success
3/7/2018 12:11 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&mac=10:60:4B:85:D2:35&newService&json
3/7/2018 12:11 PM Middleware::Response Success
3/7/2018 12:11 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?clientver&newService&json
3/7/2018 12:12 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?newService&json
3/7/2018 12:12 PM Service Creating user agent cache
3/7/2018 12:12 PM Middleware::Response Module is disabled globally on the FOG server
3/7/2018 12:12 PM Middleware::Response Module is disabled on the host
3/7/2018 12:12 PM Middleware::Response Module is disabled globally on the FOG server
------------------------------------------------------------------------------
---------------------------------ClientUpdater--------------------------------
------------------------------------------------------------------------------
3/7/2018 12:12 PM Client-Info Client Version: 0.11.12
3/7/2018 12:12 PM Client-Info Client OS: Windows
3/7/2018 12:12 PM Client-Info Server Version: 1.4.4
3/7/2018 12:12 PM Middleware::Response Success
------------------------------------------------------------------------------
------------------------------------------------------------------------------
----------------------------------TaskReboot----------------------------------
------------------------------------------------------------------------------
3/7/2018 12:12 PM Client-Info Client Version: 0.11.12
3/7/2018 12:12 PM Client-Info Client OS: Windows
3/7/2018 12:12 PM Client-Info Server Version: 1.4.4
3/7/2018 12:12 PM Middleware::Response Module is disabled globally on the FOG server
------------------------------------------------------------------------------
------------------------------------------------------------------------------
--------------------------------HostnameChanger-------------------------------
------------------------------------------------------------------------------
3/7/2018 12:12 PM Client-Info Client Version: 0.11.12
3/7/2018 12:12 PM Client-Info Client OS: Windows
3/7/2018 12:12 PM Client-Info Server Version: 1.4.4
3/7/2018 12:12 PM Middleware::Response Success
3/7/2018 12:12 PM HostnameChanger Checking Hostname
3/7/2018 12:12 PM HostnameChanger Hostname is correct
3/7/2018 12:12 PM HostnameChanger Attempting to join domain
3/7/2018 12:12 PM HostnameChanger Host already joined to target domain
------------------------------------------------------------------------------
------------------------------------------------------------------------------
---------------------------------SnapinClient---------------------------------
------------------------------------------------------------------------------
3/7/2018 12:12 PM Client-Info Client Version: 0.11.12
3/7/2018 12:12 PM Client-Info Client OS: Windows
3/7/2018 12:12 PM Client-Info Server Version: 1.4.4
3/7/2018 12:12 PM Middleware::Response No snapins
------------------------------------------------------------------------------
------------------------------------------------------------------------------
--------------------------------PrinterManager--------------------------------
------------------------------------------------------------------------------
3/7/2018 12:12 PM Client-Info Client Version: 0.11.12
3/7/2018 12:12 PM Client-Info Client OS: Windows
3/7/2018 12:12 PM Client-Info Server Version: 1.4.4
3/7/2018 12:12 PM Middleware::Response Module is disabled on the host
------------------------------------------------------------------------------
------------------------------------------------------------------------------
--------------------------------PowerManagement-------------------------------
------------------------------------------------------------------------------
3/7/2018 12:12 PM Client-Info Client Version: 0.11.12
3/7/2018 12:12 PM Client-Info Client OS: Windows
3/7/2018 12:12 PM Client-Info Server Version: 1.4.4
3/7/2018 12:12 PM Middleware::Response Success
3/7/2018 12:12 PM PowerManagement Calculating tasks to unschedule
3/7/2018 12:12 PM PowerManagement Calculating tasks to schedule
------------------------------------------------------------------------------
------------------------------------------------------------------------------
----------------------------------UserTracker---------------------------------
------------------------------------------------------------------------------
3/7/2018 12:12 PM Client-Info Client Version: 0.11.12
3/7/2018 12:12 PM Client-Info Client OS: Windows
3/7/2018 12:12 PM Client-Info Server Version: 1.4.4
3/7/2018 12:12 PM Middleware::Response Success
------------------------------------------------------------------------------
3/7/2018 12:12 PM Service Sleeping for 1156 seconds
```
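When comparing logs from several machines, a small script can pull out when each client could not reach the server. A sketch, assuming every entry looks like the lines above (`M/D/YYYY H:MM AM|PM Component Message`) with one entry per line; run over this log, it would report the server as unreachable from 12:06 to 12:09 PM.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed entry shape, taken from the log above:
# "3/7/2018 12:06 PM Middleware::Communication ERROR: Could not contact FOG server"
LINE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2} [AP]M) (\S+) (.*)$")


def outage_summary(lines):
    """Return (ERROR count per minute, (first, last) 'Could not contact' time or None)."""
    errors = Counter()
    unreachable = []
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue  # banner/separator lines such as "-----UserTracker-----"
        stamp = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M %p")
        message = m.group(3)
        if message.startswith("ERROR"):
            errors[stamp] += 1
        if "Could not contact FOG server" in message:
            unreachable.append(stamp)
    window = (min(unreachable), max(unreachable)) if unreachable else None
    return errors, window


if __name__ == "__main__":
    # "fog.log" is a hypothetical file name for a saved copy of the client log.
    with open("fog.log", encoding="utf-8") as fh:
        per_minute, window = outage_summary(fh)
    print("unreachable window:", window)
```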
-
Thread note: I will be in contact with @UWPVIOLATOR to see what works best when scaling for multiple clients, and/or if there are optimizations I could make to the client itself.
I am also building my own client stress-test environment so we can better replicate these issues.
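The log above shows the client retrying every 60 seconds after a failed check-in, so clients that fail at the same moment stay in lockstep. One common client-side mitigation (a swapped-in technique, not something the FOG client is confirmed to do) is exponential backoff with “full jitter”. A sketch, with the 60 s base and the 20-minute cap chosen only to mirror numbers from this thread:

```python
import random
from collections import Counter
from typing import List, Optional


def backoff_delays(attempts: int, base_s: float = 60.0, cap_s: float = 1200.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Exponential backoff with full jitter.

    Retry k waits a uniformly random time in [0, min(cap_s, base_s * 2**k)].
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap_s, base_s * 2 ** k)) for k in range(attempts)]


def busiest_second(clients: int, jitter: bool, seed: int = 0) -> int:
    """Most first retries landing in one 1-second bucket when `clients` all fail
    at t=0 and retry after a fixed 60 s (jitter=False) or a jittered delay."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(clients):
        delay = backoff_delays(1, rng=rng)[0] if jitter else 60.0
        buckets[int(delay)] += 1
    return max(buckets.values())


if __name__ == "__main__":
    print(backoff_delays(5, rng=random.Random(1)))
    print(busiest_second(600, jitter=False))  # 600: every client retries together
    print(busiest_second(600, jitter=True))   # spread over roughly 60 buckets
```

The cap keeps a long outage from pushing retries past one normal check-in cycle, while the randomness is what breaks up the synchronized wave.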
-
I was able to install PHP-FPM successfully and switched over to it instead of Apache’s libapache2-mod-php implementation (which runs under mpm_prefork).
Load-wise, the server is doing MUCH better. The GUI is still slow, though.
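A common rule of thumb when sizing a php-fpm pool is to cap `pm.max_children` at the memory you can spare for PHP divided by the average size of one php-fpm worker (measured on the server itself, for example from the RSS column of `ps`). A sketch of that arithmetic; the RAM and per-worker figures in the example are hypothetical, not taken from this server.

```python
def max_children(total_ram_mb: int, reserved_mb: int, avg_worker_mb: int) -> int:
    """Rule-of-thumb ceiling for php-fpm's pm.max_children.

    reserved_mb: memory kept back for the OS, MySQL, Apache, caches, etc.
    avg_worker_mb: average resident size of one php-fpm worker process.
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    spare = max(0, total_ram_mb - reserved_mb)
    return max(1, spare // avg_worker_mb)


if __name__ == "__main__":
    # Hypothetical: 8 GB box, 2 GB held back, ~64 MB per worker -> 96
    print(max_children(8192, 2048, 64))
```

Roughly speaking, too few children makes requests queue (a slow GUI), while too many lets the box run out of memory and start swapping.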
-
@tom-elliott Did you also switch over to memcached, or just php-fpm? I’m more curious about the performance improvements than the OP’s issue. (Oh, did I say that on the outside…)
-
@george1421 Just php-fpm.
I don’t know how much help memcache would give, though I suppose it wouldn’t hurt to try it.
-
@tom-elliott I need to look at the document to make sure it’s not vulnerable to the memcached amplification attack that hit GitHub, but here is what I set up before: https://forums.fogproject.org/topic/10717/can-php-fpm-make-fog-web-gui-fast/3
[edit] Never mind, I already put the bind to 127.0.0.1 in the tutorial, so we are already protected. [/edit]
-
I will keep an eye on the server tomorrow and let you guys know how it goes.
-
Update
Today was much better. The load stayed way down, nothing like what we have seen before, so the fix is working in that respect. CPU utilization is still running close to 100%, but the server never crashed like it did before. Because of the CPU usage, the GUI was slow to respond. We did some talking today, and we might just blow away all the images (we have 600 of them, but not all of them have image files on the server) and the database and start fresh when we move to the new server build. In the meantime, I am going to see if we can quickly identify a good chunk of hosts we can remove.
@Tom-Elliott Our sysadmin used phpMyAdmin to access the MySQL server, and that was not working for him today. What do we need to do to get that working with your fix from yesterday?
-
@uwpviolator We may have to install phpMyAdmin manually. The one Ubuntu ships relies on libapache2-mod-php, so moving to php-fpm probably broke it. I can work with you tomorrow around 3 pm, if that works?
-
@tom-elliott CST or EST?
-
@uwpviolator EST
-
@uwpviolator Is this a virtual machine? If so, what is the configuration (vCPUs and memory)? What makes up the disk subsystem?
Now that you are running php-fpm, what is the top CPU hog? Is it MySQL?
(My intuition is telling me this.) I wonder whether MySQL needs to be optimized, or whether there is an underlying performance issue (like the disk subsystem) being taxed and causing the high CPU load. I did some benchmarking a while ago comparing different setups. I’m not saying any of this is relevant to the case at hand, just trying to connect more data points. Ref: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast
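One quick way to answer “what is the top CPU hog?” is to sort the process list by CPU. A sketch using the Linux procps `ps` command with the `pid,comm,%cpu` columns; note that `ps` reports %CPU averaged over each process’s lifetime, so `top` gives a better moment-to-moment view.

```python
import subprocess
from typing import List, Tuple


def parse_ps(output: str, n: int = 5) -> List[Tuple[int, str, float]]:
    """Parse `ps -eo pid,comm,%cpu` output into (pid, command, cpu%) tuples,
    sorted by CPU descending, keeping the top n."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        pid, rest = line.split(None, 1)
        comm, cpu = rest.rsplit(None, 1)  # command names may contain spaces
        rows.append((int(pid), comm, float(cpu)))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:n]


def top_cpu(n: int = 5) -> List[Tuple[int, str, float]]:
    """Snapshot of the busiest processes on a Linux host."""
    out = subprocess.run(["ps", "-eo", "pid,comm,%cpu"],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out, n)


if __name__ == "__main__":
    for pid, comm, cpu in top_cpu():
        print(f"{cpu:6.1f}%  {comm} (pid {pid})")
```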
-
@tom-elliott Yeah that works.
-
@Tom-Elliott @Joe-Schmitt @george1421
FOG is tanking. Here is the Apache log. Any ideas?
```
[Fri Mar 09 10:43:21.688093 2018] [proxy_fcgi:error] [pid 974] [client 10.75.1.5:61438] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.688999 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.689017 2018] [proxy_fcgi:error] [pid 974] [client 10.129.153.197:56845] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.696951 2018] [proxy:error] [pid 973] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.696965 2018] [proxy_fcgi:error] [pid 973] [client 10.75.1.21:50126] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.698089 2018] [proxy:error] [pid 973] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.698102 2018] [proxy_fcgi:error] [pid 973] [client 10.77.150.58:50923] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.702536 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.702549 2018] [proxy_fcgi:error] [pid 974] [client 10.76.1.216:58741] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.703391 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.703404 2018] [proxy_fcgi:error] [pid 974] [client 10.86.150.85:49674] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.704294 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.704307 2018] [proxy_fcgi:error] [pid 974] [client 10.94.151.240:61237] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.708168 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.708181 2018] [proxy_fcgi:error] [pid 974] [client 10.120.153.201:53910] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.722465 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.722485 2018] [proxy_fcgi:error] [pid 974] [client 10.84.150.64:53527] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.726900 2018] [proxy:error] [pid 976] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.726916 2018] [proxy_fcgi:error] [pid 976] [client 10.119.151.67:63447] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:29.286285 2018] [mpm_prefork:error] [pid 971] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
```
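`(111)Connection refused` on `127.0.0.1:9000` means nothing was accepting connections on the address Apache’s mod_proxy_fcgi was pointed at, at that moment: for example, php-fpm had stopped or crashed, or was listening on a different port or a Unix socket. A small sketch to check that port and to count how many distinct clients were turned away in a log excerpt (the excerpt above contains ten different client addresses within about 40 ms). The host, port, and regular expression simply mirror the log lines shown here.

```python
import re
import socket
from typing import Set

# Matches the proxy_fcgi lines above: "[client 10.75.1.5:61438] AH01079: ..."
REFUSED = re.compile(r"\[client (\d{1,3}(?:\.\d{1,3}){3}):\d+\] AH01079")


def backend_listening(host: str = "127.0.0.1", port: int = 9000,
                      timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts, unreachable host, ...
        return False


def refused_clients(log_text: str) -> Set[str]:
    """Distinct client IPs that hit AH01079 (no FastCGI backend available)."""
    return set(REFUSED.findall(log_text))


if __name__ == "__main__":
    print("php-fpm on 127.0.0.1:9000:", "up" if backend_listening() else "DOWN")
```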