High MySQL CPU Usage Bogging Down Server
-
@tom-elliott @Wayne-Workman so far so good. Set check in time back down to 15min.
-
@UWPVIOLATOR Can we mark this solved?
-
@sebastian-roth I guess. We keep seeing it spike, but we are going to switch from Ubuntu to CentOS and then upgrade FOG to 1.5.0 in the next few weeks.
-
Ok so I have more info on this. Today around 12:05-12:10 our clients lost encryption and now they are crashing the server until they check in again.
See this log file. I have seen this same thing on multiple machines around the same time.
------------------------------------------------------------------------------ ----------------------------------UserTracker--------------------------------- ------------------------------------------------------------------------------ 3/7/2018 11:47 AM Client-Info Client Version: 0.11.12 3/7/2018 11:47 AM Client-Info Client OS: Windows 3/7/2018 11:47 AM Client-Info Server Version: 1.4.4 3/7/2018 11:47 AM Middleware::Response Success ------------------------------------------------------------------------------ 3/7/2018 11:47 AM Service Sleeping for 1161 seconds 3/7/2018 12:06 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&configure&newService&json 3/7/2018 12:06 PM Middleware::Communication ERROR: Could not contact FOG server 3/7/2018 12:06 PM Middleware::Communication ERROR: Unable to connect to the remote server 3/7/2018 12:06 PM Middleware::Response Success 3/7/2018 12:06 PM Service ERROR: Invalid promptTime, using default 3/7/2018 12:06 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&mac=10:60:4B:85:D2:35&newService&json 3/7/2018 12:07 PM Middleware::Communication ERROR: Could not contact FOG server 3/7/2018 12:07 PM Middleware::Communication ERROR: Unable to connect to the remote server 3/7/2018 12:07 PM Middleware::Response Success 3/7/2018 12:07 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?clientver&newService&json 3/7/2018 12:07 PM Service ERROR: Unable to get cycle data 3/7/2018 12:07 PM Service ERROR: Unable to connect to the remote server 3/7/2018 12:07 PM Middleware::Response Success ------------------------------------------------------------------------------ ---------------------------------ClientUpdater-------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:07 PM Client-Info Client Version: 0.11.12 3/7/2018 12:07 PM Client-Info Client OS: Windows 
3/7/2018 12:07 PM Client-Info Server Version: 1.4.4 3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ ----------------------------------TaskReboot---------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:07 PM Client-Info Client Version: 0.11.12 3/7/2018 12:07 PM Client-Info Client OS: Windows 3/7/2018 12:07 PM Client-Info Server Version: 1.4.4 3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ --------------------------------HostnameChanger------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:07 PM Client-Info Client Version: 0.11.12 3/7/2018 12:07 PM Client-Info Client OS: Windows 3/7/2018 12:07 PM Client-Info Server Version: 1.4.4 3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ ---------------------------------SnapinClient--------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:07 PM Client-Info Client Version: 0.11.12 3/7/2018 12:07 PM Client-Info Client OS: Windows 3/7/2018 12:07 PM Client-Info Server Version: 1.4.4 3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 
------------------------------------------------------------------------------ --------------------------------PrinterManager-------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:07 PM Client-Info Client Version: 0.11.12 3/7/2018 12:07 PM Client-Info Client OS: Windows 3/7/2018 12:07 PM Client-Info Server Version: 1.4.4 3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ --------------------------------PowerManagement------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:07 PM Client-Info Client Version: 0.11.12 3/7/2018 12:07 PM Client-Info Client OS: Windows 3/7/2018 12:07 PM Client-Info Server Version: 1.4.4 3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ ----------------------------------UserTracker--------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:07 PM Client-Info Client Version: 0.11.12 3/7/2018 12:07 PM Client-Info Client OS: Windows 3/7/2018 12:07 PM Client-Info Server Version: 1.4.4 3/7/2018 12:07 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:07 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 
3/7/2018 12:07 PM Service Sleeping for 60 seconds 3/7/2018 12:08 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&configure&newService&json 3/7/2018 12:08 PM Middleware::Communication ERROR: Could not contact FOG server 3/7/2018 12:08 PM Middleware::Communication ERROR: Unable to connect to the remote server 3/7/2018 12:08 PM Middleware::Response Success 3/7/2018 12:08 PM Service ERROR: Invalid promptTime, using default 3/7/2018 12:08 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&mac=10:60:4B:85:D2:35&newService&json 3/7/2018 12:09 PM Middleware::Communication ERROR: Could not contact FOG server 3/7/2018 12:09 PM Middleware::Communication ERROR: Unable to connect to the remote server 3/7/2018 12:09 PM Middleware::Response Success 3/7/2018 12:09 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?clientver&newService&json 3/7/2018 12:09 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?newService&json 3/7/2018 12:09 PM Service Creating user agent cache 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 
------------------------------------------------------------------------------ ---------------------------------ClientUpdater-------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:09 PM Client-Info Client Version: 0.11.12 3/7/2018 12:09 PM Client-Info Client OS: Windows 3/7/2018 12:09 PM Client-Info Server Version: 1.4.4 3/7/2018 12:09 PM Middleware::Response Success ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ ----------------------------------TaskReboot---------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:09 PM Client-Info Client Version: 0.11.12 3/7/2018 12:09 PM Client-Info Client OS: Windows 3/7/2018 12:09 PM Client-Info Server Version: 1.4.4 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ --------------------------------HostnameChanger------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:09 PM Client-Info Client Version: 0.11.12 3/7/2018 12:09 PM Client-Info Client OS: Windows 3/7/2018 12:09 PM Client-Info Server Version: 1.4.4 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 
------------------------------------------------------------------------------ ---------------------------------SnapinClient--------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:09 PM Client-Info Client Version: 0.11.12 3/7/2018 12:09 PM Client-Info Client OS: Windows 3/7/2018 12:09 PM Client-Info Server Version: 1.4.4 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ --------------------------------PrinterManager-------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:09 PM Client-Info Client Version: 0.11.12 3/7/2018 12:09 PM Client-Info Client OS: Windows 3/7/2018 12:09 PM Client-Info Server Version: 1.4.4 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. ------------------------------------------------------------------------------ --------------------------------PowerManagement------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:09 PM Client-Info Client Version: 0.11.12 3/7/2018 12:09 PM Client-Info Client OS: Windows 3/7/2018 12:09 PM Client-Info Server Version: 1.4.4 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 
------------------------------------------------------------------------------ ----------------------------------UserTracker--------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:09 PM Client-Info Client Version: 0.11.12 3/7/2018 12:09 PM Client-Info Client OS: Windows 3/7/2018 12:09 PM Client-Info Server Version: 1.4.4 3/7/2018 12:09 PM Middleware::Response ERROR: Unable to get subsection 3/7/2018 12:09 PM Middleware::Response ERROR: Object reference not set to an instance of an object. 3/7/2018 12:09 PM Service Sleeping for 60 seconds 3/7/2018 12:10 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&configure&newService&json 3/7/2018 12:11 PM Middleware::Response Success 3/7/2018 12:11 PM Middleware::Communication URL: http://fogserver/fog/management/index.php?sub=requestClientInfo&mac=10:60:4B:85:D2:35&newService&json 3/7/2018 12:11 PM Middleware::Response Success 3/7/2018 12:11 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?clientver&newService&json 3/7/2018 12:12 PM Middleware::Communication URL: http://fogserver/fog/service/getversion.php?newService&json 3/7/2018 12:12 PM Service Creating user agent cache 3/7/2018 12:12 PM Middleware::Response Module is disabled globally on the FOG server 3/7/2018 12:12 PM Middleware::Response Module is disabled on the host 3/7/2018 12:12 PM Middleware::Response Module is disabled globally on the FOG server ------------------------------------------------------------------------------ ---------------------------------ClientUpdater-------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Client-Info Client Version: 0.11.12 3/7/2018 12:12 PM Client-Info Client OS: Windows 3/7/2018 12:12 PM Client-Info Server Version: 1.4.4 3/7/2018 12:12 PM Middleware::Response Success 
------------------------------------------------------------------------------ ------------------------------------------------------------------------------ ----------------------------------TaskReboot---------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Client-Info Client Version: 0.11.12 3/7/2018 12:12 PM Client-Info Client OS: Windows 3/7/2018 12:12 PM Client-Info Server Version: 1.4.4 3/7/2018 12:12 PM Middleware::Response Module is disabled globally on the FOG server ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ --------------------------------HostnameChanger------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Client-Info Client Version: 0.11.12 3/7/2018 12:12 PM Client-Info Client OS: Windows 3/7/2018 12:12 PM Client-Info Server Version: 1.4.4 3/7/2018 12:12 PM Middleware::Response Success 3/7/2018 12:12 PM HostnameChanger Checking Hostname 3/7/2018 12:12 PM HostnameChanger Hostname is correct 3/7/2018 12:12 PM HostnameChanger Attempting to join domain 3/7/2018 12:12 PM HostnameChanger Host already joined to target domain ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ ---------------------------------SnapinClient--------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Client-Info Client Version: 0.11.12 3/7/2018 12:12 PM Client-Info Client OS: Windows 3/7/2018 12:12 PM Client-Info Server Version: 1.4.4 3/7/2018 12:12 PM Middleware::Response No snapins ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ 
--------------------------------PrinterManager-------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Client-Info Client Version: 0.11.12 3/7/2018 12:12 PM Client-Info Client OS: Windows 3/7/2018 12:12 PM Client-Info Server Version: 1.4.4 3/7/2018 12:12 PM Middleware::Response Module is disabled on the host ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ --------------------------------PowerManagement------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Client-Info Client Version: 0.11.12 3/7/2018 12:12 PM Client-Info Client OS: Windows 3/7/2018 12:12 PM Client-Info Server Version: 1.4.4 3/7/2018 12:12 PM Middleware::Response Success 3/7/2018 12:12 PM PowerManagement Calculating tasks to unschedule 3/7/2018 12:12 PM PowerManagement Calculating tasks to schedule ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ ----------------------------------UserTracker--------------------------------- ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Client-Info Client Version: 0.11.12 3/7/2018 12:12 PM Client-Info Client OS: Windows 3/7/2018 12:12 PM Client-Info Server Version: 1.4.4 3/7/2018 12:12 PM Middleware::Response Success ------------------------------------------------------------------------------ 3/7/2018 12:12 PM Service Sleeping for 1156 seconds
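To compare this across machines, a rough sketch like the one below can tally ERROR entries per minute from a pasted fog.log. It assumes the timestamp format shown above (`3/7/2018 12:06 PM`) and copes with the log being pasted as one run of text; the function name and sample string are only for illustration.

```python
import re
from collections import Counter

# Start of each log entry, e.g. "3/7/2018 12:06 PM" (format taken from the log above).
STAMP = re.compile(r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2} [AP]M) ")


def errors_per_minute(log_text):
    """Count entries containing 'ERROR:' for each minute-resolution timestamp."""
    counts = Counter()
    # With one capture group, re.split returns [prefix, stamp1, body1, stamp2, body2, ...].
    parts = STAMP.split(log_text)
    for stamp, body in zip(parts[1::2], parts[2::2]):
        if "ERROR:" in body:
            counts[stamp] += 1
    return counts


# A few entries copied from the log above, joined the way they were pasted.
sample = (
    "3/7/2018 11:47 AM Middleware::Response Success "
    "3/7/2018 12:06 PM Middleware::Communication ERROR: Could not contact FOG server "
    "3/7/2018 12:06 PM Middleware::Communication ERROR: Unable to connect to the remote server "
    "3/7/2018 12:07 PM Service ERROR: Unable to get cycle data"
)

for stamp, count in errors_per_minute(sample).items():
    print(stamp, count)
```

Running the same function over logs from several clients should show whether the errors really cluster in the same few minutes.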
-
Thread note: I will be in contact with @UWPVIOLATOR to see what works best when scaling for multiple clients, and/or if there are optimizations I could make to the client itself.
I am also building my own client stress-test environment so we can better replicate these issues.
-
I was able to install PHP-FPM successfully and made the switchover to using it instead of Apache’s libapache2-mod-php implementation (which requires mpm_prefork).
Load wise, the server is doing MUCH better. GUI is still slow though.
-
@tom-elliott Did you also switch over to memcached, or just php-fpm? I’m more curious about performance improvements than the OP’s issue. (Oh, did I say that on the outside…)
-
@george1421 Just php-fpm.
I don’t know how much help memcache would give, though I suppose it wouldn’t hurt to try it.
-
@tom-elliott I need to look at the document to make sure it’s not open to the zombie attack that impacted GitHub, but here is what I set up before: https://forums.fogproject.org/topic/10717/can-php-fpm-make-fog-web-gui-fast/3
[edit] Never mind, I already put the bind to 127.0.0.1 in the tutorial, so we are protected already. [/edit]
-
I will keep an eye on the server tomorrow and let you guys know how it goes.
-
Update
Today was much better. The load stayed way down, nothing like what we have seen before, so the fix is working in that respect. CPU utilization is still running close to 100%, but the server never crashed like it did before. Because of the CPU usage, the GUI was slow to respond. We did some talking today, and we might just blow away all the images (we have 600 of them, and not all of them have image files on the server) and the database and start brand new when we move to the new server build. In the meantime, I am going to see if we can quickly identify a good chunk of hosts we can remove.
@Tom-Elliott Our sysadmin uses phpMyAdmin to access the MySQL server, and that was not working for him today. What do we need to do to get it working with your fix from yesterday?
-
@uwpviolator We may have to manually install phpMyAdmin. The one Ubuntu ships relies on libapache2-mod-php, so moving to php-fpm probably broke it. I can work with you tomorrow around 3pm if that works?
-
@tom-elliott CST or EST?
-
@uwpviolator est
-
@uwpviolator Is this a virtual machine? If so, what is the configuration (vCPU and memory)? What does the disk subsystem consist of?
Now that you are running php-fpm, what is the top CPU hog? Is it mysql?
My intuition is telling me that either mysql needs to be optimized, or there is an underlying performance issue (like the disk subsystem) that is being taxed and causing the high CPU load. I did some benchmarking a while ago comparing different setups. I’m not saying that any of this is relevant to the case at hand, just trying to connect more data points. Ref: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast
-
@tom-elliott Yeah that works.
-
@Tom-Elliott @Joe-Schmitt @george1421
FOG is tanking. Here is the apache log. Any ideas?
[Fri Mar 09 10:43:21.688093 2018] [proxy_fcgi:error] [pid 974] [client 10.75.1.5:61438] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.688999 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.689017 2018] [proxy_fcgi:error] [pid 974] [client 10.129.153.197:56845] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.696951 2018] [proxy:error] [pid 973] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.696965 2018] [proxy_fcgi:error] [pid 973] [client 10.75.1.21:50126] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.698089 2018] [proxy:error] [pid 973] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.698102 2018] [proxy_fcgi:error] [pid 973] [client 10.77.150.58:50923] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.702536 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.702549 2018] [proxy_fcgi:error] [pid 974] [client 10.76.1.216:58741] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.703391 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.703404 2018] [proxy_fcgi:error] [pid 974] [client 10.86.150.85:49674] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.704294 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.704307 2018] [proxy_fcgi:error] [pid 974] [client 10.94.151.240:61237] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.708168 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.708181 2018] [proxy_fcgi:error] [pid 974] [client 10.120.153.201:53910] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.722465 2018] [proxy:error] [pid 974] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.722485 2018] [proxy_fcgi:error] [pid 974] [client 10.84.150.64:53527] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:21.726900 2018] [proxy:error] [pid 976] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Fri Mar 09 10:43:21.726916 2018] [proxy_fcgi:error] [pid 976] [client 10.119.151.67:63447] AH01079: failed to make connection to backend: 127.0.0.1
[Fri Mar 09 10:43:29.286285 2018] [mpm_prefork:error] [pid 971] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
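To get a feel for how widespread this is, a quick sketch like the following can count each Apache `AH` error code and the distinct client IPs in the error log. It only assumes the entry format shown above; the function name is made up and the sample is three entries copied from this log.

```python
import re
from collections import Counter

# Apache error identifiers such as AH01079 or AH00957.
AH_CODE = re.compile(r"\bAH\d{5}\b")
# The "[client 10.129.153.197:56845]" field; only the IP part is captured.
CLIENT = re.compile(r"\[client (\d{1,3}(?:\.\d{1,3}){3}):\d+\]")


def summarize(log_text):
    """Return (Counter of AH codes, set of distinct client IPs)."""
    codes = Counter(AH_CODE.findall(log_text))
    clients = set(CLIENT.findall(log_text))
    return codes, clients


sample = (
    "[Fri Mar 09 10:43:21.688999 2018] [proxy:error] [pid 974] (111)Connection refused: "
    "AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed\n"
    "[Fri Mar 09 10:43:21.689017 2018] [proxy_fcgi:error] [pid 974] "
    "[client 10.129.153.197:56845] AH01079: failed to make connection to backend: 127.0.0.1\n"
    "[Fri Mar 09 10:43:29.286285 2018] [mpm_prefork:error] [pid 971] AH00161: "
    "server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting\n"
)

codes, clients = summarize(sample)
print(dict(codes))
print(sorted(clients))
```

A large number of distinct clients hitting AH01079 in the same second would suggest the php-fpm backend on 127.0.0.1:9000 was not accepting connections at all at that moment, rather than one misbehaving host.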
-
Looks like encryption broke again. I am starting to think it’s a timing issue. Because we have so many hosts, we have to set the check-in time so long that, when we hit a peak in the day, we exceed the encryption window (30 min, I think I heard before) and then the clients all freak out.
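Some back-of-the-envelope arithmetic on the load side of this, as a sketch only. The host count is a hypothetical placeholder (not our real number), the four requests per cycle come from the URLs in the client log earlier in the thread, and the 60-second figure is the "Sleeping for 60 seconds" fallback the client showed after a failed check-in. This does not confirm the encryption-window theory; it only shows how much harder the server gets hit once clients start failing.

```python
def requests_per_second(hosts, checkin_seconds, requests_per_checkin=4):
    """Average request rate if check-ins are spread evenly over the interval."""
    return hosts * requests_per_checkin / checkin_seconds


HOSTS = 5000  # hypothetical host count, for illustration only

normal = requests_per_second(HOSTS, 15 * 60)  # 15-minute check-in time
retrying = requests_per_second(HOSTS, 60)     # every client on the 60-second fallback

print(f"normal check-ins: {normal:.1f} req/s")
print(f"all clients retrying: {retrying:.1f} req/s")
print(f"amplification: {retrying / normal:.0f}x")
```

The ratio is just the check-in time divided by the retry sleep, so a longer check-in time lowers the normal load but makes the jump bigger once clients fall back to retrying.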
-
So, worked remotely and was able to help determine part of the problem.
Installed apachetop and took a look; found a lot of requests still hitting servicemodule-active.php. We moved away from this call as it had some weird issues of its own. Moved the file out of the way and the GUI became much less sluggish. Worked to get phpMyAdmin working as well. This wasn’t a problem with php-fpm vs. libapache2-mod-php (though I did make sure phpMyAdmin would be served through php-fpm). Part of the problem was the rewrite rules that handle the API system.
Moved the rewrite rules into the directory stanza of the fog.conf. Made this update in the working-1.5.1 branch as well. I haven’t had time to test whether this move actually still allows the system to work, but it should. Will work on that this weekend (testing to see it works properly).
Also, we are still working to help purge the db of old stuff that’s no longer needed.
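For anyone without apachetop, a rough equivalent of the check described above is to count request paths in the Apache access log. This is a sketch that assumes the common/combined log format; the sample lines, IPs, and query strings are invented for illustration, and the access log path on your server will differ.

```python
import re
from collections import Counter

# Request line inside a common/combined log entry, e.g. "GET /path?query HTTP/1.1".
REQUEST = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS) (\S+) HTTP/[\d.]+"')


def top_paths(lines, n=5):
    """Return the n most requested paths, ignoring query strings."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts.most_common(n)


# Invented sample entries; real ones would be read from the access log file.
sample = [
    '10.0.0.5 - - [09/Mar/2018:10:43:21 -0500] "GET /fog/service/servicemodule-active.php?mac=00:00:00:00:00:00 HTTP/1.1" 200 12',
    '10.0.0.6 - - [09/Mar/2018:10:43:21 -0500] "GET /fog/service/servicemodule-active.php?mac=00:00:00:00:00:01 HTTP/1.1" 200 12',
    '10.0.0.7 - - [09/Mar/2018:10:43:22 -0500] "GET /fog/management/index.php?sub=requestClientInfo HTTP/1.1" 200 512',
]

for path, count in top_paths(sample):
    print(count, path)
```

Unlike apachetop this is a one-shot count rather than a live view, but it is enough to spot a single script dominating the traffic.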