Very high CPU usage: httpd, mysqld, FOGMulticastManager (FOG trunk@5224)
-
Hi all, due to a non-FOG related issue on a couple of FOG boxes I had to rebuild them and since doing so am encountering very high CPU usage and multiple httpd processes.
top - 13:09:42 up 22:33, 1 user, load average: 56.32, 45.19, 42.16
Tasks: 221 total, 47 running, 174 sleeping, 0 stopped, 0 zombie
Cpu(s): 56.9%us, 42.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 1922092k total, 1666716k used, 255376k free, 92020k buffers
Swap: 4128764k total, 2304k used, 4126460k free, 644060k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
28249 mysql     20   0 1373m  59m 6588 S 11.0  3.2   1:16.55 mysqld
28903 apache    20   0  328m  15m 3840 R  4.2  0.8   0:02.84 httpd
28452 apache    20   0  421m  15m 4484 R  3.9  0.8   0:19.04 httpd
28666 apache    20   0  421m  15m 4480 R  3.9  0.8   0:13.70 httpd
28830 apache    20   0  425m  18m 4304 R  3.9  1.0   0:05.28 httpd
28838 apache    20   0  420m  14m 4392 R  3.9  0.8   0:02.58 httpd
28855 apache    20   0  424m  18m 4692 R  3.9  1.0   0:04.97 httpd
28883 apache    20   0  421m  14m 4256 R  3.9  0.8   0:03.23 httpd
28937 apache    20   0  425m  19m 4368 R  3.9  1.0   0:03.41 httpd
28966 apache    20   0  326m  13m 3872 R  3.9  0.7   0:02.60 httpd
28477 apache    20   0  422m  16m 4692 R  3.6  0.9   0:20.11 httpd
28723 apache    20   0  425m  19m 4684 R  3.6  1.1   0:11.49 httpd
28841 apache    20   0  326m  13m 3732 R  3.6  0.7   0:05.58 httpd
28885 apache    20   0  423m  16m 4056 R  3.6  0.9   0:04.08 httpd
28902 apache    20   0  425m  18m 4088 R  3.6  1.0   0:04.18 httpd
28919 apache    20   0  421m  14m 4112 S  3.6  0.8   0:03.90 httpd
28543 apache    20   0  421m  15m 4436 D  3.2  0.8   0:16.87 httpd
28846 apache    20   0  420m  14m 4416 R  3.2  0.8   0:04.00 httpd
28858 apache    20   0  420m  14m 4376 R  3.2  0.8   0:05.29 httpd
28910 apache    20   0  325m  13m 3844 S  3.2  0.7   0:04.08 httpd
28947 apache    20   0  420m  14m 4392 R  3.2  0.8   0:02.68 httpd
29022 apache    20   0  421m  14m 3752 S  3.2  0.7   0:01.36 httpd
29030 apache    20   0  420m  13m 3760 S  3.2  0.7   0:01.22 httpd
28704 apache    20   0  424m  17m 4772 R  2.9  0.9   0:14.33 httpd
28767 apache    20   0  421m  15m 4312 S  2.9  0.8   0:10.92 httpd
28848 apache    20   0  326m  14m 3892 S  2.9  0.8   0:05.33 httpd
tail /var/log/httpd/access_log
10.***.***.51 - - [20/Apr/2016:12:59:52 +0100] "GET /fog/service/servicemodule-active.php?moduleid=displaymanager&mac=A4:1F:72:85:89:F4%7C%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 5 "-" "-"
10.***.***.47 - - [20/Apr/2016:12:59:51 +0100] "GET /fog/service/servicemodule-active.php?moduleid=snapinclient&mac=74:27:EA:EB:14:97%7C%7C00:00:00:00:00:00:00:E0%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 5 "-" "-"
10.***.***.2 - - [20/Apr/2016:12:59:52 +0100] "GET /fog/service/snapins.checkin.php?mac=FC:AA:14:19:72:40%7C%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 4 "-" "-"
10.***.***.9 - - [20/Apr/2016:12:59:51 +0100] "GET /fog/service/servicemodule-active.php?moduleid=autologout&mac=78:45:C4:0D:EE:BD%7C%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 5 "-" "-"
10.***.***.4 - - [20/Apr/2016:12:59:52 +0100] "GET /fog/service/Printers.php?mac=8C:89:A5:90:59:6B%7C%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 12 "-" "-"
10.***.***.56 - - [20/Apr/2016:12:59:52 +0100] "GET /fog/service/servicemodule-active.php?moduleid=displaymanager&mac=FC:AA:14:12:F8:DE%7C%7C00:00:00:00:00:00:00:E0%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 5 "-" "-"
10.***.***.59 - - [20/Apr/2016:12:59:52 +0100] "GET /fog/service/servicemodule-active.php?moduleid=clientupdater&mac=A4:1F:72:85:87:EE%7C%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 5 "-" "-"
10.***.***.7 - - [20/Apr/2016:12:59:52 +0100] "GET /fog/service/servicemodule-active.php?moduleid=greenfog&mac=FC:AA:14:19:76:E2%7C%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 5 "-" "-"
10.***.***.31 - - [20/Apr/2016:12:59:52 +0100] "POST /fog/service/getversion.php HTTP/1.1" 200 4 "-" "-"
10.***.***.95 - - [20/Apr/2016:12:59:51 +0100] "GET /fog/service/hostname.php?moduleid=hostnamechanger&mac=74:27:EA:CE:0F:26%7C%7C00:00:00:00:00:00:00:E0&newService=1 HTTP/1.1" 200 233 "-" "-"
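A ten-line tail only hints at which endpoints dominate. One rough way to get a fuller picture is to strip the query string from each request path and count what is left. This is a minimal sketch, assuming Apache's common/combined log format (request path in the seventh whitespace-separated field); the function name top_urls is made up for illustration and is not part of FOG.

```shell
#!/bin/sh
# Hypothetical helper (not a FOG tool): count requests per URL path,
# ignoring query strings. Reads an Apache access log on stdin.
# Assumes the common/combined log format, where field 7 is the request path.
top_urls() {
    awk '{ path = $7; sub(/\?.*$/, "", path); print path }' |
        sort | uniq -c | sort -rn
}

# Example, using the log path from the post above:
#   tail -n 5000 /var/log/httpd/access_log | top_urls | head
```

Each output line is a count followed by a path, busiest first, so a single endpoint that accounts for most of the traffic stands out immediately.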
I suspect that, because of the rebuild, the clients are trying to check in to the FOG server with the old client version and CA cert. If I stop all three services (httpd, mysqld, FOGMulticastManager) and start only httpd, all the processes spawn again and the CPU ramps back up, making FOG unusable.
Other than updating to the new client/SSL combo on all my hosts, is there a way around this to stop the httpd processes maxing out the CPU? Assuming, of course, that’s what’s causing the problem…
cheers. Kiweegie.
-
If you can get copies of your old certs and CA, you should do that and install them.
You could use Group Policy to remove all copies of the new FOG Client from all computers; it’s easy with a startup script.
Then in a few weeks you can re-deploy it via Group Policy software deployment.
-
@Wayne-Workman Thanks Wayne
Well, luckily I back up the two FOG servers in question…
Is it as simple as getting the original certs from backup and putting them back in the relevant file path on the new server? If so, where do the certs reside in the file system?
Or is the SSL cert intrinsically linked to that server install, meaning we need to update the clients? Looking for the path of least resistance.
cheers, Kiweegie.
-
@Kiweegie You can read the details about it here: https://wiki.fogproject.org/wiki/index.php?title=FOG_Client
But basically, from your backups, grab a copy of
/opt/fog/snapins/ssl
and put it on your new server in the same spot, and then re-run the installer as normal.
-
I am wondering if this is the same issue we seem to see in environments with a lot of clients lately…?? @Tom-Elliott any news on this issue? Unfortunately I wasn’t able to follow this as my new job just started this week.
-
@Sebastian-Roth I haven’t got a clue or any status. I’ve added a bit more checking and it seems extremely random as to what causes the issue(s). I only know of one case where it’s constantly plaguing someone (@Raymond-Bell), and the others seem to do better after resetting all encryption data on all hosts.
-
@Tom-Elliott Morning Tom
I upgraded late yesterday afternoon 26Apr GMT to the latest git release:
git-svn-id: https://svn.code.sf.net/p/freeghost/code/trunk@5303 71f96598-fa45-0410-b640-bcd6f8691b32
I’m still seeing high CPU usage. top shows CPU jumping between 25-65%
top - 11:20:20 up 6 days, 19:45, 1 user, load average: 5.31, 4.77, 4.80
Tasks: 154 total, 3 running, 151 sleeping, 0 stopped, 0 zombie
Cpu(s): 50.4%us, 25.3%sy, 0.0%ni, 23.8%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 1922092k total, 1720328k used, 201764k free, 86980k buffers
Swap: 4128764k total, 5512k used, 4123252k free, 1253364k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 6726 mysql     20   0 1350m  51m 6596 S 11.3  2.8 103:15.15 mysqld
30973 apache    20   0  327m  13m 3856 S  9.3  0.7   1:30.69 httpd
31311 apache    20   0  327m  13m 3856 S  9.3  0.7   0:19.28 httpd
30135 apache    20   0  328m  13m 3856 R  9.0  0.7   4:06.95 httpd
30754 apache    20   0  328m  13m 3856 S  9.0  0.7   2:12.27 httpd
31364 apache    20   0  327m  13m 3856 S  9.0  0.7   0:10.67 httpd
30861 apache    20   0  327m  13m 3856 S  8.6  0.7   1:54.65 httpd
31356 apache    20   0  328m  13m 3856 S  8.6  0.7   0:11.72 httpd
31412 apache    20   0  327m  13m 3856 R  8.3  0.7   0:01.23 httpd
30728 apache    20   0  328m  13m 3856 S  7.6  0.7   2:13.29 httpd
31361 apache    20   0  328m  13m 3856 S  7.3  0.7   0:10.60 httpd
31403 apache    20   0  327m  13m 3716 S  6.6  0.7   0:02.80 httpd
30744 apache    20   0  327m  13m 3856 S  6.3  0.7   2:12.84 httpd
31314 apache    20   0  327m  13m 3856 S  6.0  0.7   0:19.26 httpd
31363 apache    20   0  328m  13m 3856 S  6.0  0.7   0:10.76 httpd
31400 apache    20   0  328m  13m 3856 S  6.0  0.7   0:02.77 httpd
31406 apache    20   0  327m  13m 3856 S  6.0  0.7   0:02.67 httpd
31414 apache    20   0  327m  13m 3848 S  6.0  0.7   0:01.41 httpd
31006 apache    20   0  328m  13m 3856 S  4.7  0.7   1:23.89 httpd
30857 apache    20   0  327m  13m 3856 S  3.7  0.7   1:53.48 httpd
30751 apache    20   0  327m  13m 3856 S  3.3  0.7   2:11.88 httpd
 9083 root      20   0  318m  24m 3528 S  1.7  1.3   2:59.69 FOGPingHosts
   11 root      20   0     0    0    0 S  0.3  0.0   4:23.42 events/0
This is happening on two servers: FOG01 with 267 hosts and FOG02 with 124 hosts. FOG02 also has high-ish CPU usage, but nowhere near as high as FOG01. Both servers are identical in terms of OS and FOG versions and are located at the same site, although FOG01 serves local clients and FOG02 our remote sites. FOG02 will eventually have the WOL and location plugins installed for this reason but does not currently.
Regarding removing the old client in case that’s interfering: can we use any fogservice.msi installer, or must it be exactly the same one that was installed on the client?
I am also seeing Apache errors in the log as follows, with multiple IP addresses listed against the same error:
[Wed Apr 27 12:07:57 2016] [error] [client 10.166.***.***] PHP Warning: implode(): Invalid arguments passed in /var/www/html/fog/lib/fog/eventmanager.class.php on line 67
The line in question in that file is this one:
$pluginfiles = array_values(array_filter(preg_grep(sprintf('#/(%s)/#',implode('|',$_SESSION['PluginsInstalled'])),array_map($fileitems,(array)$files))));
I had plugins installed on this box previously but now do not so that could be all that means…
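To gauge how widespread the warning is, the error log can be tallied per client address. This is only a sketch: it assumes the Apache error log format shown above (a "[client ADDRESS]" field on each line), and the function name is invented for illustration. On a CentOS-style install the log would typically be /var/log/httpd/error_log, which is an assumption about this server.

```shell
#!/bin/sh
# Hypothetical helper (not a FOG tool): count implode() warnings per client.
# Reads an Apache error log on stdin; relies on the "[client ADDRESS]" field.
implode_warnings_by_client() {
    grep -F 'implode(): Invalid arguments passed' |
        sed -n 's/.*\[client \([^]]*\)\].*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example (log path is an assumption):
#   implode_warnings_by_client < /var/log/httpd/error_log | head
```

If nearly every client appears with a similar count, the warning is more likely tied to server-side state, such as the plugin list in the session, than to any particular host.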
regards Kiweegie.
-
@Kiweegie Mind updating again?
-
@Tom-Elliott hi Tom, updated once more and still the same CPU usage - FOG01 around mid 60% max and FOG02 around mid 30% max.
git-svn-id: https://svn.code.sf.net/p/freeghost/code/trunk@5312 71f96598-fa45-0410-b640-bcd6f8691b32
Added to that I cannot access the Storage management, Report management or FOG configuration page links within the web GUI any longer… All other links are ok.
cheers Kiweegie.
-
@Kiweegie can you install apachetop on your fog servers, and then run it? To run it once it’s installed, just type apachetop
install on debian/ubuntu:
sudo apt-get install apachetop -y
install on fedora/centOS/rhel:
yum install apachetop -y
Let the app run for a while, maybe 15 or so minutes, and then give us a screenshot of what it’s showing. What’s important is the request count and data sent. We will be able to see what web file is getting the most activity and it’ll help us to narrow things down.
-
@Wayne-Workman Not a problem. Installed on both boxes and currently running - I need to run myself shortly but will fire over the results before I go in about 20 mins.
cheers, Kiweegie.
-
Some info for people experiencing high loads:
v0.10.0 of the client (currently undergoing release candidate testing) drastically cuts the amount of traffic used. In the “heaviest” cycle, the client will only make 3 requests. 1 to authenticate, 1 to get module settings, and 1 to get server settings (cycle time, client version,…). It also prevents run-away authentication, where the client had the potential to constantly spam the server’s authentication method.
Hopefully v0.10.0 will resolve any load issues when it is released. However, it will more than likely contain several bugs, as the entire code base has changed to support any OS and a completely new middleware API has been written to support the decrease in traffic. Since so much has changed, it’s almost guaranteed that I will have overlooked some minor things.
-
@Wayne-Workman Hi Wayne
here’s the output of apachetop on my main machine
last hit: 15:16:16 atop runtime: 0 days, 00:21:30 15:16:17
All: 22036 reqs ( 17.1/sec) 4721.2K ( 3747.7B/sec) 219.4B/req
2xx: 22036 ( 100%) 3xx: 0 ( 0.0%) 4xx: 0 ( 0.0%) 5xx: 0 ( 0.0%)
R ( 30s): 454 reqs ( 15.1/sec) 90.0K ( 3072.1B/sec) 203.0B/req
2xx: 454 ( 100%) 3xx: 0 ( 0.0%) 4xx: 0 ( 0.0%) 5xx: 0 ( 0.0%)

 REQS REQ/S    KB KB/S URL
  210  7.00   1.0  0.0*/fog/service/servicemodule-active.php
   53  1.77  86.9  2.9 /fog/management/other/ssl/srvpublic.crt
   43  1.43   0.5  0.0 /fog/service/Printers.php
   30  1.00   0.2  0.0 /fog/service/autologout.php
   28  0.97   0.1  0.0 /fog/service/snapins.checkin.php
   17  0.59   0.9  0.0 /fog/management/index.php
   16  0.53   0.1  0.0 /fog/service/greenfog.php
   15  0.50   0.1  0.0 /fog/service/printerlisting.php
   14  0.47   0.1  0.0 /fog/service/getversion.php
   14  0.48   0.1  0.0 /fog/service/jobs.php
   13  0.45   0.1  0.0 /fog/service/hostname.php
    1  0.04   0.0  0.0 *
and from my second machine
last hit: 15:15:47 atop runtime: 0 days, 00:21:10 15:15:48
All: 5274 reqs ( 4.2/sec) 1008.8K ( 813.4B/sec) 195.9B/req
2xx: 5259 (99.7%) 3xx: 15 ( 0.3%) 4xx: 0 ( 0.0%) 5xx: 0 ( 0.0%)
R ( 30s): 190 reqs ( 6.3/sec) 50.5K ( 1722.4B/sec) 272.0B/req
2xx: 190 ( 100%) 3xx: 0 ( 0.0%) 4xx: 0 ( 0.0%) 5xx: 0 ( 0.0%)

 REQS REQ/S    KB KB/S URL
   80  2.67   0.4  0.0*/fog/service/servicemodule-active.php
   30  1.07  49.2  1.8 /fog/management/other/ssl/srvpublic.crt
   13  0.45   0.2  0.0 /fog/service/Printers.php
    9  0.35   0.0  0.0 /fog/service/hostname.php
    9  0.33   0.1  0.0 /fog/service/getversion.php
    9  0.35   0.0  0.0 /fog/service/jobs.php
    9  0.38   0.0  0.0 /fog/service/printerlisting.php
    9  0.36   0.0  0.0 /fog/service/snapins.checkin.php
    9  0.38   0.0  0.0 /fog/service/greenfog.php
    8  0.35   0.4  0.0 /fog/management/index.php
    5  0.17   0.0  0.0 /fog/service/autologout.php
Heading off home now but will be online later if more information required.
regards Kiweegie.
-
Ok, @Developers what does servicemodule-active.php do? On Kiweegie’s servers, it’s getting 7 requests per second.
-
For each module the client has, it makes 1 request to servicemodule-active to see if it’s enabled. That means that in every cycle the client calls that file about 5-6 times. This is one of the things that v0.10.0 removes completely.
Keep in mind the new client was originally designed for a completely different form of server <-> client communication (socket connections). In order to release it in a timely fashion, we retrofitted the legacy client’s communication method onto the new client. This is why we have so much “useless” network traffic. v0.10.0 provides a hybrid of sorts between the ideal network communication model we originally wanted and the legacy one.
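As a back-of-the-envelope check on the per-module polling described above: if every host makes one servicemodule-active request per module per cycle, the request rate is roughly hosts × requests per cycle ÷ cycle length. The sketch below wraps that arithmetic in two awk one-liners. Both function names are invented, and the 60-second cycle in the first example is purely an illustrative assumption, not a value taken from this thread.

```shell
#!/bin/sh
# Rough estimates only; both helpers are hypothetical, not FOG tools.

# requests_per_second HOSTS REQUESTS_PER_CYCLE CYCLE_SECONDS
requests_per_second() {
    awk -v h="$1" -v r="$2" -v c="$3" 'BEGIN { printf "%.1f\n", h * r / c }'
}

# implied_cycle_seconds HOSTS REQUESTS_PER_CYCLE OBSERVED_REQ_PER_SEC
implied_cycle_seconds() {
    awk -v h="$1" -v r="$2" -v o="$3" 'BEGIN { printf "%.0f\n", h * r / o }'
}

requests_per_second 267 6 60     # 267 hosts, 6 checks, assumed 60 s cycle
implied_cycle_seconds 267 6 7    # 267 hosts, 6 checks, 7 req/s observed
```

With FOG01's 267 hosts and 6 module checks per cycle, the observed 7 requests per second would correspond to each host cycling about every 229 seconds if every host were online; with fewer hosts online, the implied cycle is proportionally shorter.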
-
Just pushed a fix to hopefully make things functional again (meaning cancelling/deleting/viewing everything).
-
@Tom-Elliott Hey Tom, many thanks. Both servers now have access to the reports, FOG settings etc. pages. CPU usage is still high but the servers are usable enough, albeit slow. I’ll keep an eye on Jbob’s progress with the new client.
thanks again, Kiweegie.
-
@Tom-Elliott Hey Tom
Another anomaly has turned up since the most recent changes. My Helpdesk team complained that all desktops with the FOG client installed were boot looping. The workaround was to remove all clients from the host management section in FOG. This allowed the desktops to boot without restarting - I suspect this is down to the old FOG client.
Only today am I getting time to investigate this, and I noticed that when adding a brand new host it seems to be accepted OK, but nothing shows in the GUI when clicking List all hosts. If I export the list it shows one host only (I’ve added 2), but if I try to re-add the 2nd host I get the error Hostname already exists.
Something funky going on with the database?
cheers Kiweegie
-
@Kiweegie go into mysql via CLI and look at the actual data, and let us see what you see.
mysql
use fog
select * from hosts;
-
@Kiweegie Lots of improvements have been made to the new client in the last week. Can you update and see if this affects the CPU load at all?
I suspect there will be quite a dramatic decrease after moving to client 0.10.5