FOG Server CPU usage 100%
-
@sebastian-roth It’s possible, but less likely. While a ton of machines hitting the system at the same time can cause issues, I think the communication is so minuscule that I’m going to chalk this up to constant reloading of the Database, which is likely in a HUGE state.
@Fernando-Gietz You could try testing this theory by getting a backup of the current database, then clearing out the larger table sets, taskLog, historyLog, snapinJobs, imagingLog, etc…
Things that are “logging” or have no real value in being extremely large.
-
@tom-elliott said in FOG Server CPU usage 100%:
He sort of replicated this problem by accidentally uploading a binary data file (like teamviewer installer) into the CSV importer.
But you fixed that problem by verifying that the CSV uploaded was an actual CSV.
I would recommend cleaning out the DB though. Just truncate the history table, the tasks table, etc.
-
@wayne-workman I know what was fixed, but I also know WHY it was causing issues: it wasn’t the “bad” data per se, but rather the “amount” of bad data in the database.
Sifting through 5000 shreds of paper (let’s say a sheet of paper is 1000 shreds each) is much harder than sifting through 5 pieces of paper, right?
So size may not be the issue, rather the quantity held within the size.
-
@wayne-workman The database size is not very big
+------------------------+---------------+---------------+
| Table Name             | Quant of Rows | Total Size Kb |
+------------------------+---------------+---------------+
| LDAPServers            |             1 |         14.32 |
| clientUpdates          |             0 |          4.00 |
| dirCleaner             |             0 |          1.00 |
| globalSettings         |           180 |         73.30 |
| greenFog               |             0 |          1.00 |
| groupMembers           |          6881 |        546.79 |
| groups                 |           313 |         53.92 |
| history                |          4268 |       1011.34 |
| hookEvents             |           277 |         31.68 |
| hostAutoLogOut         |           502 |         34.20 |
| hostMAC                |          7113 |        826.06 |
| hostScreenSettings     |           502 |         38.18 |
| hosts                  |          7008 |       1088.59 |
| imageGroupAssoc        |           279 |         18.03 |
| imagePartitionTypes    |            12 |          3.34 |
| imageTypes             |             4 |          3.22 |
| images                 |           279 |         72.12 |
| imagingLog             |          2537 |        273.77 |
| inventory              |           618 |        257.35 |
| ipxeTable              |          2097 |        171.71 |
| keySequence            |            35 |          2.79 |
| moduleStatusByHost     |          4932 |        403.68 |
| modules                |            13 |          5.66 |
| multicastSessions      |           181 |         26.43 |
| multicastSessionsAssoc |          2130 |        154.86 |
| nfsFailures            |             0 |          1.00 |
| nfsGroupMembers        |             1 |         20.22 |
| nfsGroups              |             1 |         13.05 |
| notifyEvents           |             5 |          9.14 |
| os                     |            12 |          3.26 |
+------------------------+---------------+---------------+
This server has 7000 entries in the hosts table but is only used by 400-500 computers.
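Since the discussion turns on row counts rather than size on disk, a listing like the one above can be ranked by rows. The following is only a rough sketch: the sample rows are a subset copied from the listing, and the function names `parse_listing` and `top_by_rows` are made up for illustration.

```python
# A subset of rows copied from the listing in this post (not the full table).
LISTING = """\
+------------------------+---------------+---------------+
| Table Name             | Quant of Rows | Total Size Kb |
+------------------------+---------------+---------------+
| groupMembers           |          6881 |        546.79 |
| history                |          4268 |       1011.34 |
| hostMAC                |          7113 |        826.06 |
| hosts                  |          7008 |       1088.59 |
| imagingLog             |          2537 |        273.77 |
| moduleStatusByHost     |          4932 |        403.68 |
+------------------------+---------------+---------------+
"""


def parse_listing(text):
    """Return (table, rows, size_kb) tuples from a MySQL-style ASCII table."""
    parsed = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip border lines and the header: only name/int/float rows are kept.
        if len(cells) != 3 or not cells[1].isdigit():
            continue
        parsed.append((cells[0], int(cells[1]), float(cells[2])))
    return parsed


def top_by_rows(text, n=3):
    """Return the n tables with the most rows, largest first."""
    return sorted(parse_listing(text), key=lambda t: t[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, rows, size_kb in top_by_rows(LISTING):
        print(f"{name:<22} {rows:>6} rows {size_kb:>9.2f} KB")
```

On this sample the host-related tables (hostMAC, hosts, groupMembers) come out on top, which matches the point that the hosts table holds far more entries than there are machines in use.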
-
In the development environment I have another server. On that one I deleted 6500 entries and it works better.
+------------------------+---------------+---------------+
| Table Name             | Quant of Rows | Total Size Kb |
+------------------------+---------------+---------------+
| LDAPServers            |             1 |         14.32 |
| clientUpdates          |             0 |          1.00 |
| dirCleaner             |             0 |          1.00 |
| globalSettings         |           183 |         73.87 |
| greenFog               |             0 |          1.00 |
| groupMembers           |           279 |        534.84 |
| groups                 |            28 |         48.79 |
| history                |         25895 |       5894.44 |
| hookEvents             |           278 |         31.72 |
| hostAutoLogOut         |           108 |         12.11 |
| hostMAC                |          6938 |        787.91 |
| hostScreenSettings     |           109 |         13.51 |
| hosts                  |           480 |       1036.53 |
| imageGroupAssoc        |           292 |         18.15 |
| imagePartitionTypes    |            12 |          3.34 |
| imageTypes             |             4 |          3.22 |
| images                 |           292 |         71.45 |
| imagingLog             |          1735 |        165.28 |
| inventory              |           259 |         99.80 |
| ipxeTable              |          1620 |        123.98 |
| keySequence            |            35 |          2.79 |
| moduleStatusByHost     |          6097 |        489.08 |
| modules                |            13 |          5.66 |
| multicastSessions      |          1208 |        138.77 |
| multicastSessionsAssoc |         24929 |       1417.48 |
| nfsFailures            |             0 |          1.00 |
| nfsGroupMembers        |             1 |         12.18 |
| nfsGroups              |             1 |          7.05 |
| notifyEvents           |             5 |          9.14 |
| os                     |            12 |          3.26 |
+------------------------+---------------+---------------+
-
@fernando-gietz said in FOG Server CPU usage 100%:
The problem is Apache and its activity; it reaches 100% CPU.
Are you using image replication?
-
@wayne-workman No. I have only one server and one node. But the service is running.
-
@fernando-gietz Turn it off please and see what happens.
systemctl stop FOGImageReplicator
or something like that.
-
@wayne-workman Nothing. The behaviour is the same. I have restarted Apache too.
-
@fernando-gietz Please read through this: https://stackoverflow.com/questions/32006559/show-which-php-scripts-apache-is-currently-running
We need to figure out what’s consuming all of your CPU. You say it’s httpd, but we don’t know anything further than that. Using the methods described in the thread above, you should be able to pin-point exactly which PHP script is consuming the CPU.
Thread from stackoverflow copy/pasted below:
You can get the current working directory and opened files using lsof. Sadly this does not include the script being run, but if scripts are in different directories, or open different files, you can distinguish between them. Eg the following shows a script in /var/www/stuff/php is running:
sudo lsof -c http | grep cwd
httpd 24475 id cwd DIR 8,3 4096 1123403 /var/www/stuff/php
You can configure apache to let you have the information. Add an entry like
<Location /server-status>
    SetHandler server-status
    ...
</Location>
in your conf ensuring it is only made available to restricted remotes. Then browse to http://localhost/server-status and you will have a “ps” of running jobs. For example, this output shows gg.php is running (I’ve removed some columns):
Srv PID   Acc     M CPU  SS Slot Client VHost  Request
0-0 23688 0/0/262 W 0.50 2  4.48 ::1    xxx:80 GET /php/gg.php HTTP/1.1
1-0 23743 0/0/187 W 0.00 0  3.30 ::1    xxx:80 GET /server-status HTTP/1.1
-
@fernando-gietz There is also the apachetop command that you might be able to use. I almost forgot about this article, but it shows how to use it: https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_Web_Interface It also lists historic threads where people have had high CPU problems in the past (if we solve yours, it will be added to the list too).
-
@fernando-gietz I’m going to guess the tasks table is a bit on the high side maybe?
-
Done.
[root@fog7 conf]# lsof -c http | grep cwd
httpd 13036 root   cwd DIR 253,0 4096 128      /
httpd 13037 apache cwd DIR 253,2 133  23069131 /var/www/html/fog/management
httpd 13038 apache cwd DIR 253,0 4096 128      /
httpd 13039 apache cwd DIR 253,0 4096 128      /
httpd 13040 apache cwd DIR 253,0 4096 128      /
httpd 13041 apache cwd DIR 253,0 4096 128      /
httpd 13044 apache cwd DIR 253,2 4096 8388946  /var/www/html/fog/status
httpd 13118 apache cwd DIR 253,0 4096 128      /
httpd 13144 apache cwd DIR 253,0 4096 128      /
httpd 13145 apache cwd DIR 253,0 4096 128      /
And now the output of server status
Server Version: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.6.31
Server MPM: prefork
Server Built: Jul 26 2017 04:45:44
Current Time: Tuesday, 14-Nov-2017 19:49:35 CET
Restart Time: Tuesday, 14-Nov-2017 19:44:15 CET
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 5 minutes 20 seconds
Server load: 1.01 0.85 0.65
Total accesses: 557 - Total Traffic: 3.4 MB
CPU Usage: u167.78 s11.08 cu0 cs0 - 55.9% CPU load
1.74 requests/sec - 10.8 kB/second - 6.2 kB/request
3 requests currently being processed, 6 idle workers
W____WW__.......................................................
................................................................
................................................................
................................................................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 13037 0/58/58 W 5.22 120 0 0.0 0.19 0.19 158.227.4.135 10.0.15.4:80 GET /fog/management/index.php?node=group&sub=membership&id=334
1-0 13038 0/75/75 _ 6.88 2 107 0.0 0.27 0.27 158.227.138.124 10.0.15.4:80 GET /fog/management/index.php?sub=requestClientInfo&configure&n
2-0 13039 0/76/76 _ 6.59 2 0 0.0 0.23 0.23 158.227.4.135 10.0.15.4:80 GET /server-status HTTP/1.1
3-0 13040 0/39/39 _ 4.78 2 232 0.0 0.07 0.07 158.227.138.124 10.0.15.4:80 GET /fog/management/index.php?sub=requestClientInfo&mac=94:57:A
4-0 13041 0/75/75 _ 6.74 1 150 0.0 0.23 0.23 158.227.115.42 10.0.15.4:80 GET /fog/management/index.php?node=task&sub=active&_=1510657597
5-0 13044 0/19/19 W 131.09 87 0 0.0 1.69 1.69 158.227.4.135 10.0.15.4:80 POST /fog/status/getservertime.php HTTP/1.1
6-0 13118 0/72/72 W 7.25 0 0 0.0 0.23 0.23 158.227.4.135 10.0.15.4:80 GET /server-status HTTP/1.1
7-0 13144 0/72/72 _ 6.11 1 73 0.0 0.23 0.23 158.227.138.124 10.0.15.4:80 GET /fog/service/getversion.php?newService&json HTTP/1.1
8-0 13145 0/71/71 _ 4.20 2 75 0.0 0.24 0.24 158.227.138.124 10.0.15.4:80 GET /fog/service/getversion.php?clientver&newService&json HTTP/
Srv Child Server number - generation
PID OS process ID
Acc Number of accesses this connection / this child / this slot
M Mode of operation
CPU CPU usage, number of seconds
SS Seconds since beginning of most recent request
Req Milliseconds required to process most recent request
Conn Kilobytes transferred this connection
Child Megabytes transferred this child
Slot Total megabytes transferred this slot
SSL/TLS Session Cache Status:
cache type: SHMCB, shared memory: 512000 bytes, current entries: 0
subcaches: 32, indexes per subcache: 88
index usage: 0%, cache usage: 0%
total entries stored since starting: 0
total entries replaced since starting: 0
total entries expired since starting: 0
total (pre-expiry) entries scrolled out of the cache: 0
total retrieves since starting: 0 hit, 0 miss
total removes since starting: 0 hit, 0 miss
-
/fog/management/index.php?node=group&sub=membership&id=334
-
Actually this line is using a ton of CPU:
131.09 87 0 0.0 1.69 1.69 158.227.4.135 10.0.15.4:80 POST /fog/status/getservertime.php HTTP/1.1
131 is a LOT.
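For anyone who wants to pick out the heaviest worker without reading the rows by eye, here is a rough sketch that sorts server-status worker rows by the CPU column (per the legend in the output, "CPU usage, number of seconds"). It assumes the 13-column layout shown in this thread (Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request); other Apache versions may lay the page out differently. The function names are made up for illustration and the sample rows are copied from the output above.

```python
# Worker rows copied from the server-status output posted in this thread.
ROWS = """\
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 13037 0/58/58 W 5.22 120 0 0.0 0.19 0.19 158.227.4.135 10.0.15.4:80 GET /fog/management/index.php?node=group&sub=membership&id=334
4-0 13041 0/75/75 _ 6.74 1 150 0.0 0.23 0.23 158.227.115.42 10.0.15.4:80 GET /fog/management/index.php?node=task&sub=active&_=1510657597
5-0 13044 0/19/19 W 131.09 87 0 0.0 1.69 1.69 158.227.4.135 10.0.15.4:80 POST /fog/status/getservertime.php HTTP/1.1
6-0 13118 0/72/72 W 7.25 0 0 0.0 0.23 0.23 158.227.4.135 10.0.15.4:80 GET /server-status HTTP/1.1
"""


def parse_workers(text):
    """Parse whitespace-separated server-status worker rows into dicts."""
    workers = []
    for line in text.splitlines():
        fields = line.split()
        # 12 fixed columns, then at least the request method and path.
        if len(fields) < 14:
            continue
        try:
            pid = int(fields[1])
            cpu = float(fields[4])
            seconds = int(fields[5])
        except ValueError:
            continue  # header row or a line that is not a worker row
        workers.append({
            "pid": pid,
            "mode": fields[3],
            "cpu": cpu,
            "seconds": seconds,
            "request": " ".join(fields[12:14]),
        })
    return workers


def busiest(text):
    """Return the worker with the highest CPU seconds, or None if none parsed."""
    workers = parse_workers(text)
    return max(workers, key=lambda w: w["cpu"]) if workers else None


if __name__ == "__main__":
    top = busiest(ROWS)
    print(top["pid"], top["cpu"], top["request"])
```

Run against the sample rows, this picks out PID 13044 and the POST to /fog/status/getservertime.php, the same line singled out above.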
-
@tom-elliott Tasks table?
-
@fernando-gietz You should have a table called tasks in MySQL.
-
@wayne-workman said:
Actually this line is using a ton of CPU:
131.09 87 0 0.0 1.69 1.69 158.227.4.135 10.0.15.4:80 POST /fog/status/getservertime.php HTTP/1.1
131 is a LOT.
Yeah, that looks like there is something strange going on with this script, though I have no idea what could be causing it. On the other hand, the PHP code in that file looks like it might take more time than usual. I guess only @Tom-Elliott can tell us what those calls to ignore_user_abort() and set_time_limit() are used for.
-
@Tom-Elliott about the tasks table:
MariaDB [fog]> show table status from fog where Name = 'tasks';
+-------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+--------------------+---------+
| Name  | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          | Collation       | Checksum | Create_options     | Comment |
+-------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+--------------------+---------+
| tasks | MyISAM |      10 | Dynamic    | 5217 |            125 |      653940 | 281474976710655 |       602112 |         0 |           5225 | 2017-06-14 18:23:02 | 2017-11-15 13:10:31 | 2017-09-07 15:31:22 | utf8_general_ci |     NULL | row_format=DYNAMIC |         |
+-------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+--------------------+---------+
-
@fernando-gietz Truncate it.
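Earlier in the thread the advice was to take a backup before clearing anything out, so here is a cautious sketch of that order of operations for a single table. It only builds and prints the two commands; it does not run them. The database name `fog` and table `tasks` come from the MariaDB prompt and output above. The dump file name `tasks-backup.sql` is hypothetical, and the sketch assumes the `mysqldump` and `mysql` clients can pick up credentials from an option file.

```python
import re
import shlex


def backup_then_truncate(database, table, dump_path):
    """Build (but do not run) a per-table dump command and a TRUNCATE command."""
    # The table name is pasted into an SQL statement, so refuse anything
    # that is not a plain identifier.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    # Dump just this table to a file first ...
    dump_cmd = ["mysqldump", f"--result-file={dump_path}", database, table]
    # ... and only then empty it.
    truncate_cmd = ["mysql", "-e", f"TRUNCATE TABLE `{table}`;", database]
    return dump_cmd, truncate_cmd


if __name__ == "__main__":
    # "tasks-backup.sql" is a hypothetical file name chosen for this example.
    for cmd in backup_then_truncate("fog", "tasks", "tasks-backup.sql"):
        print(shlex.join(cmd))
```

Review the printed commands before running them by hand, and check that the dump file is non-empty before truncating.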