Fog SVN 5020 and above CPU Hammered thread.
-
I am also still seeing this on this mornings SVN 5082 on ubuntu 14.
It does not affect the storage node whose cpu load is around 1. Here is TOP of the main server.
lsof -i :80 output
lsof output.txtapache -v output
jhales@934fog1:~$ apache2 -V [Tue Oct 27 13:18:00.309460 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_LOCK_DIR} is not defined [Tue Oct 27 13:18:00.309558 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_PID_FILE} is not defined [Tue Oct 27 13:18:00.309593 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_RUN_USER} is not defined [Tue Oct 27 13:18:00.309610 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_RUN_GROUP} is not defined [Tue Oct 27 13:18:00.309633 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_LOG_DIR} is not defined [Tue Oct 27 13:18:00.321367 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_RUN_DIR} is not defined [Tue Oct 27 13:18:00.321865 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_LOG_DIR} is not defined [Tue Oct 27 13:18:00.322102 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_LOG_DIR} is not defined [Tue Oct 27 13:18:00.322130 2015] [core:warn] [pid 21052] AH00111: Config variable ${APACHE_LOG_DIR} is not defined AH00526: Syntax error on line 74 of /etc/apache2/apache2.conf: Invalid Mutex directory in argument file:${APACHE_LOCK_DIR}
mysqladmin output.
-
mytop output
-
Tools that might be interesting (work in progress):
- Check the output of
apache2 -V
orhttpd -V
for “Server MPM”, is it Worker or Prefork?? - Apache server status http://fog-ip/server-status (might be enabled already but only accessable from localhost - check out the apache mod-configs)
- Mysql logging, a good talk about this: http://stackoverflow.com/questions/568564/how-can-i-view-live-mysql-queries
mysqladmin -u root -p -i 1 processlist;
- Tools like mytop and mtop (http://www.tecmint.com/mysql-performance-monitoring/)
- Check the output of
-
new symptom when I create an all snapins task the task is created but no snapins are queued.
-
This might be the error in apache when it happens hard to tell.
[Tue Oct 27 13:39:30.293918 2015] [:error] [pid 2548] [client 10.200.50.158:50298] PHP Warning: trim() expects parameter 1 to be string, array given in /var/www/html/fog/lib/fog/FOGController.class.php on line 193 [Tue Oct 27 13:39:32.665850 2015] [:error] [pid 2548] [client 10.200.50.158:50298] PHP Warning: trim() expects parameter 1 to be string, array given in /var/www/html/fog/lib/fog/FOGController.class.php on line 193 [Tue Oct 27 13:40:59.185332 2015] [:error] [pid 2596] [client 10.200.50.139:53802] PHP Warning: trim() expects parameter 1 to be string, array given in /var/www/html/fog/lib/fog/FOGController.class.php on line 193 [Tue Oct 27 13:41:00.465521 2015] [:error] [pid 2596] [client 10.200.50.139:53802] PHP Warning: trim() expects parameter 1 to be string, array given in /var/www/html/fog/lib/fog/FOGController.class.php on line 193
-
As all I have are theories, can you all update? I am fairly sure things aren’t 100%, but I’m hoping i’ve made SOME progress cause I’m really hitting it blind.
-
httpd -V
Server version: Apache/2.2.15 (Unix)
Server built: Aug 18 2015 02:00:22
Server’s Module Magic Number: 20051115:25
Server loaded: APR 1.3.9, APR-Util 1.3.9
Compiled using: APR 1.3.9, APR-Util 1.3.9
Architecture: 64-bit
Server MPM: Prefork
threaded: no
forked: yes (variable process count)Server-status
404 not found
mysqladmin -u root -p -i 1 processlist;
±------±-----------±-------------------±----±--------±-----±------±-----------------+
| Id | User | Host | db | Command | Time | State | Info |
±------±-----------±-------------------±----±--------±-----±------±-----------------+
| 11 | root | localhost:51961 | fog | Sleep | 75 | | |
| 14 | root | localhost:51964 | fog | Sleep | 548 | | |
| 15 | root | localhost:51965 | fog | Sleep | 393 | | |
| 2853 | fogstorage | classroomfog:52337 | fog | Sleep | 231 | | |
| 3019 | fogstorage | 10.32.8.220:36559 | fog | Sleep | 182 | | |
| 57280 | root | localhost:53739 | fog | Sleep | 2 | | |
| 57307 | root | localhost:53780 | fog | Sleep | 0 | | |
| 57328 | root | localhost:53833 | fog | Sleep | 0 | | |
| 57330 | root | localhost:53839 | fog | Sleep | 1 | | |
| 57334 | root | localhost:53843 | fog | Sleep | 2 | | |
| 57338 | root | localhost:53847 | fog | Sleep | 1 | | |
| 57339 | root | localhost:53848 | fog | Sleep | 1 | | |
| 57350 | root | localhost:53859 | fog | Sleep | 1 | | |
| 57364 | root | localhost:53881 | fog | Sleep | 0 | | |
| 57366 | root | localhost:53883 | fog | Sleep | 2 | | |
| 57367 | root | localhost:53885 | fog | Sleep | 2 | | |
| 57368 | root | localhost:53886 | fog | Sleep | 0 | | |
| 57376 | root | localhost:53894 | fog | Sleep | 0 | | |
| 57381 | root | localhost:53908 | fog | Sleep | 0 | | |
| 57383 | root | localhost:53910 | fog | Sleep | 0 | | |
| 57387 | root | localhost:53914 | fog | Sleep | 1 | | |
| 57395 | root | localhost:53925 | fog | Sleep | 0 | | |
| 57398 | root | localhost:53966 | fog | Sleep | 2 | | |
| 57399 | root | localhost:53967 | fog | Sleep | 1 | | |
| 57400 | root | localhost:53968 | fog | Sleep | 0 | | |
| 57401 | root | localhost:53969 | fog | Sleep | 2 | | |
| 57402 | root | localhost:53970 | fog | Sleep | 2 | | |
| 57405 | root | localhost:53980 | fog | Sleep | 0 | | |
| 57406 | root | localhost | | Query | 0 | | show processlist |
| 57407 | root | localhost:53983 | fog | Sleep | 1 | | |
| 57408 | root | localhost:53984 | fog | Sleep | 1 | | |
| 57409 | root | localhost:53985 | fog | Sleep | 1 | | |
| 57410 | root | localhost:53986 | fog | Sleep | 0 | | |
| 57411 | root | localhost:53987 | fog | Sleep | 1 | | |
±------±-----------±-------------------±----±--------±-----±------±-----------------+Top
top - 15:06:50 up 1 day, 5:26, 2 users, load average: 38.21, 23.90, 20.41
Tasks: 258 total, 42 running, 215 sleeping, 0 stopped, 1 zombie
Cpu(s): 89.1%us, 10.5%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 3899360k total, 1538960k used, 2360400k free, 82128k buffers
Swap: 4194300k total, 145492k used, 4048808k free, 320464k cachedPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2027 mysql 20 0 2404m 79m 3300 S 23.9 2.1 334:03.04 mysqld
14615 apache 20 0 343m 20m 3768 R 9.2 0.5 0:05.08 httpd
14454 apache 20 0 343m 21m 3764 R 8.8 0.6 0:05.61 httpd
14457 apache 20 0 343m 20m 3772 R 8.8 0.5 0:05.60 httpd
14475 apache 20 0 343m 20m 3768 R 8.8 0.5 0:05.23 httpd
14602 apache 20 0 343m 20m 3828 R 8.8 0.5 0:04.97 httpd
14617 apache 20 0 343m 20m 3768 R 8.8 0.5 0:05.04 httpd
14216 apache 20 0 343m 21m 3768 R 8.5 0.6 0:27.19 httpd
14384 apache 20 0 343m 20m 3768 R 8.5 0.5 0:10.49 httpd
14435 apache 20 0 343m 20m 3764 R 8.5 0.5 0:05.88 httpd
14459 apache 20 0 331m 10m 3764 S 8.5 0.3 0:05.70 httpd
14460 apache 20 0 343m 20m 3704 R 8.5 0.5 0:05.41 httpd
14472 apache 20 0 343m 20m 3768 R 8.5 0.5 0:05.33 httpd
14473 apache 20 0 343m 20m 3764 R 8.5 0.5 0:05.35 httpd
14613 apache 20 0 343m 21m 3768 R 8.5 0.6 0:04.74 httpd
14672 apache 20 0 343m 21m 3764 R 8.5 0.6 0:03.01 httpd
14707 apache 20 0 343m 20m 3552 R 8.5 0.5 0:00.61 httpd
14708 apache 20 0 343m 20m 3552 R 8.5 0.5 0:00.62 httpd
14408 apache 20 0 336m 14m 3768 S 8.2 0.4 0:08.21 httpd
14696 apache 20 0 336m 14m 3764 S 8.2 0.4 0:00.82 httpd
14166 apache 20 0 343m 20m 3772 R 7.9 0.5 0:30.77 httpd
14438 apache 20 0 343m 20m 3768 R 7.5 0.5 0:05.64 httpd
14443 apache 20 0 343m 20m 3768 R 7.5 0.5 0:05.60 httpd
14476 apache 20 0 343m 20m 3764 R 7.5 0.5 0:05.38 httpd
14704 apache 20 0 343m 20m 3760 R 7.5 0.6 0:00.71 httpd
14368 apache 20 0 331m 10m 3768 S 7.2 0.3 0:10.54 httpd
14455 apache 20 0 343m 20m 3768 R 7.2 0.5 0:05.52 httpd
14356 apache 20 0 343m 21m 3768 R 6.9 0.6 0:10.81 httpd
14411 apache 20 0 336m 14m 3768 S 6.9 0.4 0:07.78 httpd
14720 apache 20 0 338m 16m 3552 R 6.9 0.4 0:00.52 httpd
14481 apache 20 0 333m 11m 3768 S 6.5 0.3 0:05.50 httpd
14717 apache 20 0 343m 20m 3552 R 6.5 0.5 0:00.55 httpd
14718 apache 20 0 343m 20m 3552 R 6.5 0.5 0:00.55 httpd
14471 apache 20 0 338m 16m 3768 R 6.2 0.4 0:05.34 httpd
14715 apache 20 0 343m 20m 3704 R 6.2 0.5 0:00.52 httpd
14719 apache 20 0 343m 20m 3544 R 6.2 0.5 0:00.50 httpd
14721 apache 20 0 336m 14m 3552 R 6.2 0.4 0:00.51 httpd
14722 apache 20 0 338m 16m 3552 R 6.2 0.4 0:00.52 httpd
14442 apache 20 0 333m 11m 3768 R 5.9 0.3 0:05.58 httpd
14485 apache 20 0 332m 11m 3768 R 5.9 0.3 0:05.33 httpd
14665 apache 20 0 334m 12m 3768 R 5.9 0.3 0:03.34 httpd
14710 apache 20 0 337m 15m 3760 R 5.9 0.4 0:00.63 httpd
14703 apache 20 0 331m 9988 3552 S 5.6 0.3 0:00.69 httpd
14365 apache 20 0 337m 16m 3768 S 5.2 0.4 0:10.44 httpd
14478 apache 20 0 332m 10m 3768 R 5.2 0.3 0:05.41 httpd
14603 apache 20 0 331m 10m 3768 S 5.2 0.3 0:04.88 httpd
14716 apache 20 0 337m 15m 3760 S 5.2 0.4 0:00.48 httpd
14428 apache 20 0 331m 10m 3768 R 4.9 0.3 0:06.22 httpd
14477 apache 20 0 331m 10m 3768 S 4.9 0.3 0:05.22 httpd
14486 apache 20 0 333m 11m 3768 R 4.9 0.3 0:05.25 httpd
14456 apache 20 0 331m 10m 3768 R 4.6 0.3 0:05.40 httpd -
@Adam-Taylor Is this after updating to the most current version which Tom uped just a few minutes ago?
-
Yes, this was just after updating.
-
@Joseph-Hales Are you using the legacy client or the new client?
I’m going to attempt to replicate your issue.
-
Running strace on an apache process give me this:\
Process 28758 attached - interrupt to quit
^CProcess 28758 detached
% time seconds usecs/call calls errors syscall
28.32 0.002985 1 2262 poll
18.47 0.001947 0 26340 getdents
18.10 0.001908 4 505 brk
13.05 0.001375 0 32536 90 lstat
7.50 0.000790 0 31140 399 stat
4.72 0.000497 0 11704 open
2.57 0.000271 0 9684 8663 access
2.25 0.000237 0 11723 close
1.48 0.000156 0 11107 33 read
0.73 0.000077 0 10781 lseek
0.73 0.000077 0 771 sendto
0.72 0.000076 0 9307 fstat
0.30 0.000032 0 893 munmap
0.28 0.000030 2 16 write
0.28 0.000029 2 16 writev
0.27 0.000028 0 2246 recvfrom
0.23 0.000024 0 893 mmap
0.00 0.000000 0 15 rt_sigaction
0.00 0.000000 0 15 rt_sigprocmask
0.00 0.000000 0 16 pwrite
0.00 0.000000 0 77 setitimer
0.00 0.000000 0 16 accept
0.00 0.000000 0 16 shutdown
0.00 0.000000 0 16 getsockname
0.00 0.000000 0 32 semop
0.00 0.000000 0 79 fcntl
0.00 0.000000 0 15 flock
0.00 0.000000 0 60 getcwd
0.00 0.000000 0 31 chdir
0.00 0.000000 0 15 getuid
0.00 0.000000 0 16 epoll_wait
100.00 0.010539 162343 9185 total
This was done for 30 seconds.
The err is access make me look at why access is uphappy. See next post.
-
These are some of the access_httpd.log files are not there:
access(“/var/www/html/fog/lib/plugins/ldap/reg-task/Template.class.php”, F_OK) = -1 ENOENT (No such file or directory)
access(“/var/www/html/fog/lib/plugins/ldap/service/Template.class.php”, F_OK) = -1 ENOENT (No such file or directory)Something is looking for alot of files that are not there… There was 9727 requests for files that don’t exist.
-
From this point on, I’m only going to post what is asked of me to post because i don’t want to flood the thread
I just found it odd that there was that many requests for missing files.
Adam
-
I’m fairly certain that the FOG Dashboard is causing the high CPU usage.
after moving the dashboard’s js file and refreshing the FOG Dashboard, I saw a 0.02% 1-minute CPU Load average while sitting on the FOG Dashboard page.
mv /var/www/html/fog/management/js/fog/fog.dashboard.js /var/www/html/fog/management/js/fog/fog.dashboard.js.moved
After putting it back and refreshing the FOG Dashboard, I saw a 0.24% 1-minute CPU Load average while sitting on the FOG Dashboard page.
mv /var/www/html/fog/management/js/fog/fog.dashboard.js.moved /var/www/html/fog/management/js/fog/fog.dashboard.js
Opening three tabs with the file in the correct place, the 1-minute CPU Load average was at 0.96%
Also, each FOG Dashboard page creates approximately 2 httpd processes. So with 3 pages open and sitting on the dashboard, I was seeing about 6 httpd instances in
top
. So, my hypothesis is that organizations with multiple FOG users under 1 main server probably have the FOG Dashboard open all the time, causing the load.I’m not saying they should close the dashboard or not leave it open all day… it’s probably a recent code change somewhere on the dashboard causing this new issue…
SO… if you all could please test, and just temporarily move the file and then tell everybody to refresh their FOG tabs, just see what the CPU does and please report back.
-
@Wayne-Workman Interesting catch we might have to look into as well. On the other hand I am pretty sure this is not causing the high load on those servers with a lot of clients in the environment. On Joseph Hales’ server we see a lot of apache processes spawned and kept open by clients. Those clients never request Javascript files AFAIK.
I will be setting up an older revision with Joseph later today and I really hope to see a difference. We then step up the revisions to see where it all started. Keep your fingers crossed.
-
I know for certain it started somewhere between r4960 and r4982.
-
For what it’s worth, there are about 500ish PCs here in my building that are using the legacy client, and I’m using the latest as of yesterday and I’m not seeing the large CPU loads as everyone else is…
Also, nobody here but me and my co-worker even use the fog web interface, and we don’t leave it sitting open either.
-
@Wayne-Workman Even if the web interface isn’t used by anyone the load will always be high.
However, out of the ~1000 clients I have approximately 900 have the new client installed.
Maybe this issue is related to these newest revisions with regard to communication with the new client (v0.95).
-
Can all update? My suspicion as to a part of the problem is specific to the fog multicast service. I noticed things were not starting for it and would continuously be respawned. This would create quite a high load on the system as it would be constantly creating new sessions and spawning new instances. While I’m sure there may be some cases of weirdness I think this may be a largely attributing factor.
-
I am still seeing the issue after updating and restarting SVN 5106. Also after updating to 5108.