Fog SVN 5020 and above CPU Hammered thread.
-
Yes, this was just after updating.
-
@Joseph-Hales Are you using the legacy client or the new client?
I’m going to attempt to replicate your issue.
-
Running strace on an Apache process gives me this:
Process 28758 attached - interrupt to quit
^CProcess 28758 detached
% time     seconds  usecs/call     calls    errors syscall
 28.32    0.002985           1      2262           poll
 18.47    0.001947           0     26340           getdents
 18.10    0.001908           4       505           brk
 13.05    0.001375           0     32536        90 lstat
  7.50    0.000790           0     31140       399 stat
  4.72    0.000497           0     11704           open
  2.57    0.000271           0      9684      8663 access
  2.25    0.000237           0     11723           close
  1.48    0.000156           0     11107        33 read
  0.73    0.000077           0     10781           lseek
  0.73    0.000077           0       771           sendto
  0.72    0.000076           0      9307           fstat
  0.30    0.000032           0       893           munmap
  0.28    0.000030           2        16           write
  0.28    0.000029           2        16           writev
  0.27    0.000028           0      2246           recvfrom
  0.23    0.000024           0       893           mmap
  0.00    0.000000           0        15           rt_sigaction
  0.00    0.000000           0        15           rt_sigprocmask
  0.00    0.000000           0        16           pwrite
  0.00    0.000000           0        77           setitimer
  0.00    0.000000           0        16           accept
  0.00    0.000000           0        16           shutdown
  0.00    0.000000           0        16           getsockname
  0.00    0.000000           0        32           semop
  0.00    0.000000           0        79           fcntl
  0.00    0.000000           0        15           flock
  0.00    0.000000           0        60           getcwd
  0.00    0.000000           0        31           chdir
  0.00    0.000000           0        15           getuid
  0.00    0.000000           0        16           epoll_wait
100.00    0.010539                162343      9185 total
This was captured over 30 seconds.
The number of errors on access made me look at why access is unhappy. See next post.
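For anyone who wants to compare their own capture, here is a rough Python sketch that ranks syscalls by error rate from a summary like the one above. It assumes whitespace-separated rows in the strace summary layout (the errors column is simply missing when there were none); the function names are made up for illustration, and only a few rows from the post are included as sample data.

```python
# Sample rows copied from the summary in this post (not the full table).
SUMMARY = """\
28.32 0.002985 1 2262 poll
13.05 0.001375 0 32536 90 lstat
7.50 0.000790 0 31140 399 stat
2.57 0.000271 0 9684 8663 access
1.48 0.000156 0 11107 33 read
100.00 0.010539 162343 9185 total
"""


def parse_summary(text):
    """Parse summary rows into dicts with syscall, calls and errors."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header and the totals row.
        if not parts or parts[-1] in ("syscall", "total"):
            continue
        if len(parts) == 5:      # no errors recorded for this syscall
            calls, errors, name = parts[3], "0", parts[4]
        elif len(parts) == 6:    # errors column present
            calls, errors, name = parts[3], parts[4], parts[5]
        else:
            continue
        try:
            rows.append({"syscall": name, "calls": int(calls), "errors": int(errors)})
        except ValueError:
            continue             # not a data row
    return rows


def error_rates(rows):
    """Return (syscall, errors / calls) pairs, highest error rate first."""
    rates = [(r["syscall"], r["errors"] / r["calls"]) for r in rows if r["calls"]]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, rate in error_rates(parse_summary(SUMMARY)):
        print(f"{name:10s} {rate:.1%}")
```

With the sample rows, access comes out on top: 8663 of its 9684 calls failed, far above any other syscall in the list.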
-
Here are some of the entries from access_httpd.log for files that are not there:
access("/var/www/html/fog/lib/plugins/ldap/reg-task/Template.class.php", F_OK) = -1 ENOENT (No such file or directory)
access("/var/www/html/fog/lib/plugins/ldap/service/Template.class.php", F_OK) = -1 ENOENT (No such file or directory)
Something is looking for a lot of files that are not there… There were 9727 requests for files that don’t exist.
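To see which directories those failed lookups cluster in, something like the following Python sketch could be run over the saved strace output. It assumes lines in the format shown above with plain ASCII quotes, as strace itself prints them; the function name and the two sample lines are only for illustration.

```python
import re
from collections import Counter

# Matches a failed access() existence check; search() is used so a
# leading "[pid N]" prefix from strace -f does not get in the way.
ENOENT_ACCESS = re.compile(r'access\("([^"]+)",\s*F_OK\)\s*=\s*-1 ENOENT')


def missing_by_directory(lines):
    """Count failed access() calls per parent directory."""
    counts = Counter()
    for line in lines:
        match = ENOENT_ACCESS.search(line)
        if match:
            directory = match.group(1).rsplit("/", 1)[0]
            counts[directory] += 1
    return counts


SAMPLE = [
    'access("/var/www/html/fog/lib/plugins/ldap/reg-task/Template.class.php", F_OK) = -1 ENOENT (No such file or directory)',
    'access("/var/www/html/fog/lib/plugins/ldap/service/Template.class.php", F_OK) = -1 ENOENT (No such file or directory)',
]

if __name__ == "__main__":
    for directory, count in missing_by_directory(SAMPLE).most_common():
        print(count, directory)
```

On a real capture you would pass the open file instead of SAMPLE; the directories with the highest counts show where the lookups are going.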
-
From this point on, I’m only going to post what I’m asked to post, because I don’t want to flood the thread.
I just found it odd that there were that many requests for missing files.
Adam
-
I’m fairly certain that the FOG Dashboard is causing the high CPU usage.
After moving the dashboard’s JS file and refreshing the FOG Dashboard, I saw a 1-minute load average of 0.02 while sitting on the FOG Dashboard page.
mv /var/www/html/fog/management/js/fog/fog.dashboard.js /var/www/html/fog/management/js/fog/fog.dashboard.js.moved
After putting it back and refreshing the FOG Dashboard, I saw a 1-minute load average of 0.24 while sitting on the FOG Dashboard page.
mv /var/www/html/fog/management/js/fog/fog.dashboard.js.moved /var/www/html/fog/management/js/fog/fog.dashboard.js
With three tabs open and the file in the correct place, the 1-minute load average was 0.96.
Also, each FOG Dashboard page creates approximately 2 httpd processes. So with 3 pages open and sitting on the dashboard, I was seeing about 6 httpd instances in top.
So, my hypothesis is that organizations with multiple FOG users under 1 main server probably have the FOG Dashboard open all the time, causing the load. I’m not saying they should close the dashboard or not leave it open all day… it’s probably a recent code change somewhere on the dashboard causing this new issue…
So… if you all could please test: just temporarily move the file, tell everybody to refresh their FOG tabs, see what the CPU does, and please report back.
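If you would rather record numbers than eyeball top, here is a small Python sketch for the before/after comparison. It reads the 1-minute load average with os.getloadavg() (Unix only); the function names, sample count and interval are arbitrary choices for illustration, not anything FOG provides.

```python
import os
import time


def sample_load(samples=6, interval=10.0):
    """Collect the 1-minute load average a few times, `interval` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(os.getloadavg()[0])  # (1 min, 5 min, 15 min) -> 1 min
        if i < samples - 1:
            time.sleep(interval)
    return readings


def mean(values):
    return sum(values) / len(values)


def load_difference(before, after):
    """Mean load with the file in place minus mean load with it moved."""
    return mean(after) - mean(before)


if __name__ == "__main__":
    input("Move fog.dashboard.js away, refresh the dashboard tabs, then press Enter")
    moved = sample_load()
    input("Put the file back, refresh the tabs again, then press Enter")
    in_place = sample_load()
    print("1-minute load difference:", round(load_difference(moved, in_place), 2))
```

Since the 1-minute figure is itself a running average, give the server a minute or two to settle after each change before sampling.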
-
@Wayne-Workman Interesting catch; we might have to look into that as well. On the other hand, I am pretty sure this is not what causes the high load on servers with a lot of clients in the environment. On Joseph Hales’ server we see a lot of Apache processes spawned and kept open by clients. Those clients never request JavaScript files, AFAIK.
I will be setting up an older revision with Joseph later today and I really hope to see a difference. We will then step up through the revisions to see where it all started. Keep your fingers crossed.
-
I know for certain it started somewhere between r4960 and r4982.
-
For what it’s worth, there are about 500ish PCs here in my building that are using the legacy client. I’m on the latest revision as of yesterday, and I’m not seeing the large CPU loads everyone else is…
Also, nobody here but me and my co-worker even uses the FOG web interface, and we don’t leave it sitting open either.
-
@Wayne-Workman Even if the web interface isn’t used by anyone the load will always be high.
However, out of the ~1000 clients I have, approximately 900 have the new client installed.
Maybe this issue is related to these newest revisions with regard to communication with the new client (v0.95).
-
Can everyone update? My suspicion is that part of the problem is specific to the FOG multicast service. I noticed it was not starting properly and was being respawned continuously. This would create quite a high load on the system, as it would be constantly creating new sessions and spawning new instances. While I’m sure there may be some other cases of weirdness, I think this may be a large contributing factor.
-
I am still seeing the issue after updating to SVN 5106 and restarting. Also after updating to 5108.
-
Still seeing the issue too. Note the huge number of threads and tasks!
-
Guys,
Do me a favor, it’s going to break things, but it’s very specific.
It’s the service module active checks. They’re spawning way too many processes.
For now, just run: mv /var/www/fog/service/servicemodule-active.php /var/www/fog/service/service-active.php
-
Well it broke it but it stopped the “thing”!
-
@Trevelyan I’m aware of it breaking the client stuff, but it’s all I can do for right now.
I’m remoted into another system having the same type of issues and am using it to figure out exactly what’s up. You can image and do all that, just no FOG Client stuff at the moment. I’ll be trying to narrow down exactly where it is.
-
While it’s not perfect, I found 4168 is still good and at 4169 things are “decent”. At 4170, all hell breaks loose. Please go back to 4168 to be “functional”.
You can do this with svn by running:
cd /opt/trunk && svn up -r 4168 && cd bin && ./installfog.sh
Of course change the /opt/trunk path to the location of your svn folder.
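For anyone narrowing down a regression like this on their own server, stepping through every revision is slow; a binary search between a known-good and a known-bad revision needs far fewer installs. Below is a rough Python sketch of the idea. The is_good check is a placeholder you would replace with "install that revision and watch the load", and it assumes that every revision after the first bad one is also bad.

```python
def first_bad_revision(good, bad, is_good):
    """Return the lowest revision in (good, bad] where is_good() is False.

    Assumes `good` really is good, `bad` really is bad, and that once a
    revision is bad all later ones are bad too.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid   # problem starts after mid
        else:
            bad = mid    # problem starts at or before mid
    return bad


if __name__ == "__main__":
    # Hypothetical check standing in for a manual test of each revision:
    # pretend everything up to 4169 behaves and 4170 onward does not.
    def pretend_check(revision):
        return revision <= 4169

    print(first_bad_revision(4000, 5108, pretend_check))
```

With roughly 1100 revisions between the two ends, this takes about 11 installs instead of hundreds.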
-
4168 == sanity!
Never get past about 30% now.
Adam
-
OK, I’m functional at 4168. Just a note: all the snapins that weren’t showing up are there now.
-
I’m now fairly certain these CPU load issues are fixed. They were due to HookManager. I’m sorry it took so long to figure out.