HIGH CPU Fog Services after update r5029 v6759
-
@Wayne-Workman and all: I made a rather significant change (though from the outside it shouldn’t matter) in an attempt to figure this out.
Basically, please try out the latest. The first thing I noticed was a very similar result on my storage nodes (one is Ubuntu 15, the other Fedora 23), and I found that my particular issue was due to the service sleep time being parsed as a string rather than an integer. This would cause the FOG services to keep cycling (after the initial reboot), probably due to improper connection finding. I’m hoping this is fixed, and that the FOG server also performs much better as a result.
-
@Tom-Elliott Thanks Tom. Have these changes been pushed to Git too? That’s what I’m using…
-
@baggar11 Whenever I push, I automatically push to both svn and git. I’m not expecting any miracles though, but would be nice to know if I’m at least kind of on the right track.
-
I’m still seeing the issue on the newly pushed 6775. ImageReplicator, MulticastManager and SnapinReplicator seem to be taking all of the CPU load, at around 31.6% each.
ps aux | grep FOG
root 814 31.6 2.3 196196 23976 ? R 15:39 0:44 /usr/bin/php -q /opt/fog/service/FOGImageReplicator/FO ImageReplicator
root 846 31.6 2.3 196244 24116 ? R 15:39 0:44 /usr/bin/php -q /opt/fog/service/FOGMulticastManager/FOGMulticastManager
root 869 31.5 2.3 196196 24068 ? R 15:39 0:44 /usr/bin/php -q /opt/fog/service/FOGSnapinReplicator/FOGSnapinReplicator
root 916 0.0 1.8 274228 19244 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGImageReplicator/FO ImageReplicator
root 918 0.0 1.9 274404 19344 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGMulticastManager/FOGMulticastManager
root 939 0.0 1.8 274096 19172 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGSnapinReplicator/FOGSnapinReplicator
root 1001 0.1 2.3 196200 24092 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGTaskScheduler/FOGTaskScheduler
root 1031 0.1 2.3 196200 24136 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGPingHosts/FOGPingHosts
root 1127 0.0 1.9 274696 19620 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGTaskScheduler/FOGTaskScheduler
root 1150 0.0 1.9 274436 19652 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGPingHosts/FOGPingHosts
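For anyone comparing notes, output like the above can be totaled per service so the heavy hitters stand out. This is only a rough sketch: `fog_cpu_summary` is a made-up helper name, not part of FOG, and it assumes the default `ps aux` column layout (column 3 is %CPU) and that the services live under /opt/fog/service/ as in the listing.

```shell
#!/bin/sh
# fog_cpu_summary: read `ps aux`-style lines on stdin and print the total
# %CPU per FOG service, highest first. Hypothetical helper, not part of FOG.
fog_cpu_summary() {
    awk '
        # Keep only lines whose command references the FOG service directory.
        /\/opt\/fog\/service\// {
            n = split($NF, parts, "/")   # last field is the script path
            cpu[parts[n]] += $3          # column 3 of `ps aux` is %CPU
        }
        END {
            for (svc in cpu) printf "%s %.1f\n", svc, cpu[svc]
        }
    ' | LC_ALL=C sort -k2,2nr
}
```

It could be run as `ps auxww | fog_cpu_summary`; the `ww` asks ps for unlimited width so long command paths are not cut off, and the `grep FOG` process itself is ignored because its command line does not contain the service directory.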
-
@baggar11 see how the G is missing?
/opt/fog/service/FOGImageReplicator/FO ImageReplicator
Wonder if that is significant…
-
@Wayne-Workman My screen wrapped in my terminal. The “G” is in fact there when I full screen and grep the ps output again.
-
@baggar11 Can you give us the last 500 or so Image Replicator logs? FOG Configuration -> LOG Viewer -> Image Replicator.
Also, you might just look through all the logs in there for anything out of place.
-
@baggar11 Does Ubuntu 14.04 use systemctl?
Is there anything spewing into your /etc/apache2/error.log?
-
@Wayne-Workman Here you go. 500 lines of logs.
EDIT: Forgot to select Image Replicator. Here it is.
-
@Tom-Elliott Here’s what I’m seeing after stopping all services, clearing the logs and then rebooting.
cat error.log
[Wed Mar 16 16:10:53.258602 2016] [mpm_prefork:notice] [pid 946] AH00163: Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g configured -- resuming normal operations
[Wed Mar 16 16:10:53.258795 2016] [core:notice] [pid 946] AH00094: Command line: '/usr/sbin/apache2'
[Wed Mar 16 16:11:35.735081 2016] [:error] [pid 950] [client 192.168.10.15:53300] PHP Strict Standards: Only variables should be passed by reference in /var/www/fog/lib/pages/dashboardpage.class.php on line 71, referer: http://192.168.10.14/fog/management/index.php
-
@baggar11 What services are using the bulk of the CPU?
-
@Tom-Elliott For a full view of processes, I gave an output of ps aux | grep FOG down below. These are the top 3 though.
FOGImageReplicator
FOGMulticastManager
FOGSnapinReplicator
-
@baggar11 And they’re constantly cycling CPU?
-
@Tom-Elliott Uptime is now at 17 minutes and nothing has changed with regard to CPU usage.
-
One interesting thing I noticed: right after an upgrade, CPU usage is normal. The system is idling and the FOG WebUI works fine, no issues.
Once I restart the virtual machine and the FOG services start up on boot, that’s when those 3 services start consuming all of the CPU.
-
@baggar11 Well, there are 5 services in total.
That said, this seems to be a fairly well-known problem, and our method to “check” and correct it isn’t working properly. To me, it only seems to affect Ubuntu (maybe older versions of Debian?).
I know the problem, in the past, was related to the network (and maybe other required services) not running yet when the FOG services start up.
Maybe try adding a sleep 30, then manually starting the services from the rc.local file.
Basically, rc.local is processed LAST, but even by that time the network may still not be up, so the sleep just ensures it has ample time to come up.
Try disabling the FOG Services from starting at boot.
Then, edit the /etc/rc.local file.
Add:
sleep 30
/etc/init.d/FOGPingHosts stop
/etc/init.d/FOGScheduler stop
/etc/init.d/FOGImageReplicator stop
/etc/init.d/FOGSnapinReplicator stop
/etc/init.d/FOGMulticastManager stop
/etc/init.d/FOGPingHosts start
/etc/init.d/FOGScheduler start
/etc/init.d/FOGImageReplicator start
/etc/init.d/FOGSnapinReplicator start
/etc/init.d/FOGMulticastManager start
This should put the services in a state where they’re not constantly looping (which is most likely what’s causing the CPU load in the first place).
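A fixed sleep is a guess at how long the network needs. As a possible variation (not something tested in this thread), rc.local could poll until the network looks ready and give up after a timeout. In the sketch below, `wait_for`, `has_default_route` and `restart_fog_services` are made-up names, and treating "a default IPv4 route exists" as "the network is up" is an assumption; the init script names are the ones from the list above.

```shell
#!/bin/sh
# wait_for TIMEOUT CMD...: retry CMD once per second until it succeeds
# or TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Assumption: the network counts as "up" once a default IPv4 route exists.
has_default_route() {
    ip -4 route show default | grep -q .
}

# Init script names as listed in the rc.local example in this post.
FOG_SERVICES="FOGPingHosts FOGScheduler FOGImageReplicator FOGSnapinReplicator FOGMulticastManager"

# Stop everything first, then start everything, mirroring the manual list.
restart_fog_services() {
    for svc in $FOG_SERVICES; do "/etc/init.d/$svc" stop; done
    for svc in $FOG_SERVICES; do "/etc/init.d/$svc" start; done
}

# In /etc/rc.local one might then call (not executed here):
#   wait_for 30 has_default_route && restart_fog_services
```

With this shape the 30 becomes an upper bound rather than a fixed delay, so a machine whose network comes up quickly does not wait the full time.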
-
Sorry guys! I have been rolling the dev version for so long without issue, I totally forgot about the startup services possibly being an issue. I’m all set now. Here’s what I did.
Ran this on each service
sudo update-rc.d FOG*service* disable
Then, as suggested, added these lines to my rc.local
sleep 5
service FOG*service* start
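Since FOG*service* stands in for each service name, the full set of commands can be generated with a small loop. This is a sketch only: the function names are made up, and the service names are assumed to match the init scripts from the rc.local example earlier in the thread, so check them against /etc/init.d on your own box. The functions only print the commands and do not run anything.

```shell
#!/bin/sh
# Assumed init script names, taken from the rc.local example earlier in the thread.
FOG_SERVICES="FOGPingHosts FOGScheduler FOGImageReplicator FOGSnapinReplicator FOGMulticastManager"

# Print one "update-rc.d <name> disable" command per service.
print_disable_cmds() {
    for svc in $FOG_SERVICES; do
        echo "sudo update-rc.d $svc disable"
    done
}

# Print the lines to add to rc.local: a delay, then one start per service.
# $1 is the delay in seconds (defaults to 5, the value used in this post).
print_rc_local_lines() {
    echo "sleep ${1:-5}"
    for svc in $FOG_SERVICES; do
        echo "service $svc start"
    done
}
```

Calling `print_disable_cmds` and `print_rc_local_lines` shows exactly what would be run or added, which makes it easy to review before touching the boot configuration.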
-
@baggar11 Very nice.
-
@Tom-Elliott Upgraded to the latest pull, r5037.
The server is fine if I start the services one at a time, but both storage nodes have high CPU, and I have started them one at a time to check.
As soon as I start one it jumps to about 98% and levels out at about 50%.
top - 07:35:37 up 16:54, 2 users, load average: 4.40, 2.95, 3.82
Tasks: 219 total, 5 running, 214 sleeping, 0 stopped, 0 zombie
%Cpu(s): 69.7 us, 30.1 sy, 0.0 ni, 0.1 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 7735460 total, 1368652 used, 6366808 free, 107516 buffers
KiB Swap: 7828476 total, 0 used, 7828476 free. 824568 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8183 root 20 0 41920 20308 15328 R 91.6 0.3 1:38.30 FOGMulticastMan
8212 root 20 0 41920 20692 15712 R 91.6 0.3 1:38.56 FOGSnapinReplic
8239 root 20 0 41920 20584 15604 R 82.7 0.3 1:28.03 FOGPingHosts
8226 root 20 0 41920 20132 15156 R 70.1 0.3 1:34.14 FOGTaskSchedule
8197 root 20 0 41920 20140 15160 S 63.7 0.3 1:36.08 FOGImageReplica
1 root 20 0 4612 3872 2616 S 0.0 0.1 0:01.52 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:03.83 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 0:27.75 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:00.09 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.10 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
13 root 20 0 0 0 0 S 0.0 0.0 0:00.15 ksoftirqd/1
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0
15 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
16 root rt 0 0 0 0 S 0.0 0.0 0:00.10 watchdog/2
17 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/2