HIGH CPU Fog Services after update r5029 v6759
-
I just updated my home FOG setup (which includes many nodes) to r6769 and I cannot replicate the issue. I’m using Fedora 23.
My replication setting is set to 60 seconds, and I’ve got the slowest setup in town (running 4 OSs on a single Core 2 Duo, and P4s with 100 meg switches).
It’s either an Ubuntu thing or a New Client related thing. And I’m leaning towards it being an Ubuntu thing.
It’s also possible that some particular scenario caused replication to go AWOL, but we won’t know until we can see a setup that is affected and figure out what’s going on.
-
I was just able to test my one FOG client system at home. It doesn’t make any difference.
I think this is an Ubuntu issue.
-
@Wayne-Workman And all, I made a rather significant change (though from the outside it shouldn’t matter) in an attempt to figure this out.
Basically, please try out the latest. The first thing I noticed was a very similar result on my storage nodes (one is Ubuntu 15, the other Fedora 23), and I found that my particular issue was due to the service sleep time being parsed as a string rather than an integer. This would cause the FOG Services to keep cycling (after initial reboot), probably due to improper connection finding. I’m hoping this is fixed, and that the FOG server also performs much better as a result.
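To illustrate the kind of guard that keeps a service loop from spinning on a bad interval: in a shell loop, `sleep` with a non-numeric argument fails immediately, so the loop never pauses. The following is only a minimal shell sketch of validating the value first; it is not FOG's actual PHP code, and the function name `safe_sleep_seconds` and the 60-second fallback are assumptions made for illustration.

```shell
# Print a usable sleep interval in seconds.
# $1: configured value (may be empty or non-numeric)
# $2: fallback in seconds (hypothetical default, not taken from FOG)
safe_sleep_seconds() {
    value="$1"
    fallback="${2:-60}"
    case "$value" in
        ''|*[!0-9]*)
            # Empty, or contains something other than digits: use the fallback.
            echo "$fallback"
            ;;
        *)
            # All digits: accept it only if it is greater than zero.
            if [ "$value" -gt 0 ]; then
                echo "$value"
            else
                echo "$fallback"
            fi
            ;;
    esac
}

# Usage inside a service loop (sketch):
#   sleep "$(safe_sleep_seconds "$configured_value" 60)"
```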
-
@Tom-Elliott Thanks Tom. Have these changes been pushed to Git too? That’s what I’m using…
-
@baggar11 Whenever I push, I automatically push to both svn and git. I’m not expecting any miracles, but it would be nice to know if I’m at least kind of on the right track.
-
I’m still seeing the issue on the newly pushed 6775. ImageReplicator, MulticastManager and SnapinReplicator seem to be taking all of the CPU load, at around 31.6% each.
ps aux | grep FOG
root 814 31.6 2.3 196196 23976 ? R 15:39 0:44 /usr/bin/php -q /opt/fog/service/FOGImageReplicator/FO ImageReplicator
root 846 31.6 2.3 196244 24116 ? R 15:39 0:44 /usr/bin/php -q /opt/fog/service/FOGMulticastManager/FOGMulticastManager
root 869 31.5 2.3 196196 24068 ? R 15:39 0:44 /usr/bin/php -q /opt/fog/service/FOGSnapinReplicator/FOGSnapinReplicator
root 916 0.0 1.8 274228 19244 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGImageReplicator/FO ImageReplicator
root 918 0.0 1.9 274404 19344 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGMulticastManager/FOGMulticastManager
root 939 0.0 1.8 274096 19172 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGSnapinReplicator/FOGSnapinReplicator
root 1001 0.1 2.3 196200 24092 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGTaskScheduler/FOGTaskScheduler
root 1031 0.1 2.3 196200 24136 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGPingHosts/FOGPingHosts
root 1127 0.0 1.9 274696 19620 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGTaskScheduler/FOGTaskScheduler
root 1150 0.0 1.9 274436 19652 ? S 15:39 0:00 /usr/bin/php -q /opt/fog/service/FOGPingHosts/FOGPingHosts
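For anyone wanting to pull just the busy services out of output like this, here is a rough sketch that filters `ps aux`-style lines for FOG service processes above a CPU threshold. The function name `fog_hot_services` and the default threshold of 10 percent are assumptions made for illustration; the `/opt/fog/service/` prefix is taken from the output above. It reads the service name from the directory component of the path, so a line that got wrapped by the terminal is still matched.

```shell
# Print "<service> <pid> <cpu%>" for FOG service processes above a CPU threshold.
# Reads `ps aux`-style lines on stdin; $1 is the threshold in percent (default 10).
fog_hot_services() {
    threshold="${1:-10}"
    awk -v t="$threshold" -v prefix="/opt/fog/service/" '
        # Column 3 of `ps aux` is %CPU; keep only FOG service lines above t.
        index($0, prefix) && ($3 + 0) > (t + 0) {
            # Take the directory name that follows the service prefix.
            rest = substr($0, index($0, prefix) + length(prefix))
            split(rest, parts, "/")
            print parts[1], $2, $3
        }'
}

# Usage:
#   ps aux | fog_hot_services 10
```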
-
@baggar11 see how the G is missing?
/opt/fog/service/FOGImageReplicator/FO ImageReplicator
Wonder if that is significant…
-
@Wayne-Workman My screen wrapped in my terminal. The “G” is in fact there when I full screen and grep the ps output again.
-
@baggar11 Can you give us the last 500 or so Image Replicator logs? FOG Configuration -> LOG Viewer -> Image Replicator.
Also, you might just look through all the logs in there for anything out of place.
-
@baggar11 Does Ubuntu 14.04 use systemctl?
Is there anything spewing into your /var/log/apache2/error.log?
-
@Wayne-Workman Here you go. 500 lines of logs.
EDIT: Forgot to select Image Replicator. Here it is.
-
@Tom-Elliott Here’s what I’m seeing after stopping all services, clearing the logs and then rebooting.
cat error.log
[Wed Mar 16 16:10:53.258602 2016] [mpm_prefork:notice] [pid 946] AH00163: Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g configured -- resuming normal operations
[Wed Mar 16 16:10:53.258795 2016] [core:notice] [pid 946] AH00094: Command line: '/usr/sbin/apache2'
[Wed Mar 16 16:11:35.735081 2016] [:error] [pid 950] [client 192.168.10.15:53300] PHP Strict Standards: Only variables should be passed by reference in /var/www/fog/lib/pages/dashboardpage.class.php on line 71, referer: http://192.168.10.14/fog/management/index.php
-
@baggar11 What services are using the bulk of the CPU?
-
@Tom-Elliott For a full view of processes, I gave an output of ps aux | grep FOG down below. These are the top 3 though.
FOGImageReplicator
FOGMulticastManager
FOGSnapinReplicator
-
@baggar11 And they’re constantly cycling CPU?
-
@Tom-Elliott Uptime is now at 17 minutes and nothing has changed in terms of CPU usage.
-
One interesting thing I noticed is that right after an upgrade, CPU usage is normal. The system is idling and the FOG WebUI works fine, no issues.
Once I restart the virtual machine and the FOG services start up on boot, that’s when those 3 services start consuming all of the CPU.
-
@baggar11 Well, there’s 5 total services.
That said, it seems this is somewhat well known about, and our method to “check” and correct isn’t working properly. It only seems, to me, to impact Ubuntu (Maybe older versions of Debian?)
I know that in the past the problem was that the network (and maybe other required services) wasn’t running yet when the FOG Services started up.
Maybe try adding a sleep 30, then manually starting the services from the rc.local file.
Basically, rc.local is processed LAST, but even by that time the network may still not be up, so the sleep just ensures it has ample time to come up.
Try disabling the FOG Services from starting at boot.
Then, edit the /etc/rc.local file.
Add:
sleep 30
/etc/init.d/FOGPingHosts stop
/etc/init.d/FOGScheduler stop
/etc/init.d/FOGImageReplicator stop
/etc/init.d/FOGSnapinReplicator stop
/etc/init.d/FOGMulticastManager stop
/etc/init.d/FOGPingHosts start
/etc/init.d/FOGScheduler start
/etc/init.d/FOGImageReplicator start
/etc/init.d/FOGSnapinReplicator start
/etc/init.d/FOGMulticastManager start
This should get the services to a point where they’re not constantly looping (which is what’s most likely causing the CPU load in the first place).
-
Sorry guys! I have been rolling the dev version for so long without issue, I totally forgot about the startup services possibly being an issue. I’m all set now. Here’s what I did.
Ran this on each service:
sudo update-rc.d FOG*service* disable
Then, as suggested, added these lines to my rc.local:
sleep 5
service FOG*service* start
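Since `FOG*service*` above stands in for each service name, the same steps can be sketched as a small loop. This is only a sketch under assumptions: the five names are the init script names listed earlier in the thread, and the `fog_services` function and the `RUNNER` override (used here for dry runs) are hypothetical helpers, not part of FOG.

```shell
# Init script names as listed earlier in the thread.
FOG_SERVICES="FOGPingHosts FOGScheduler FOGImageReplicator FOGSnapinReplicator FOGMulticastManager"

# Run "<runner> <name> <action>" for every FOG service.
# RUNNER is a hypothetical override for dry runs; the default is `service`.
fog_services() {
    action="$1"
    for name in $FOG_SERVICES; do
        ${RUNNER:-service} "$name" "$action"
    done
}

# In rc.local (sketch):
#   sleep 5
#   fog_services start
```

Setting `RUNNER=echo` before calling `fog_services start` prints the commands instead of running them, which is a cheap way to check the list before rebooting.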