HIGH CPU Fog Services after update r5029 v6759
-
@Wayne-Workman My screen wrapped in my terminal. The “G” is in fact there when I full screen and grep the ps output again.
-
@baggar11 Can you give us the last 500 or so Image Replicator logs? FOG Configuration -> LOG Viewer -> Image Replicator.
Also, you might just look through all the logs in there for anything out of place.
-
@baggar11 Does Ubuntu 14.04 use systemctl?
Is there anything spewing into your /var/log/apache2/error.log?
-
@Wayne-Workman Here you go. 500 lines of logs.
EDIT: Forgot to select Image Replicator. Here it is.
-
@Tom-Elliott Here’s what I’m seeing after stopping all services, clearing the logs and then rebooting.
cat error.log
[Wed Mar 16 16:10:53.258602 2016] [mpm_prefork:notice] [pid 946] AH00163: Apache/2.4.18 (Ubuntu) OpenSSL/1.0.2g configured -- resuming normal operations
[Wed Mar 16 16:10:53.258795 2016] [core:notice] [pid 946] AH00094: Command line: '/usr/sbin/apache2'
[Wed Mar 16 16:11:35.735081 2016] [:error] [pid 950] [client 192.168.10.15:53300] PHP Strict Standards: Only variables should be passed by reference in /var/www/fog/lib/pages/dashboardpage.class.php on line 71, referer: http://192.168.10.14/fog/management/index.php
-
@baggar11 What services are using the bulk of the CPU?
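One rough way to answer this from a shell, as a sketch only: list the processes whose command name starts with FOG, highest CPU first. The function name `fog_cpu` is made up for illustration, and the sketch assumes a procps-style `ps` such as the one on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): reads "PID %CPU COMMAND" lines on
# stdin and prints only the FOG* processes, sorted by CPU usage, highest first.
fog_cpu() {
    awk '$3 ~ /^FOG/ { print $1, $2, $3 }' | LC_ALL=C sort -k2,2nr
}

# Live usage on the server (each "-o name=" suppresses that column's header):
#   ps -e -o pid= -o pcpu= -o comm= | fog_cpu
```

Note that ps reports CPU averaged over the lifetime of each process, so top is still the better tool for a live view.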
-
@Tom-Elliott For a full view of processes, I gave an output of ps aux | grep FOG down below. These are the top 3 though.
FOGImageReplicator
FOGMulticastManager
FOGSnapinReplicator
-
@baggar11 And they’re constantly cycling CPU?
-
@Tom-Elliott Uptime is now at 17 minutes and nothing has changed as far as CPU usage goes.
-
One thing I found interesting: right after an upgrade, CPU usage is normal. The system is idling and the FOG WebUI works fine, no issues.
Once I restart the virtual machine and the FOG services start up on boot, that’s when those 3 services start consuming all of the cpu.
-
@baggar11 Well, there are 5 services in total.
That said, this seems to be a fairly well-known problem, and our method to “check” and correct it isn’t working properly. To me, it only seems to affect Ubuntu (maybe older versions of Debian too?).
I know the problem, in the past, was related to the network (and maybe other required services) not being up when the FOG services start.
Maybe try adding a sleep 30, then manually starting the services from the rc.local file.
Basically, rc.local is processed LAST, but even by that time the network may still not be up, so the sleep just ensures it has ample time to come up.
Try disabling the FOG Services from starting at boot.
Then, edit the /etc/rc.local file.
Add:
sleep 30
/etc/init.d/FOGPingHosts stop
/etc/init.d/FOGScheduler stop
/etc/init.d/FOGImageReplicator stop
/etc/init.d/FOGSnapinReplicator stop
/etc/init.d/FOGMulticastManager stop
/etc/init.d/FOGPingHosts start
/etc/init.d/FOGScheduler start
/etc/init.d/FOGImageReplicator start
/etc/init.d/FOGSnapinReplicator start
/etc/init.d/FOGMulticastManager start
This should get the services to a point where they’re not constantly looping (which is most likely what’s causing the CPU load in the first place).
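The same stop-then-start sequence can be written as a loop. This is only a sketch of an rc.local fragment: the function name `restart_fog_services` and the `RUN` dry-run variable are made up for illustration, and the `|| true` on the stop step is an added precaution, on the assumption that rc.local runs under `sh -e` and a failed stop would otherwise abort it.

```shell
#!/bin/sh
# Sketch of an /etc/rc.local fragment; service names are the init scripts
# named in the post above.
FOG_SERVICES="FOGPingHosts FOGScheduler FOGImageReplicator FOGSnapinReplicator FOGMulticastManager"

# RUN is a hypothetical dry-run hook: leave it empty to execute the commands,
# or set RUN=echo to only print what would be run.
RUN="${RUN:-}"

restart_fog_services() {
    # Give the network (and other dependencies) time to come up first.
    $RUN sleep 30
    # Stop every service first; ignore failures from services that are not running.
    for svc in $FOG_SERVICES; do
        $RUN "/etc/init.d/$svc" stop || true
    done
    # Then start them all again.
    for svc in $FOG_SERVICES; do
        $RUN "/etc/init.d/$svc" start
    done
}
```

In rc.local the function would be called once, before the final `exit 0`. With `RUN=echo` it just prints the eleven commands so they can be checked first.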
-
Sorry guys! I have been rolling the dev version for so long without issue, I totally forgot about the startup services possibly being an issue. I’m all set now. Here’s what I did.
I ran this on each service:
sudo update-rc.d FOG*service* disable
Then, as suggested, I added these lines to my rc.local:
sleep 5
service FOG*service* start
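For anyone repeating the disable step, it can be looped over the services instead of typed five times. A sketch only: the service names are an assumption taken from the init scripts named earlier in this thread, and `disable_fog_at_boot` and `RUN` are made-up names for illustration.

```shell
#!/bin/sh
# Assumed service names (from the /etc/init.d scripts mentioned earlier in
# the thread); adjust to whatever is actually installed.
FOG_SERVICES="FOGPingHosts FOGScheduler FOGImageReplicator FOGSnapinReplicator FOGMulticastManager"

# Hypothetical dry-run hook: RUN=echo prints the commands instead of running them.
RUN="${RUN:-}"

disable_fog_at_boot() {
    for svc in $FOG_SERVICES; do
        # Needs root, which is why the post above used sudo.
        $RUN update-rc.d "$svc" disable
    done
}
```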
-
@baggar11 Very nice.
-
@Tom-Elliott Upgraded to the latest pull, r5037.
The server is fine if I start the services one at a time, but both storage nodes have high CPU,
and I have started them one at a time to check.
As soon as I start one it jumps to about 98% and levels out at about 50%.
top - 07:35:37 up 16:54,  2 users,  load average: 4.40, 2.95, 3.82
Tasks: 219 total,   5 running, 214 sleeping,   0 stopped,   0 zombie
%Cpu(s): 69.7 us, 30.1 sy,  0.0 ni,  0.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem:   7735460 total,  1368652 used,  6366808 free,   107516 buffers
KiB Swap:  7828476 total,        0 used,  7828476 free.   824568 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8183 root      20   0   41920  20308  15328 R  91.6  0.3   1:38.30 FOGMulticastMan
 8212 root      20   0   41920  20692  15712 R  91.6  0.3   1:38.56 FOGSnapinReplic
 8239 root      20   0   41920  20584  15604 R  82.7  0.3   1:28.03 FOGPingHosts
 8226 root      20   0   41920  20132  15156 R  70.1  0.3   1:34.14 FOGTaskSchedule
 8197 root      20   0   41920  20140  15160 S  63.7  0.3   1:36.08 FOGImageReplica
    1 root      20   0    4612   3872   2616 S   0.0  0.1   0:01.52 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:03.83 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0   0:27.75 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/0
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.09 watchdog/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.10 watchdog/1
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/1
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.15 ksoftirqd/1
   14 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0
   15 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   16 root      rt   0       0      0      0 S   0.0  0.0   0:00.10 watchdog/2
   17 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/2
-
@Raymond-Bell Just going to guess but the Storage nodes, currently, are unable to talk with the main FOG Server?
-
@Tom-Elliott said:
@Raymond-Bell Just going to guess but the Storage nodes, currently, are unable to talk with the main FOG Server?
Editing the rc.local file now; will let you know after reboot.
-
@Tom-Elliott
After editing rc.local and rebooting, I went to the web sign-in and got this:
Database Schema Installer / Updater
But I did not get this during the set-up of the latest SVN. I clicked install and got:
Install/Upgrade Successful! The following errors occured
Update ID: 1 - 0
Database Error: Too many connections, Message: Check that database is running
Database SQL: CREATE DATABASE fog
Update ID: 1 - 1
Database Error: Too many connections, Message: Check that database is running
Database SQL: CREATE TABLE `fog`.`groupMembers` (
  `gmID` int(11) NOT NULL auto_increment,
  `gmHostID` int(11) NOT NULL,
  `gmGroupID` int(11) NOT NULL,
  PRIMARY KEY (`gmID`),
  KEY `new_index` (`gmHostID`),
  KEY `new_index1` (`gmGroupID`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
But the second time I could log in just fine, and the server is seeing the nodes.
After reboot and rc.local edit
Server
top - 07:58:10 up 8 min,  2 users,  load average: 127.49, 100.92, 50.49
Tasks: 360 total,  36 running, 324 sleeping,   0 stopped,   0 zombie
%Cpu(s): 65.6 us, 33.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
KiB Mem:   4355692 total,  1476112 used,  2879580 free,    95208 buffers
KiB Swap:  1046524 total,        0 used,  1046524 free.   496208 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1363 root      20   0   39820  20000  15036 R  89.2  0.5   7:10.67 FOGSnapinReplic
 1399 root      20   0   39820  19996  15024 S  84.5  0.5   7:10.53 FOGTaskSchedule
 1254 mysql     20   0  363320  75940   9864 S  22.5  1.7   1:58.49 mysqld
 3441 www-data  20   0  108088  15672  10396 D   9.6  0.4   0:01.11 apache2
 3052 www-data  20   0  110464  18732  12840 R   8.9  0.4   0:06.03 apache2
 2983 www-data  20   0  110820  18828  12840 R   8.3  0.4   0:05.70 apache2
 3405 fog       20   0    5692   2916   2344 R   6.6  0.1   0:01.32 top
 3523 www-data  20   0  108232  15724  10312 R   6.6  0.4   0:00.72 apache2
 3060 www-data  20   0  111052  19272  12792 R   6.3  0.4   0:05.67 apache2
 3541 www-data  20   0  107928  15280  10012 R   5.6  0.4   0:00.45 apache2
SN 1
top - 07:59:33 up 9 min,  2 users,  load average: 5.07, 4.46, 2.47
Tasks: 221 total,   7 running, 214 sleeping,   0 stopped,   0 zombie
%Cpu(s): 69.8 us, 29.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:   7735460 total,   881520 used,  6853940 free,    79604 buffers
KiB Swap:  7828476 total,        0 used,  7828476 free.   405380 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2643 root      20   0   39948  19864  14908 R  86.8  0.3   7:11.84 FOGImageReplica
 2679 root      20   0   39948  20068  15116 R  86.8  0.3   7:09.10 FOGMulticastMan
 2609 root      20   0   39948  19848  14888 R  76.8  0.3   7:09.03 FOGPingHosts
 2626 root      20   0   39948  19888  14928 R  74.8  0.3   7:09.72 FOGTaskSchedule
 2662 root      20   0   39948  19984  15024 R  74.1  0.3   7:11.40 FOGSnapinReplic
   41 root      20   0       0      0      0 R   0.3  0.0   0:00.17 kworker/0:1
 2845 fog       20   0    5544   2752   2344 R   0.3  0.0   0:00.35 top
    1 root      20   0    4740   3888   2616 S   0.0  0.1   0:01.44 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0   0:00.29 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/0
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/0
SN 2
top - 08:00:12 up 10 min,  3 users,  load average: 5.12, 4.54, 2.58
Tasks: 225 total,   7 running, 218 sleeping,   0 stopped,   0 zombie
%Cpu(s): 72.2 us, 27.8 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   7735460 total,   886896 used,  6848564 free,    79612 buffers
KiB Swap:  7828476 total,        0 used,  7828476 free.   405392 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2662 root      20   0   39948  19984  15024 R  88.8  0.3   7:41.96 FOGSnapinReplic
 2643 root      20   0   39948  19864  14908 R  88.4  0.3   7:41.97 FOGImageReplica
 2626 root      20   0   39948  19888  14928 R  76.6  0.3   7:41.53 FOGTaskSchedule
 2609 root      20   0   39948  19848  14888 R  70.5  0.3   7:40.83 FOGPingHosts
 2679 root      20   0   39948  20068  15116 R  69.7  0.3   7:40.03 FOGMulticastMan
 2308 root      20   0   53452   6756   5804 S   5.7  0.1   0:00.33 udisksd
 2845 fog       20   0    5544   2876   2344 S   0.4  0.0   0:00.45 top
 2901 fog       20   0    5688   2768   2344 R   0.4  0.0   0:00.01 top
    1 root      20   0    4740   3888   2616 S   0.0  0.1   0:01.44 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.00 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0   0:00.31 rcu_sched
-
I still don’t understand why I have so many apache2 processes running on the server either…
-
@Raymond-Bell Are you disabling the services from starting at boot completely? And then using rc.local?
The rc.local approach won’t work if it can’t stop the original tasks anyway.
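A quick way to check whether the services really are disabled at boot, as a sketch: it assumes the FOG services are classic SysV init scripts managed by update-rc.d (as the earlier posts suggest), where a disabled service has its S?? start links renamed to K?? links. The function name `fog_boot_links` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical check (not a FOG tool): print any SysV start links for FOG
# services in runlevels 2-5. No output means no FOG service is set to start
# at boot through those runlevels.
fog_boot_links() {
    rc_root="${1:-/etc}"   # overridable only so the function can be tried on a test tree
    for link in "$rc_root"/rc[2-5].d/S[0-9][0-9]FOG*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$link" ] || [ -L "$link" ] || continue
        printf '%s\n' "$link"
    done
}
```

Running `fog_boot_links` with no argument looks under /etc. Any line it prints is a service that will still be started by init before rc.local runs.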