HIGH CPU Fog Services after update r5029 v6759
-
@Tom-Elliott said:
@Raymond-Bell The rc.local stuff is also on the Storage Nodes?
Yes, I edited rc.local on the Storage Nodes as well.
-
@Raymond-Bell Did you restart the services after adding the rc.local? I’m sorry if I’m asking obvious questions, but I really just need to make sure. Current update should help alleviate CPU usage as well though.
-
@Tom-Elliott said:
@Raymond-Bell Did you restart the services after adding the rc.local? I’m sorry if I’m asking obvious questions, but I really just need to make sure. Current update should help alleviate CPU usage as well though.
Yes, restarted and rebooted, and I am also up to r5038 (6777).
Will do it again to make sure.
-
@Tom-Elliott Yes, same result after:
sudo service FOGMulticastManager stop
sudo service FOGImageReplicator stop
sudo service FOGSnapinReplicator stop
sudo service FOGScheduler stop
sudo service FOGPingHosts stop
sudo update-rc.d FOGMulticastManager disable
sudo update-rc.d FOGImageReplicator disable
sudo update-rc.d FOGSnapinReplicator disable
sudo update-rc.d FOGScheduler disable
sudo update-rc.d FOGPingHosts disable
sudo service FOGMulticastManager start
sudo service FOGImageReplicator start
sudo service FOGSnapinReplicator start
sudo service FOGScheduler start
sudo service FOGPingHosts start
top - 10:48:18 up 1:48, 2 users, load average: 4.14, 4.92, 5.02
Tasks: 202 total, 6 running, 196 sleeping, 0 stopped, 0 zombie
%Cpu(s): 66.6 us, 32.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
KiB Mem: 1014980 total, 966204 used, 48776 free, 19852 buffers
KiB Swap: 1037308 total, 12316 used, 1024992 free. 529972 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5970 root 20 0 41920 20164 15184 R 45.6 2.0 0:07.37 FOGSnapinReplic
6026 root 20 0 41920 20392 15408 R 41.3 2.0 0:05.85 FOGPingHosts
5990 root 20 0 41920 20240 15256 R 39.6 2.0 0:06.34 FOGTaskSchedule
5950 root 20 0 41920 20488 15504 R 39.3 2.0 0:06.90 FOGImageReplica
5930 root 20 0 41920 20116 15132 R 34.3 2.0 0:05.32 FOGMulticastMan
5779 fog 20 0 11284 3900 3128 S 0.3 0.4 0:00.04 sshd
1 root 20 0 4732 3788 2548 S 0.0 0.4 0:02.83 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.78 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 0:05.27 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:00.01 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
13 root 20 0 0 0 0 S 0.0 0.0 0:00.12 ksoftirqd/1
-
Something weird. After the new update, my server's CPU usage was at 10-20%. After a reboot, it went back up to 100%.
Pretty annoying.
-
I just realized after going through another git pull upgrade that the CPU spike will probably happen every time since the services are reinstalled. Here’s how I fixed my problem again, hopefully in a cleaner way. After upgrading to 6791, it seems to be working so far with a couple reboot tests.
Add this to your rc.local. Adjust sleep time based on your setup. My FOG setup is virtual, so 5 seconds seems to work well.
sleep 5
service FOGImageReplicator restart
service FOGMulticastManager restart
service FOGPingHosts restart
service FOGScheduler restart
service FOGSnapinReplicator restart
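A fixed sleep has to be tuned per machine, so one possible alternative is to poll until the database answers and only then restart the services. The following is a minimal sketch, not something taken from FOG itself: the helper name wait_for and the retry count are hypothetical, and it assumes that a reachable MySQL server is the dependency that matters here.

```shell
#!/bin/sh
# wait_for MAX_TRIES CMD [ARGS...]
# Hypothetical helper: runs CMD about once per second until it succeeds
# or MAX_TRIES attempts have been used up.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # No point in sleeping after the final failed attempt.
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

In rc.local this could stand in for the sleep line: call `wait_for 60 mysqladmin ping --silent` and run the five `service ... restart` lines only if it returns success. `mysqladmin ping` only reports whether the server is running, which is assumed to be enough for this purpose.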
-
@baggar11, @Raymond-Bell I believe I may have fixed this now.
My issue – in theory?
I am moving almost EVERYTHING to static form (where I can). This allows me to get items back without having to instantiate a whole class object. The problem: the DB has to be available for FOG to read its info, and the services are checking for the item before the DB connection is established. This was totally an oversight, and for that I’m sorry to all.
With any luck, this issue will be gone now. Please update and let me know.
Thanks.
-
Why might this affect the CPU? The service looks to the DB to get timeout values (and logs). The DB is fully operational, but the connection has not been initiated by FOG yet, so as far as the service is concerned the DB is not available. The lookup therefore returns an int of 0 for the sleep time. Put that into an infinite loop (there are at least 2 that manage start/restart of the services). The loop I’m referring to is already told to sleep for some period of time on each pass. However, that period is set to 0 (meaning 0 seconds), so it gets stuck in an infinite loop without ever actually starting.
-
@Tom-Elliott said:
@baggar11, @Raymond-Bell I believe I may have fixed this now.
With any luck, this issue will be gone now. Please update and let me know.
Just updated to 6795, removed my “restart” lines from rc.local and rebooted. CPU was pegged again. Restarting the services manually brought CPU back down to idle.
-
@baggar11 as I understand it, this is still potentially due to starting the services before its dependencies have started. This is different from the pegged on update, I think.
Can you simply try reinstall and see if it’s still happening? Or did update always work, only reboot caused these issues?
-
@Tom-Elliott said:
@baggar11 as I understand it, this is still potentially due to starting the services before its dependencies have started. This is different from the pegged on update, I think.
Can you simply try reinstall and see if it’s still happening? Or did update always work, only reboot caused these issues?
For my upgrades, anything after 6753 would peg my cpu after a reboot. During and after the upgrade process using the install.sh, everything was fine. It’s only after the reboot that the services would max out the cpu. Probably the infinite loop issue as you wrote about below.
-
@baggar11 Okay, I’ve added a sleep time of 10 seconds, just in case of these situations where the int returned is 0. We must have at least 1 second sleeptime ( I believe ), and this just isn’t happening.
The only word of caution I can think of, now, is while this might help with CPU Cycles, it will not make the FOG Services actually work properly as from Ubuntu’s standpoint the service is already running. While I could, potentially, come up with a way to restart the services more appropriately, I don’t know where to start at the moment.
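To illustrate the guard described above, here is a small shell sketch. It is not the actual service_lib code (the FOG services are not shell scripts); the function name effective_sleep and the variable DEFAULT_SLEEP are hypothetical, and the fallback of 10 simply mirrors the value mentioned in this post.

```shell
#!/bin/sh
# Hypothetical fallback, mirroring the 10 seconds mentioned in the post.
DEFAULT_SLEEP=10

# effective_sleep VALUE
# Prints VALUE if it is a positive integer, otherwise prints DEFAULT_SLEEP.
# A value of 0 (what the lookup returns when the DB is not available) would
# make "sleep" return immediately and turn the service loop into a busy loop.
effective_sleep() {
    case "$1" in
        ''|*[!0-9]*)
            # Empty or non-numeric input: use the fallback.
            echo "$DEFAULT_SLEEP"
            ;;
        *)
            if [ "$1" -gt 0 ]; then
                echo "$1"
            else
                echo "$DEFAULT_SLEEP"
            fi
            ;;
    esac
}
```

A service loop written this way would call something like `sleep "$(effective_sleep "$value_from_db")"` on each pass, so a failed lookup costs a 10 second pause instead of a pegged core. As the caution above says, this only limits CPU use; it does not make a service that started without its database work correctly.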
-
@Tom-Elliott said:
@baggar11 Okay, I’ve added a sleep time of 10 seconds, just in case of these situations where the int returned is 0. We must have at least 1 second sleeptime ( I believe ), and this just isn’t happening.
The only word of caution I can think of, now, is while this might help with CPU Cycles, it will not make the FOG Services actually work properly as from Ubuntu’s standpoint the service is already running. While I could, potentially, come up with a way to restart the services more appropriately, I don’t know where to start at the moment.
I’m not a coder so take this with a grain of salt. The following is a similar approach to another open source project I use.
Make a script to run as a cron job every 5 minutes. On 1st boot, the script essentially checks for things like MySQL and network being up, and if all checks out starts the FOG services. All other times it runs, it would really only check for FOG services running and then exit. heh…
Of course, it’s probably a waste of time, as I assume systemd, which Fedora is already using and Ubuntu’s next LTS will be running, will take care of these things more intelligently than regular init and/or upstart.
I really appreciate you taking a look into these things Tom. FOG is a very cool project.
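The cron idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names db_ready and ensure_services are hypothetical, `mysqladmin ping` is assumed to be an adequate database check, and the FOG init scripts are assumed to support a `status` action that exits 0 when the service is running, which this thread does not confirm.

```shell
#!/bin/sh
# Service names as used elsewhere in this thread.
FOG_SERVICES="FOGMulticastManager FOGImageReplicator FOGSnapinReplicator FOGScheduler FOGPingHosts"

# Assumption: a MySQL server that answers a ping is "up" for our purposes.
db_ready() {
    mysqladmin ping --silent >/dev/null 2>&1
}

# Starts every FOG service that is not reported as running.
# Returns 1 without touching anything if the database is not up yet,
# so the next cron run simply tries again.
ensure_services() {
    db_ready || return 1
    for svc in $FOG_SERVICES; do
        # Assumption: "status" exits 0 when the service is running.
        service "$svc" status >/dev/null 2>&1 || service "$svc" start
    done
    return 0
}
```

To use it, add a call to `ensure_services` at the bottom of the script and schedule it from root's crontab every 5 minutes (`*/5 * * * *`). Note that it would not catch the case described earlier, where a service counts as running from Ubuntu's standpoint but is stuck spinning.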
-
Like Baggar is saying: thanks for taking the time to look at these things.
It’s just annoying that it is taking 100% CPU and not working properly.
Take your time, but it’s probably a problem for all Ubuntu 14.04 users after one update.
-
@Tom-Elliott After updating to r5054 this morning:
Looks like it is fixed now, but I have not rebooted any of them…
Server
top - 07:57:51 up 23:03, 3 users, load average: 134.00, 122.87, 122.12
Tasks: 351 total, 32 running, 312 sleeping, 0 stopped, 7 zombie
%Cpu(s): 55.3 us, 42.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 2.7 si, 0.0 st
KiB Mem: 4355692 total, 3461944 used, 893748 free, 69372 buffers
KiB Swap: 1046524 total, 16896 used, 1029628 free. 2231712 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17801 mysql 20 0 386360 85396 9744 S 19.5 2.0 3:07.93 mysqld
20119 www-data 20 0 111272 19556 13116 R 11.3 0.4 0:04.27 apache2
21537 www-data 20 0 0 0 0 Z 9.9 0.0 0:01.06 apache2
21239 www-data 20 0 111188 19952 13344 R 8.6 0.5 0:02.65 apache2
19002 www-data 20 0 111044 19584 13116 R 7.0 0.4 0:08.43 apache2
18759 www-data 20 0 111296 19812 13092 R 6.6 0.5 0:13.16 apache2
19062 www-data 20 0 110988 19788 13752 R 6.6 0.5 0:07.04 apache2
20017 www-data 20 0 110996 18988 12572 S 6.6 0.4 0:04.30 apache2
18845 www-data 20 0 111300 19240 12688 R 6.3 0.4 0:13.16 apache2
18875 www-data 20 0 111256 20412 13732 R 6.3 0.5 0:12.60 apache2
Storage Node Master
top - 07:58:14 up 19:25, 2 users, load average: 0.01, 0.06, 0.12
Tasks: 201 total, 1 running, 195 sleeping, 0 stopped, 5 zombie
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 1014980 total, 975276 used, 39704 free, 106172 buffers
KiB Swap: 1037308 total, 22532 used, 1014776 free. 443116 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 4616 3552 2432 S 0.0 0.3 0:02.85 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:03.88 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 0:26.13 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:00.02 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:00.18 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.20 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:00.02 migration/1
Storage Node 2nd
top - 07:58:31 up 22:59, 3 users, load average: 0.01, 0.02, 0.05
Tasks: 227 total, 1 running, 225 sleeping, 0 stopped, 1 zombie
%Cpu(s): 0.1 us, 0.0 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 7735460 total, 7466792 used, 268668 free, 69996 buffers
KiB Swap: 7828476 total, 0 used, 7828476 free. 6881904 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7 root 20 0 0 0 0 S 0.3 0.0 0:29.70 rcu_sched
11185 fog 20 0 5572 2852 2352 R 0.3 0.0 0:02.77 top
1 root 20 0 4732 3844 2580 S 0.0 0.0 0:01.60 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:04.09 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:00.01 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:00.30 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.32 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:00.01 migration/1
-
You are right. CPU is fixed indeed … until you reboot.
-
@boeleke What?
You updated and still seeing CPU load issues? See the service_lib script now implicitly defines a sleep time if one cannot be found otherwise.
-
@Tom-Elliott
I thought I had the latest version.
I just updated and rebooted. It has been working fine for a couple of minutes now. I think it’s OK now.
-
Wanted to post another update. Updated to 6893 this morning. Removed my sleep and service stop commands from rc.local and rebooted. CPU seems to be in check again. Thanks for all your hard work!