Fog server keeps going down
-
@szecca1 said:
@Wayne-Workman I apologize, I was trying to get hold of my boss to log onto vSphere. When I go to the console, it asks for the localhost login and then gives an error:
(38484.938838) Out of memory: kill process 2739 (httpd) score 104 or sacrifice child
(38484.942704) killed process 2739 (httpd) total-vm:1247792kb, anon-rss:412668kb, file-rss:0KB
1247792KB + 412668KB = 1660460KB ≈ 1660.46MB ≈ 1.66GB
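A quick way to double-check a unit conversion like the one above is to let the shell do it. This is only a sketch: kb_to_units is a made-up helper name, and it divides by 1000 to match the units used in the post (dividing by 1024 instead gives roughly 1.58).

```shell
# kb_to_units is a hypothetical helper name, not something from the thread.
# It prints a kB figure alongside its MB and GB equivalents (1000-based).
kb_to_units() {
    awk -v kb="$1" 'BEGIN { printf "%d kB = %.2f MB = %.2f GB\n", kb, kb / 1000, kb / 1000000 }'
}

# total-vm + anon-rss, the two figures from the console message
total_kb=$((1247792 + 412668))
kb_to_units "$total_kb"
# prints: 1660460 kB = 1660.46 MB = 1.66 GB
```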
How much RAM does the server have?
-
@Wayne-Workman I can’t type anything in, as the screen won’t even let me log into the machine until I reboot it.
-
@Wayne-Workman We set up the server with 4096 MB
-
@Wayne-Workman It also says the memory overhead is 88MB, but I’m not sure what that means.
-
@szecca1 So it’s simple… either the server ran out of RAM, or it has dynamic RAM assigned and VMware tried to squeeze it down a little to make room for other VMs that need it… I think.
I was just reading this: https://plumbr.eu/blog/memory-leaks/out-of-memory-kill-process-or-sacrifice-child
This seems to say the same thing: http://ubuntuforums.org/showthread.php?t=2070853
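Once you can get a shell again, something like this should list any OOM-killer messages in the kernel log since the last boot (older ones usually end up in the syslog files, whose location depends on the distro). It is only a sketch: oom_events is a made-up helper name, and the two search phrases are simply taken from the console message quoted above.

```shell
# oom_events is a hypothetical helper name: it filters kernel log text on
# stdin down to the lines that mention the OOM killer.
oom_events() {
    grep -iE 'out of memory|killed process'
}

# Check the kernel ring buffer; print a note instead if nothing matches.
dmesg | oom_events || echo "no OOM-killer messages in the kernel ring buffer"
```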
-
@Wayne-Workman So you would recommend adding more RAM? What do you think I should up it to?
-
@szecca1 I can’t see how it’s actually using all of 4 gigs… Make sure in VMware that FOG’s RAM is not set to dynamic (auto expanding and contracting).
If it is set to dynamic, you could push the memory weight (its priority) higher.
You could try bumping it to 6 gigs… but I really, really doubt that the 4 gigs it has is being used… I really think dynamic RAM is to blame.
-
@Wayne-Workman How do I see if it is set to dynamic? All I am seeing is the amount of RAM, nothing saying dynamic or not!
-
@szecca1 I have no idea, I don’t have any VMware experience… but the other @Developers and @Moderators could probably chime in on that.
-
@Wayne-Workman Better question: why would this randomly just start happening? The server was working fine until recently.
-
@szecca1 Ask your boss if a new VM has been created on that physical box. Ask if any changes on other VMs on that box happened around the time that the FOG server started having issues.
And maybe just bump it to 6 gigs and see what happens tomorrow morning.
-
@Wayne-Workman OK, I extended it to 6 GB and will update you tomorrow on whether the server continues to do the same thing. My boss is out today, so tomorrow I will discuss with him any changes made and the dynamic RAM.
-
@szecca1 Can you please set up this crontab task? I wrote a script to monitor resource usage over time. This can prove invaluable in finding out WHEN and WHY the server is crashing.
#!/bin/bash
# Append a timestamped snapshot of memory and process usage to /root/monitor.log
dt="$(date)"
echo ----------------------------- >> /root/monitor.log
echo "$dt" >> /root/monitor.log
echo ----------------------------- >> /root/monitor.log
/usr/bin/free -m >> /root/monitor.log
echo ----------------------------- >> /root/monitor.log
# -b (batch mode) lets top write plain text when cron runs it without a terminal
/usr/bin/top -b -n 1 >> /root/monitor.log
echo ----------------------------- >> /root/monitor.log
echo .............................. >> /root/monitor.log
echo .............................. >> /root/monitor.log
echo .............................. >> /root/monitor.log
I put that script here:
/root/monitor.sh
Made it executable:
chmod +x /root/monitor.sh
Then added a crontab entry to run it every 15 minutes:
crontab -e
0,15,30,45 * * * * /root/monitor.sh
Your output should look something like this:
cat /root/monitor.log
-----------------------------
Mon Nov 30 12:42:53 CST 2015
-----------------------------
              total        used        free      shared  buff/cache   available
Mem:           3934         409        2066           0        1458        3197
Swap:          2047           0        2047
-----------------------------
top - 12:42:53 up 29 min,  1 user,  load average: 3.68, 4.22, 3.76
Tasks: 180 total,   3 running, 177 sleeping,   0 stopped,   0 zombie
%Cpu(s): 29.8 us, 37.6 sy,  0.0 ni, 31.5 id,  0.2 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem :  4028768 total,  2123932 free,   411532 used,  1493304 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used.  3281768 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  559 root      20   0   49864  10204   4808 S  50.0  0.3  14:39.07 systemd-udevd
 2423 root      20   0       0      0      0 D  18.8  0.0   2:08.52 kworker/0:2
55794 root      20   0   49864   8820   3420 R  18.8  0.2   0:00.03 systemd-udevd
55793 root      20   0   49864   8820   3420 S  12.5  0.2   0:00.02 systemd-udevd
   35 root      20   0       0      0      0 R   6.2  0.0   1:25.50 kdevtmpfs
55783 root      20   0  160796   4376   3688 R   6.2  0.1   0:00.01 top
55795 root      20   0   49864   7324   1924 S   6.2  0.2   0:00.01 systemd-udevd
55797 root      20   0   49864   7324   1924 S   6.2  0.2   0:00.01 systemd-udevd
    1 root      20   0   54716   8416   5696 S   0.0  0.2   0:02.41 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:01.21 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0   0:16.32 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S   0.0  0.0   0:17.35 rcuos/0
   10 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/0
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/0
   13 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/1
   14 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/1
   15 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/1
   17 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   18 root      20   0       0      0      0 S   0.0  0.0   0:05.64 rcuos/1
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1
   20 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/2
   21 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 migration/2
   22 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/2
   24 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H
   25 root      20   0       0      0      0 S   0.0  0.0   0:05.83 rcuos/2
   26 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2
   27 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/3
   28 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/3
   29 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/3
   31 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/3:0H
   32 root      20   0       0      0      0 S   0.0  0.0   0:06.48 rcuos/3
   33 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/3
   34 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 khelper
   36 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns
   37 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 perf
   38 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback
   39 root      25   5       0      0      0 S   0.0  0.0   0:00.00 ksmd
   40 root      39  19       0      0      0 S   0.0  0.0   0:00.00 khugepaged
   41 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 crypto
   42 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd
   43 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
-----------------------------
..............................
..............................
..............................
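Once the log has a few samples in it, a one-line-per-sample summary makes the trend easier to spot than scrolling through every top listing. Here is a rough sketch, assuming the log has exactly the layout that monitor.sh writes (separator, date, separator, free output, separator, top output, separator, three dotted lines). The name mem_trend is made up for this example.

```shell
# mem_trend is a hypothetical helper name. It prints one line per sample
# with the timestamp and the used/free columns from the "free -m" section.
mem_trend() {
    awk '
        /^-+$/  { dashes++; next }       # separator lines written by monitor.sh
        /^\.+$/ { dashes = 0; next }     # dotted lines mark the end of one sample
        dashes == 1 { stamp = $0; next } # the line after the first separator is the date
        dashes == 2 && $1 == "Mem:" {    # the free -m section sits after the second separator
            printf "%s  used=%s MB  free=%s MB\n", stamp, $3, $4
        }
    ' "$1"
}

# Only run against the real log if it exists and is readable.
if [ -r /root/monitor.log ]; then
    mem_trend /root/monitor.log
fi
```

For the sample above it would print a single line with used=409 MB and free=2066 MB. If the "used" figure climbs from sample to sample right up to a crash, that points at something inside the VM eating memory rather than at the hypervisor.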
-
You can set the memory allocation in VMware: in the VM properties, go to the Resources tab and click on Memory.
You can reserve either all of the memory or a set amount.
-
@Wayne-Workman I just got into work today, and this morning the FOG server was still up and running after the memory upgrade to 6 GB. I can still set up the crontab task if you’d like, because I would hate to see your hard work go to waste. Let me know if you want me to do that.
-
@cml Thank you, I saw this right before I upgraded the memory but didn’t know what to do and didn’t want to mess with something I didn’t understand. The memory upgrade seems to have fixed the issue for now and I will keep an eye on it, but do you think I should do this anyway?
-
@cml Thank you for the screenshot! I only have Hyper-V & KVM experience, glad someone could give a VMware screenshot!
@szecca1 Yes, please go ahead and set up the crontab task. Because the server is still online today, we know your issue is memory related, and the output from the scheduled script will help you in the future… Keep in mind that the 6 gigs of RAM may not have fixed it; it may only have lengthened the time between crashes…
-
This is what I am getting when I typed that path in. Am I doing something wrong?
-
@szecca1 The path is
/root/
and the filename is
monitor.sh
You create the file yourself; I made it from scratch. You can create it with vi, like this:
vi /root/monitor.sh
I suppose you can put the script anywhere you like.
-
@Wayne-Workman Hey, after I am done typing the script, how do I save it?