Fog server keeps going down
-
@need2 I just rebooted and it works fine again, but this is an everyday event and causes issues if computers reboot due to Windows updates.
-
@Scott-B said:
What do the logs on the server suggest the issue could be?
Please provide some details of the server. Ubuntu? Fedora? Versions? Do the log files on the server point to anything?
@szecca1 What OS? Also, you can look at Apache Errors via CLI as well if the web interface is not working. If we know the OS, we can give you the exact command to do this (along with a lot of other logs to check too).
-
@Wayne-Workman The OS of the fog server is Fedora Server 22
-
@szecca1 Just saw that you already said that … sorry. I’ll get some commands together to send your way.
When we have a mystery problem, we need to look in a lot of places until we find a clue.
-
Run these commands the next time the server goes down - grab the output, and then just restart it to keep problems in your environment to a minimum. Post what you find and we’ll go from there. If you get errors while posting the output of any of those, just upload the output in a .txt file instead.
The -n 100 is for number of lines to return - you can adjust that as needed.
Apache Error log:
tail -n 100 /var/log/httpd/error_log
MariaDB log in Fedora 22:
tail -n 100 /var/log/mariadb/mariadb.log
Also, run the top command to see what the load averages are, and what is running:
top
Sample output from my server:
top - 09:28:32 up 51 days, 20:58, 1 user, load average: 0.08, 0.11, 0.18
Tasks: 166 total, 1 running, 165 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.2 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 4028388 total, 61580 free, 341508 used, 3625300 buff/cache
KiB Swap: 4063228 total, 4053892 free, 9336 used. 3621740 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30282 apache    20   0  409732  24140  13936 S   1.7  0.6   0:10.38 httpd
    9 root      20   0       0      0      0 S   0.3  0.0  50:57.07 rcuos/0
  672 dbus      20   0   47108   3504   3048 S   0.3  0.1   4:56.67 dbus-daemon
27066 mysql     20   0 2190100  97052  16348 S   0.3  2.4   2:06.60 mysqld
    1 root      20   0  187052   6152   3000 S   0.0  0.2   8:43.03 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:01.77 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   1:12.82 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S   0.0  0.0  71:43.84 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
   10 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:11.47 migration/0
   12 root      rt   0       0      0      0 S   0.0  0.0   0:29.96 watchdog/0
   13 root      rt   0       0      0      0 S   0.0  0.0   0:28.28 watchdog/1
   14 root      rt   0       0      0      0 S   0.0  0.0   0:14.37 migration/1
   15 root      20   0       0      0      0 S   0.0  0.0   0:20.91 ksoftirqd/1
   17 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
   18 root      20   0       0      0      0 S   0.0  0.0   6:21.82 rcuos/1
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1
   20 root      rt   0       0      0      0 S   0.0  0.0   0:28.58 watchdog/2
   21 root      rt   0       0      0      0 S   0.0  0.0   0:12.49 migration/2
   22 root      20   0       0      0      0 S   0.0  0.0   2:10.35 ksoftirqd/2
   24 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H
   25 root      20   0       0      0      0 S   0.0  0.0  11:36.20 rcuos/2
   26 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2
   27 root      rt   0       0      0      0 S   0.0  0.0   0:27.90 watchdog/3
   28 root      rt   0       0      0      0 S   0.0  0.0   0:14.79 migration/3
   29 root      20   0       0      0      0 S   0.0  0.0   0:10.40 ksoftirqd/3
   31 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/3:0H
   32 root      20   0       0      0      0 S   0.0  0.0   5:01.15 rcuos/3
   33 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/3
   34 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 khelper
   35 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs
   36 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns
   37 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 perf
   38 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback
   39 root      25   5       0      0      0 S   0.0  0.0   0:00.00 ksmd
   40 root      39  19       0      0      0 S   0.0  0.0   0:00.00 khugepaged
   41 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 crypto
   42 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd
   43 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
   44 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd
   45 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 ata_sff
   46 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 md
   47 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 devfreq_wq
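If it helps, the three checks above could be gathered into a single .txt file for posting. This is only a rough sketch: the log paths are the Fedora 22 ones mentioned above, while the function name collect_fog_diag and the default output path /tmp/fog-diag.txt are made up for illustration. It relies on top's batch mode (-b) with a single iteration (-n 1) so the snapshot can be written to a file.

```shell
#!/bin/sh
# Sketch: collect the requested diagnostics into one text file.
# Log paths come from this thread (Fedora 22); the function name and the
# default output path are illustrative assumptions.

collect_fog_diag() {
    out="${1:-/tmp/fog-diag.txt}"   # hypothetical output location
    lines="${2:-100}"               # passed to tail -n, adjust as needed

    {
        echo "=== Collected: $(date) ==="
        for log in /var/log/httpd/error_log /var/log/mariadb/mariadb.log; do
            echo "=== Last $lines lines of $log ==="
            if [ -r "$log" ]; then
                tail -n "$lines" "$log"
            else
                echo "(not readable; try running as root)"
            fi
        done
        echo "=== top snapshot ==="
        # -b: batch (non-interactive) mode, -n 1: print one iteration and exit
        top -b -n 1 2>/dev/null || echo "(top unavailable)"
    } > "$out" 2>&1

    # Print the path so the caller knows which file to upload
    echo "$out"
}
```

Run it as root with something like `collect_fog_diag /tmp/fog-diag.txt 100` and attach the resulting file.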
-
@Wayne-Workman How can I run those commands when I can’t connect to the server? Either from the virtual machine or from PuTTY, I can’t log in to the server until after I reboot it!
And no worries, I know you guys are busy with other things. Any information you need I’ll be happy to give you several times if needed.
-
@szecca1 That is definitely information we needed. It leads me to believe there is a problem with the NIC associated with the server. Does your server go to sleep/hibernate?
-
@Tom-Elliott No, the server doesn’t go to sleep. The other VMs on that server are working perfectly fine. It’s just FOG that seems to be having a problem. But once it is rebooted, it comes up and works fine for the day.
-
@szecca1 When I read that post below I immediately thought exactly what Tom posted - NIC issues. Maybe even switch issues.
Try a different port on the switch, maybe even try a different switch… Maybe try a different type of virtual adapter.
-
@Wayne-Workman No other VMs attached to this device are having issues. It could be the virtual adapter, but can I change that? What you guys are saying makes sense, but it has to be something with the FOG machine because the others are working fine.
-
@szecca1 We just checked to see if we could add a new driver for the ethernet card and we couldn’t. Do you recommend a certain ethernet card adapter driver for me to use? We currently have the E1000 ethernet adapter on it.
-
@szecca1 I don’t think it’s the driver. I think it’s the physical adapter.
-
@Tom-Elliott It can’t be the physical adapter. The adapter that is being used is for several different VMs and no other VMs are having issues besides the FOG server.
-
@szecca1 Could it be possible to separate the NIC for the FOG server? Remember, FOG is using the NIC for many things. Separating the NIC would be the first suggestion I can make. I understand the issue is only occurring on the FOG service.
-
@Tom-Elliott If I may ask, what do you mean by separating the nic?
-
@szecca1 Put FOG on a separate physical NIC. The fact that you can change the NIC to another adapter type tells me your FOG server is a VM. This usually means the VM host has multiple physical NICs available. If this is not the case, the only other option I can give would be to change the driver, which I doubt is going to help.
-
@Tom-Elliott Alright, I will give that a look to see if that’s possible.
-
@szecca1 What are you using for virtualization? If it provides an interface (like Windows Hyper-V or CentOS KVM), can you get to the FOG server that way?
-
For clarity, I run FOG under CentOS running on ESXi 5.5. I don’t have this kind of issue.
I know this was a long thread and I kind of skimmed it.
When you set up your Linux server you should set the network adapter to e1000; you can use vmxnet3 if you install the VMware tools first. It has always been a pain to do, so I just use the e1000 driver. I seem to recall a bug with the e1000 driver in either the GA release of 5.0 or 5.5 that caused the virtual NIC to hang under heavy load. That issue was fixed with an ESXi update. Is your ESXi server up to date with patches?
Please help me understand: you say you can’t access the FOG server. I understand if you can’t access it via the network, but can you log into the console directly using the VMware configuration client? Or is the vmclient frozen? If you can log in from the console, then I would check the logs in /var/log to see if there is any indication of what happened. I would also try to restart the network services (from inside the Fedora OS) to see if you can get it back online. I have seen cases where two hosts having the same IP address will take a server offline like this.
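As a rough sketch of what those console-side checks might look like: this assumes a Fedora guest using systemd and NetworkManager, a persistent journal (otherwise the previous boot's messages will not be available), and the iputils version of arping. IFACE and FOG_IP are placeholders, not values from this thread, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of checks to run from the VM console while the server is unreachable.
# IFACE and FOG_IP are placeholders; substitute your own values.

IFACE="${IFACE:-eth0}"          # hypothetical interface name, check with `ip link`
FOG_IP="${FOG_IP:-192.0.2.10}"  # placeholder address from the documentation range

check_network_from_console() {
    echo "--- address and link state for $IFACE ---"
    ip addr show dev "$IFACE" 2>/dev/null || echo "(could not query $IFACE)"

    echo "--- recent kernel messages mentioning e1000 ---"
    dmesg 2>/dev/null | grep -i e1000 | tail -n 20 || true

    echo "--- most recently written logs in /var/log ---"
    ls -lt /var/log 2>/dev/null | head -n 10 || true
}

previous_boot_log() {
    # Last 100 journal lines from the boot before the current one;
    # only works if the journal is stored persistently.
    journalctl -b -1 -n 100 --no-pager
}

restart_network() {
    # Restart networking inside the guest instead of rebooting the whole VM.
    # Assumes NetworkManager manages the interface.
    systemctl restart NetworkManager
}

check_duplicate_ip() {
    # Duplicate address detection (iputils arping): exit status 0 means
    # no other host replied for FOG_IP during the probe.
    arping -D -c 3 -I "$IFACE" "$FOG_IP"
}
```

If restart_network brings the server back without a reboot, that points at the virtual NIC or the network path rather than at the FOG services themselves. previous_boot_log is also worth running after a reboot, since it shows what was logged just before the hang.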
It would be interesting to see what happens to this vm if you shutdown all FOG services at the end of the day. Is the VM still on the network in the AM?
[soapbox] While I understand the techie needs to run Fedora, why are you using a desktop OS instead of a server OS? Would you load MS SQL Server on Windows 7 for your enterprise application? Please pick stable RHEL or CentOS over Fedora. Yes, I know it works, but why not use the right tool for the job?[/soapbox]
-
@george1421 said:
why are you using a desktop OS instead of a server OS?
Fedora Server is a server OS. They have three different versions: workstation, server, and cloud.
I would also point out that Windows 7 uses the same kernel as Windows Server 2008 R2.