Fog server keeps going down
-
For clarity I run FOG under centos running on ESXi 5.5. I don’t have this kind of issue.
I know this was a long thread and I kind of skimmed it.
When you set up your Linux server you should set the network adapter to e1000; you can use vmxnet3 if you install the VMware Tools first. That has always been a pain to do, so I just use the e1000 driver. I seem to recall a bug with the e1000 driver in the GA release of either 5.0 or 5.5 that caused the virtual NIC to hang under heavy load. That issue was fixed with an ESXi update. Is your ESXi server up to date with patches?
Please help me understand what you mean when you say you can’t access the FOG server. I understand if you can’t access it via the network, but can you log into the console directly using the VMware configuration client? Or is the vm client frozen? If you can log in from the console then I would be sure to check the logs in /var/log to see if there is any indication of what happened. I would also try to restart the network services (from inside the Fedora OS) to see if you can get it back online. I have seen two hosts with the same IP address take a server off line like this.
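If you can get a copy of the VM's .vmx file from the datastore, a few lines of Python will tell you which adapter type each virtual NIC is set to. This is only a sketch: it assumes the adapter type is recorded on a line of the form `ethernet0.virtualDev = "e1000"`, and the function name `adapter_types` is made up for illustration.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "e1000"
ADAPTER_RE = re.compile(r'^(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"', re.MULTILINE)


def adapter_types(vmx_text):
    """Map each virtual NIC (ethernet0, ethernet1, ...) to its adapter type.

    A NIC with no virtualDev line in the file will not appear in the result.
    """
    return dict(ADAPTER_RE.findall(vmx_text))
```

Feed it the contents of the .vmx file as a string; a NIC that is missing from the result simply has no explicit `virtualDev` line.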
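For the log check, something like the sketch below can narrow down what to read first. The pattern list is only my guess at messages worth looking for (it is not exhaustive), and the log path in the usage note is an assumption; use whatever plain-text log your install actually writes.

```python
import re
from pathlib import Path

# Illustrative patterns only; not an exhaustive list of failure signatures.
SUSPECT = re.compile(
    r"out of memory|soft lockup|blocked for more than|netdev watchdog|duplicate address",
    re.IGNORECASE,
)


def suspicious_lines(text):
    """Return the log lines that match any of the SUSPECT patterns."""
    return [line for line in text.splitlines() if SUSPECT.search(line)]


def scan_file(path):
    """Read one log file (replacing undecodable bytes) and scan it."""
    return suspicious_lines(Path(path).read_text(errors="replace"))
```

For example, `scan_file("/var/log/messages")` run as root after the next hang and reboot would list any matching lines, assuming that file exists on your system.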
It would be interesting to see what happens to this vm if you shutdown all FOG services at the end of the day. Is the VM still on the network in the AM?
[soapbox] While I understand the techie’s need to run Fedora, why are you using a desktop OS instead of a server OS? Would you load MS SQL Server on Windows 7 for your enterprise application? Please pick stable RHEL or CentOS over Fedora. Yes, I know it works, but why not use the right tool for the job?[/soapbox]
-
@george1421 said:
why are you using a desktop OS instead of a server OS?
Fedora Server is a server OS. They have three different versions: workstation, server, and cloud.
I would also point out that Windows 7 uses the same kernel as Windows Server 2008 R2.
-
@Wayne-Workman said:
Fedora Server is a server OS. They have three different versions: workstation, server, and cloud.
I would also point out that Windows 7 uses the same kernel as Windows Server 2008 R2.
Well color me red (hat) and call me embarrassed (to a point), mea culpa.
Traditionally the Fedora brand was the rapid development OS for Red Hat, with many new/advanced functions. Some of those features would end up in RHEL (CentOS) and some features would just disappear. With the rapid development cycles you may get a new version every few months (which is NOT what you need for a server OS). Some companies have regulations about NOT using outdated or unpatched software (including the OS). If you built your servers on Fedora (or any version of a rapid development OS) you would be in a continual update cycle just to stay current and within policy. That is why RHEL, CentOS, and Ubuntu LTS are all long-term supported OSes: to give companies a stable and supported OS platform. Does this mean you shouldn’t use Fedora for an OS? No. A lot depends on your company’s regulations. Just because you CAN install MS SQL Server on Win7 doesn’t mean you should. (If I remember MS’ Windows 7 EULA, you are not allowed to use a desktop OS in a server role.)
-
@george1421 I used the OS recommended by this forum. I was told to use Fedora Server 22 as that was the best option available. I have not had a single problem with it until now, besides a few minor ones. I do not feel the need to reinstall a different OS, but on a positive note the FOG server was up and running this morning. I am about to run updates on the SVN and see how everything goes over the weekend. I don’t think this is resolved, but for the time being I am up and running without the need to reboot the server.
To answer your questions: when the server does go down I have no access to it unless I reboot it. I can’t remote into it from the interface or from PuTTY or anything. On the computer side, they stop at the FOG screen and won’t boot past that unless you tell the computer to boot to the hard drive first.
We are off after today until Monday, so I may not be responding until then; I will update everyone on the situation then. I am willing to try anything except starting from scratch; if that’s the case I will just reboot the server every morning. Any thoughts, please let me know!
-
@szecca1 No worries, this is only my narrow view of how the world should work. Reinstalling is not a requirement or even a recommendation. Having 8 different OS flavors (times the number of versions per flavor) just makes support a bit difficult.
When the server goes down: I may be using a tool you don’t have access to. I can access all of my VMs from a vSphere Client. This is a tool I can run on my desktop computer to access the VMware servers. It allows me to access the console of each VM as if it were a physical server connected to a keyboard and monitor. This is a built-in function of vSphere, so every install has this possibility. VMware now recommends the vSphere Web interface, but the vSphere Client still works. Having access to this will tell us if the server/VM is freezing or just the network is going off-line. This console access would also allow us to inspect the event logs before the server is rebooted.
-
@george1421 No worries, I have the same reaction towards other things and there should be one OS that is recommended for here so everyone can be on the same page.
We have the vSphere Client, and my boss tried to access the server from it yesterday and couldn’t. The server was unresponsive even when bypassing the NIC and using the console to try to access it; we still received no communication. We were also able to ping the server just fine during this process.
-
@szecca1 are you not able to access the console?
-
Sorry for being difficult here.
I see a contradiction in your statement as I read it.
- You were not able to access the console using the vSphere Client (suggesting that the vm client was frozen/hung).
- You were able to ping the server just fine during this process (which tells me that while the console was hung, the vm client was pingable???).
I’m still trying to drive to the root cause: either the server (vm client) is hanging or the virtual NIC is hanging/off line.
-
@george1421 Ok I’ll try to be clear:
- In vSphere, when selecting the FOG server there is a console tab. That tab allows you to remote on to the server. Using that console, when the server is down, I CAN’T access the server. All other VM servers in vSphere are fine and accessible.
- I can ping the FOG server at 10.1.0.119 when the server becomes inaccessible, which leads me to believe that the NIC and the virtual NIC are working fine.
This leads us to have to force the server to reboot using vSphere.
Please tell me if this isn’t clear
-
I’m still seeing a conflict here. But let’s run with it.
The console is frozen (which might indicate the VM has crashed, because the console is isolated from any running application like FOG. Even if something was consuming 100% of the CPU the console should still respond, although slowly). Assuming that the vm client has hung, the network stack is still operational (it is responding to a ping). Based on my experience this is a unique situation that should not happen.
Let’s see if we can acquire a bit more info. Looking at the ESXi console (not the vSphere Web interface accessed by a browser) when that VM is unresponsive, are the VM tools reporting to ESXi correctly (Summary tab)? Is that VM posting any alerts to ESXi (Alarms tab)? When the VM is in this lost state, what does the Consumed Host CPU value show (Summary tab)? Is it high, low, or about the same as when it’s working correctly? There has to be some external indication that this server has gone away.
I still have the feeling that you might have a machine with the same address out there causing the problem. That would explain the FOG server dropping off the network (hanging) but still being pingable. Plus this is a new installation and not something that has been in place for a while. All other VMs are running without issue on the same hypervisor. All of this is making me think there is some external source at play here. Understand this is just an attempt to read the tea leaves based on what you’ve said thus far.
-
@george1421 Where is this conflict that you’re seeing?
We just turned off the FOG server and the IP address is no longer pingable, which means there is no conflict with IPs.
-
@szecca1 Devices don’t always respond to pings. You can configure Windows, Linux, or OS X not to respond to them. It could be a switch, a UPS, an IP camera, a printer, lots of things.
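A device that ignores pings still has to answer ARP to talk on the LAN, so the neighbor table is a better test than ping. Here is a rough sketch, assuming you power the FOG VM off, try to reach its IP from another Linux box on the same subnet, and then paste in the output of `ip neigh`. The function names and the MAC addresses in the usage note are hypothetical.

```python
import re

# `ip neigh` prints lines such as:
#   10.1.0.119 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE
MAC_RE = re.compile(r"lladdr\s+([0-9a-fA-F:]{17})")


def mac_for_ip(neigh_output, ip):
    """Return the lower-cased MAC that `ip neigh` lists for ip, or None."""
    for line in neigh_output.splitlines():
        fields = line.split()
        if fields and fields[0] == ip:
            match = MAC_RE.search(line)
            if match:
                return match.group(1).lower()
    return None


def is_duplicate(neigh_output, ip, expected_mac):
    """True when some MAC other than expected_mac answers for ip."""
    seen = mac_for_ip(neigh_output, ip)
    return seen is not None and seen != expected_mac.lower()
```

With the FOG VM off, `mac_for_ip(output, "10.1.0.119")` returning anything other than `None` would mean something else is answering for that address. With the VM on, `is_duplicate(output, "10.1.0.119", "00:50:56:aa:bb:cc")` compares the answer against the MAC vSphere shows for the VM (the MAC here is a placeholder).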
-
@Wayne-Workman Can you guys trust me when I tell you it is not a duplicate IP address? No server, IP camera, UPS, switch or any of those things has been added to cause this problem. We do not configure any of our devices to be unpingable because that would make the devices less manageable. I can assure you this is not a duplicate IP issue.
-
@szecca1 I don’t think @george1421 means a “conflict” in the terms of IP Addressing, but rather in logic.
If the FOG server is not accessible via the console but can be pinged, something is way off. The console is the only thing that should ALWAYS be accessible regardless of the state of the network, services, or anything else. Even if you cannot do anything on the console, you should be able to see the server in its funked-up state. The fact that the server is pingable but you cannot access the console is what is conflicting, not that you have multiple IP addresses, or duplicates, or what have you.
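To spell the logic out, here is a tiny sketch of how the two observations (console and ping) map to the explanations floated so far in this thread. The function name and the wording of the results are mine; it is a summary of the reasoning, not a diagnostic tool.

```python
def diagnose(console_ok, ping_ok):
    """Map two observations to the likeliest explanation discussed in this thread."""
    if console_ok and ping_ok:
        return "healthy"
    if console_ok and not ping_ok:
        return "network problem: guest is up, virtual NIC or network is down"
    if not console_ok and not ping_ok:
        return "guest hung or crashed"
    # Console dead but ping answered: the combination reported here.
    return "conflicting: guest looks hung, yet something still answers ping"
```

The last case is the one being reported, which is why the questions keep coming back to what exactly is answering the ping.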
-
@Tom-Elliott said:
@szecca1 I don’t think @george1421 means a “conflict” in the terms of IP Addressing, but rather in logic.
Tom is right on. I was focusing on the logic of what you were saying, which led me to the duplicate IP address conclusion. Maybe I need to choose my words a bit better too.
I don’t offer to do this very often, but since we have a similar virtualization environment, I can create a VM with FOG running on CentOS 6.7. Assuming your boss will allow it, I can export the VM and you can upload it to your ESXi platform. The only issue I have is whether I can make the VM small enough to get it to you via my Dropbox.
-
@george1421 Google drive…
-
@Tom-Elliott I agree this doesn’t make any sense. I wish I was making this up, but in the end I have no access to the server while it is down until I tell vSphere to just reboot the VM. The console in vSphere is not able to communicate and basically gives the same result as when I try to use PuTTY to remote in.
I can already tell you that my boss will not allow that, although I really do appreciate the offer. In a school district that would be too high of a security risk, unfortunately. Thank you though!
-
@szecca1 Is the OS installed with a GUI?
-
@Tom-Elliott No, Fedora Server doesn’t have a GUI.