Fog server keeps going down



  • @Tom-Elliott If I may ask, what do you mean by separating the NIC?


  • Senior Developer

    @szecca1 could it be possible to separate the NIC for the FOG server? Remember FOG is using the NIC for many things. Separating the NIC would be the first suggestion I can make. I understand the issue is only occurring on the FOG service.



  • @Tom-Elliott It can’t be the physical adapter. The adapter in use is shared by several different VMs, and no other VMs are having issues besides the FOG server.


  • Senior Developer

    @szecca1 I don’t think it’s the driver. I think it’s the physical adapter.



  • @szecca1 We just checked to see if we could add a new driver for the ethernet card and we couldn’t. Do you recommend a certain ethernet card adapter driver for me to use? We currently have the E1000 ethernet adapter on it.



  • @Wayne-Workman No other VMs attached to this device are having issues. It could be the virtual adapter, but I can change that? What you guys are saying makes sense, but it has to be something with the FOG machine because the others are working fine.



  • @szecca1 When I read that post below I immediately thought exactly what Tom posted - NIC issues. Maybe even switch issues.

    Try a different port on the switch, maybe even try a different switch… Maybe try a different type of virtual adapter.
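
One way to gather evidence for or against the NIC theory is to look at the interface error and drop counters from inside the VM. The following is a minimal sketch, assuming a Linux guest with sysfs mounted at /sys; the function name nic_report is made up for illustration and is not part of FOG.

```shell
#!/bin/sh
# Sketch: print link state and error/drop counters for every network
# interface, read straight from sysfs (Linux only).
# nic_report is a hypothetical helper name, not a FOG or distro command.
nic_report() {
    printf '%-12s %-8s %10s %10s %10s %10s\n' \
        IFACE STATE RX_ERR TX_ERR RX_DROP TX_DROP
    for dev in /sys/class/net/*; do
        # Skip entries that are not interfaces (or an unmatched glob).
        [ -d "$dev/statistics" ] || continue
        s="$dev/statistics"
        printf '%-12s %-8s %10s %10s %10s %10s\n' \
            "$(basename "$dev")" \
            "$(cat "$dev/operstate" 2>/dev/null || echo '?')" \
            "$(cat "$s/rx_errors")" "$(cat "$s/tx_errors")" \
            "$(cat "$s/rx_dropped")" "$(cat "$s/tx_dropped")"
    done
}

nic_report
```

Running it twice a few minutes apart and comparing the numbers is the useful part: counters that keep climbing on the FOG server's interface would support the adapter or switch theory, while flat counters would point elsewhere.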



  • @Tom-Elliott No, the server doesn’t go to sleep. The other VMs on that server are working perfectly fine. It’s just FOG that seems to be having a problem. But once it is rebooted, it comes up and works fine for the day.


  • Senior Developer

    @szecca1 that is definitely information we needed. That leads me to believe it is a problem with the NIC that is associated. Does your server go to sleep/hibernate?



  • @Wayne-Workman How can I run those commands when I can’t connect to the server? Whether from the virtual machine or from PuTTY, I can’t log in to the server until after I reboot it!
    And no worries, I know you guys are busy with other things. Any information you need I’ll be happy to give you several times if needed.



  • @szecca1

    Run these commands the next time the server goes down - grab the output, and then just restart it to keep problems in your environment to a minimum. Post what you find and we’ll go from there. If you get errors while posting the output of any of those, just upload the output in a .txt file instead.

    The -n 100 sets the number of lines to return - you can adjust that as needed.

    Apache Error log:
    tail -n 100 /var/log/httpd/error_log

    MariaDB log in Fedora 22:
    tail -n 100 /var/log/mariadb/mariadb.log

    Also, run the top command to see what the load averages are, and what is running:
    top
    Sample output from my server:

    top - 09:28:32 up 51 days, 20:58,  1 user,  load average: 0.08, 0.11, 0.18
    Tasks: 166 total,   1 running, 165 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.5 us,  0.2 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  4028388 total,    61580 free,   341508 used,  3625300 buff/cache
    KiB Swap:  4063228 total,  4053892 free,     9336 used.  3621740 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    30282 apache    20   0  409732  24140  13936 S   1.7  0.6   0:10.38 httpd
        9 root      20   0       0      0      0 S   0.3  0.0  50:57.07 rcuos/0
      672 dbus      20   0   47108   3504   3048 S   0.3  0.1   4:56.67 dbus-daemon
    27066 mysql     20   0 2190100  97052  16348 S   0.3  2.4   2:06.60 mysqld
        1 root      20   0  187052   6152   3000 S   0.0  0.2   8:43.03 systemd
        2 root      20   0       0      0      0 S   0.0  0.0   0:01.77 kthreadd
        3 root      20   0       0      0      0 S   0.0  0.0   1:12.82 ksoftirqd/0
        5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
        7 root      20   0       0      0      0 S   0.0  0.0  71:43.84 rcu_sched
        8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
       10 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0
       11 root      rt   0       0      0      0 S   0.0  0.0   0:11.47 migration/0
       12 root      rt   0       0      0      0 S   0.0  0.0   0:29.96 watchdog/0
       13 root      rt   0       0      0      0 S   0.0  0.0   0:28.28 watchdog/1
       14 root      rt   0       0      0      0 S   0.0  0.0   0:14.37 migration/1
       15 root      20   0       0      0      0 S   0.0  0.0   0:20.91 ksoftirqd/1
       17 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/1:0H
       18 root      20   0       0      0      0 S   0.0  0.0   6:21.82 rcuos/1
       19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1
       20 root      rt   0       0      0      0 S   0.0  0.0   0:28.58 watchdog/2
       21 root      rt   0       0      0      0 S   0.0  0.0   0:12.49 migration/2
       22 root      20   0       0      0      0 S   0.0  0.0   2:10.35 ksoftirqd/2
       24 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/2:0H
       25 root      20   0       0      0      0 S   0.0  0.0  11:36.20 rcuos/2
       26 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/2
       27 root      rt   0       0      0      0 S   0.0  0.0   0:27.90 watchdog/3
       28 root      rt   0       0      0      0 S   0.0  0.0   0:14.79 migration/3
       29 root      20   0       0      0      0 S   0.0  0.0   0:10.40 ksoftirqd/3
       31 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/3:0H
       32 root      20   0       0      0      0 S   0.0  0.0   5:01.15 rcuos/3
       33 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/3
       34 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 khelper
       35 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs
       36 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 netns
       37 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 perf
       38 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 writeback
       39 root      25   5       0      0      0 S   0.0  0.0   0:00.00 ksmd
       40 root      39  19       0      0      0 S   0.0  0.0   0:00.00 khugepaged
       41 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 crypto
       42 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kintegrityd
       43 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 bioset
       44 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kblockd
       45 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 ata_sff
       46 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 md
       47 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 devfreq_wq
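
The commands above can also be collected into a single file in one go, which makes the output easier to attach as a .txt. This is only a sketch: the log paths are the Fedora 22 ones listed above, while the output location, the OUT and N_LINES variables, and the dump_log helper are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: gather the Apache log, MariaDB log, and one top snapshot in a file.
# OUT, N_LINES, and dump_log are hypothetical names; adjust paths for other distros.
OUT="${OUT:-/tmp/fog-diag-$(date +%Y%m%d-%H%M%S).txt}"
N_LINES="${N_LINES:-100}"

dump_log() {
    # $1 = label, $2 = path to the log file
    echo "===== $1 ($2) ====="
    if [ -r "$2" ]; then
        tail -n "$N_LINES" "$2"
    else
        echo "(not readable: $2)"
    fi
    echo
}

{
    dump_log "Apache error log" /var/log/httpd/error_log
    dump_log "MariaDB log" /var/log/mariadb/mariadb.log
    echo "===== top (one batch-mode snapshot) ====="
    # -b = batch mode (plain text), -n 1 = a single iteration
    top -b -n 1 | head -n 40
} > "$OUT" 2>&1

echo "Wrote $OUT"
```

Reading the system logs usually needs root, so run it with sudo or as root. Because the script is non-interactive, it could also be scheduled (for example from cron) so that a recent snapshot already exists on disk when the server stops responding.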
    


  • @szecca1 Just saw that you already said that … sorry. I’ll get some commands together to send your way.

    When we have a mystery problem, we need to look in a lot of places until we find a clue.



  • @Wayne-Workman The OS of the FOG server is Fedora Server 22.



  • @Scott-B said:

    What do the logs on the server suggest the issue could be?

    Please provide some details of the server. Ubuntu? Fedora? Versions? Do the log files on the server point to anything?

    @szecca1 What OS? Also, you can look at Apache Errors via CLI as well if the web interface is not working. If we know the OS, we can give you the exact command to do this (along with a lot of other logs to check too).



  • @need2 I just rebooted and it works fine again, but this happens every day and causes issues if computers reboot due to Windows updates.



  • @need2 We have vSphere 5.5. We upped the CPUs and are not getting an error anymore, but this morning I cannot get on to FOG. I have to reboot again.


  • Moderator

    What is your virtualization environment?



  • @Tom-Elliott I am running FOG 5453, and the latest version is 1.2.0.

    It may not be a FOG issue, but this is on a virtual machine and the error we are seeing is CPU utilization. We are about to up the utilization to see if that does the trick, but wanted to see if you guys have seen this before.


  • Senior Developer

    What version of FOG are you running?

    Are you sure this is because of FOG and not a more serious issue (e.g. memory issues, HDD dying, etc.)?



  • @Tom-Elliott I go to that screen and there is nothing under file. Am I supposed to type something here?

