Kernel panic - not syncing: Attempted to kill init!



  • Server
    • FOG Version: 1.3.0-RC-26
    • OS: CentOS 7
    Client
    • Service Version:
    • OS: Win 7
    Description

    I went to check on my FOG server because it had been a while and, after updating everything, I ran an image on a test machine to make sure it still worked. I get a different screen each time I reboot. See the examples below.

    0_1480450311432_1123161627.jpg
    0_1480450324674_1123161634.jpg
    0_1480450889325_1129161115.jpg
    0_1480450895001_1129161117.jpg
    0_1480457327324_1129161607.jpg

    Haven’t been able to find anything that helps me on the forum or net.



  • OK, I fixed the Apache/HTTPD issue. After much, much, much searching I found this site for a fix and it worked for me.
    http://awsadminz.com/httpd-service-main-process-exited-kill-cannot-find-process/
    Should I post a separate tutorial as a fix for this issue?
    As for the kernel thing, I tried a different PC and it is working fine. I guess I’ll have to figure that one out.




  • File system still seems fine:

    Filesystem                     Type     Inodes IUsed IFree IUse%  Size  Used Avail Use% File Mounted on
    /dev/mapper/centos00-root00    ext4       1.3M  164K  1.1M   13%   20G  7.7G   11G  42% -    /
    devtmpfs                       devtmpfs   470K   494  469K    1%  1.9G     0  1.9G   0% -    /dev
    tmpfs                          tmpfs      473K    10  473K    1%  1.9G  5.3M  1.9G   1% -    /dev/shm
    tmpfs                          tmpfs      473K   645  473K    1%  1.9G   17M  1.9G   1% -    /run
    tmpfs                          tmpfs      473K    13  473K    1%  1.9G     0  1.9G   0% -    /sys/fs/cgroup
    /dev/sda5                      ext4        63K   365   63K    1%  969M  329M  574M  37% -    /boot
    /dev/mapper/fog-opt_fog_images ext4        26M   10K   26M    1%  395G   80G  295G  22% -    /opt
    /dev/sdb1                      ext4       261M    15  261M    1%  8.1T   91M  7.7T   1% -    /images
    tmpfs                          tmpfs      473K    30  473K    1%  379M   16K  379M   1% -    /run/user/1000
    

  • Senior Developer

    @ManofValor That, and httpd has absolutely nothing to do with DHCP, TFTP, or the FOS system and its kernel panics.


  • Senior Developer

    @ManofValor I’m going to guess that your main filesystem (which doesn’t appear to be a problem) is full again. I say that because all was working, then it wasn’t.



  • I believe httpd is the problem right now. I tried to go into FOG management and got an "Unable to connect" page. So I tried to run the installer and apache2 failed to start. I’ve looked at a lot of sites describing similar issues and cannot find a solution that works for me. I’ve started, restarted, stopped, reloaded, killed, and tried to update apache/httpd. Here is some info that I hope can help.

    [root@localhost httpd]# service httpd start
    Redirecting to /bin/systemctl start  httpd.service
    Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
    [root@localhost httpd]# systemctl status httpd.service
    ● httpd.service - The Apache HTTP Server
       Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
       Active: failed (Result: exit-code) since Thu 2016-12-01 15:08:00 CST; 30s ago
         Docs: man:httpd(8)
               man:apachectl(8)
      Process: 13974 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=1/FAILURE)
      Process: 13959 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
     Main PID: 13959 (code=exited, status=1/FAILURE)
    
    Dec 01 15:08:00 localhost.localdomain systemd[1]: Starting The Apache HTTP Server...
    Dec 01 15:08:00 localhost.localdomain httpd[13959]: AH00558: httpd: Could not reliably determine the server's fully qu...ssage
    Dec 01 15:08:00 localhost.localdomain systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE
    Dec 01 15:08:00 localhost.localdomain kill[13974]: kill: cannot find process ""
    Dec 01 15:08:00 localhost.localdomain systemd[1]: httpd.service: control process exited, code=exited status=1
    Dec 01 15:08:00 localhost.localdomain systemd[1]: Failed to start The Apache HTTP Server.
    Dec 01 15:08:00 localhost.localdomain systemd[1]: Unit httpd.service entered failed state.
    Dec 01 15:08:00 localhost.localdomain systemd[1]: httpd.service failed.
    Hint: Some lines were ellipsized, use -l to show in full.
    [root@localhost httpd]# journalctl -xn
    -- Logs begin at Wed 2016-11-30 13:17:57 CST, end at Thu 2016-12-01 15:08:31 CST. --
    Dec 01 15:08:00 localhost.localdomain kill[13974]: kill: cannot find process ""
    Dec 01 15:08:00 localhost.localdomain systemd[1]: httpd.service: control process exited, code=exited status=1
    Dec 01 15:08:00 localhost.localdomain systemd[1]: Failed to start The Apache HTTP Server.
    -- Subject: Unit httpd.service has failed
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    -- 
    -- Unit httpd.service has failed.
    -- 
    -- The result is failed.
    Dec 01 15:08:00 localhost.localdomain systemd[1]: Unit httpd.service entered failed state.
    Dec 01 15:08:00 localhost.localdomain systemd[1]: httpd.service failed.
    Dec 01 15:08:01 localhost.localdomain dhcpd[2890]: DHCPDISCOVER from 36:02:86:28:b1:8b (MCWPL53) via enp30s0
    Dec 01 15:08:01 localhost.localdomain dhcpd[2890]: DHCPOFFER on 10.10.0.10 to 36:02:86:28:b1:8b (MCWPL53) via enp30s0
    Dec 01 15:08:01 localhost.localdomain polkitd[777]: Unregistered Authentication Agent for unix-process:13944:9300666 (system b
    Dec 01 15:08:28 localhost.localdomain dhcpd[2890]: DHCPINFORM from 10.10.1.167 via enp30s0: not authoritative for subnet 10.10
    Dec 01 15:08:31 localhost.localdomain dhcpd[2890]: DHCPINFORM from 10.10.1.130 via enp30s0: not authoritative for subnet 10.10
    

    I just don’t understand it enough to completely know what I’m looking at.
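
In the output above, two lines are symptoms rather than causes: AH00558 is only a warning about the missing ServerName, and the `kill: cannot find process ""` line comes from ExecStop running after the main process had already exited. A minimal sketch for getting past them follows. The function names are hypothetical, and the error_log path is the usual CentOS default, assumed rather than confirmed for this server.

```shell
#!/bin/sh
# Hypothetical filter: drop the two symptom lines so whatever else the
# log contains stands out. Reads log text on stdin.
httpd_log_filter() {
    grep -v -e 'AH00558' -e 'kill: cannot find process'
}

# Hypothetical wrapper around standard CentOS 7 commands.
# Nothing runs until the function is called on the server as root.
httpd_diagnose() {
    # Syntax-check the Apache configuration.
    apachectl configtest
    # Show the unit's journal without ellipsized lines, minus the noise.
    journalctl -u httpd.service -l --no-pager | httpd_log_filter
    # Assumed default log location on CentOS.
    tail -n 20 /var/log/httpd/error_log
}
```

Running `httpd_diagnose` on the server should surface the line that actually made httpd exit, if one was logged.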



  • I started thinking the same thing. I’m going to try a different PC and see what happens.


  • Senior Developer

    Seeing as the error that you’re seeing states “Connection reset” that could mean any number of things, though I wouldn’t know where to begin looking.

    Maybe you have port security turned on? I don’t know. If that’s the problem, now, it would seem to indicate (probably) a problem with the specific system you’re running into these issues with.

    First it was RAM, now the NIC won’t maintain a connection, etc…???


  • Senior Developer

    @ManofValor Cable being fine doesn’t tell us much. (You pulled it and put in a new one just in case?)

    Where are you getting the iPXE error from? The error from iPXE is not a kernel panic.



  • Well, that didn’t work. The cable is fine. I get internet and I can ping it.



  • @Tom-Elliott
    I just tried it because I went to https://ipxe.org/0f0a6039, from the screenshot, and that was one of the suggestions.
    I’ll try that and see…


  • Senior Developer

    @Tom-Elliott Sorry, for Red Hat/CentOS/Fedora it appears:

    yum -y install xz-devel
    ldconfig
    

    Should do the trick (no guarantees), though I don’t know why you need to download the iPXE repository.
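
Whether the package fixed the build can be checked before re-running make by looking for the header the compiler complained about. A minimal sketch; the function name is hypothetical, and /usr/include as the default location is an assumption about where the distribution installs lzma.h.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lzma.h exists in one of the given
# include directories. With no arguments it checks /usr/include
# (an assumed default, not confirmed for this server).
have_lzma_header() {
    [ "$#" -gt 0 ] || set -- /usr/include
    for dir in "$@"; do
        [ -f "$dir/lzma.h" ] && return 0
    done
    return 1
}
```

For example, `have_lzma_header || echo "lzma.h not found; install xz-devel before running make"`.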


  • Senior Developer

    @ManofValor No, it didn’t. That looks like you downloaded the iPXE repository and then ran:

    make

    After that, as it was building, you received that information.

    The LZMA error, I suspect, means you need the lzma development packages (not sure of the exact names, but I think it’s xzutils-devel and xzutils on Fedora/Red Hat-based systems).



  • Not sure if this was the right thing to try but I ran:

    git clone git://git.ipxe.org/ipxe.git
    

    And it ended with this:

      [AR] bin/blib.a
    ar: creating bin/blib.a
      [HOSTCC] util/zbin
    util/zbin.c:7:18: fatal error: lzma.h: No such file or directory
     #include <lzma.h>
                      ^
    compilation terminated.
    make: *** [util/zbin] Error 1
    

  • Senior Developer

    @ManofValor Then, based on the fact that it seems to be pointing at network, I’d say check your patch cable and make sure the connection is solid.



  • @ManofValor
    By the way, this screen is consistent now.



  • @Tom-Elliott
    So I checked the partition sizes:

    [root@localhost fogadmin]# df -h --o
    Filesystem                     Type     Inodes IUsed IFree IUse%  Size  Used Avail Use% File Mounted on
    /dev/mapper/centos00-root00    ext4       1.3M  158K  1.1M   13%   20G  7.5G   12G  41% -    /
    devtmpfs                       devtmpfs   470K   494  469K    1%  1.9G     0  1.9G   0% -    /dev
    tmpfs                          tmpfs      473K     9  473K    1%  1.9G  4.5M  1.9G   1% -    /dev/shm
    tmpfs                          tmpfs      473K   643  473K    1%  1.9G  8.9M  1.9G   1% -    /run
    tmpfs                          tmpfs      473K    13  473K    1%  1.9G     0  1.9G   0% -    /sys/fs/cgroup
    /dev/sda5                      ext4        63K   365   63K    1%  969M  329M  574M  37% -    /boot
    /dev/mapper/fog-opt_fog_images ext4        26M   10K   26M    1%  395G   80G  295G  22% -    /opt
    /dev/sdb1                      ext4       261M    15  261M    1%  8.1T   91M  7.7T   1% -    /images
    tmpfs                          tmpfs      473K    29  473K    1%  379M   12K  379M   1% -    /run/user/1000
    

    I re-seated the memory and now I’m getting this new screen:
    0_1480538421087_1130161420.jpg

    Should I start a new post for this one?


  • Moderator

    @Tom-Elliott said in Kernel panic - not syncing: Attempted to kill init!:

    Seeing as I know, at some point, your disk space has been full, I’d say start there

    Please ask us about how to partition your disks to avoid a full root partition.
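
Since a full root partition has come up more than once in this thread, the check can be scripted so it is flagged before the next imaging run. A minimal sketch; the function name and the default limit of 90 percent are assumptions, not FOG recommendations. It reads `df -P` output on stdin and assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print
# "MOUNTPOINT PERCENT" for every filesystem at or above LIMIT percent.
# LIMIT defaults to 90 (an arbitrary assumption).
full_filesystems() {
    limit=${1:-90}
    awk -v limit="$limit" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            # Skip rows whose capacity column is not a plain number.
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
                print $6, pct
        }'
}
```

Usage would be `df -P | full_filesystems 90`; empty output means nothing is at or above the limit.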


  • Senior Developer

    More often than not, this is presented if you’ve crossed inits and kernels (32-bit init with 64-bit kernel or vice versa).

    That said, based on the “inode referenced” issue this could be caused by a few factors.

    1. If you edited the inits and failed to compress them correctly.
    2. If the init was not fully downloaded.
    3. Memory on the device is bad or going bad.
    4. Memory on the server is bad or going bad.
    5. Disk space is full, causing the file to fail to download fully.
    6. HDD going bad on the server, causing the file to be sent in a corrupted state (at random).

    Of course, I’m sure there are more potential causes, but start with the basics first.

    Seeing as I know, at some point, your disk space has been full, I’d say start there (though this seems unlikely if the panic is always different).
    If this all checks out, try reseating the host’s RAM (maybe try known good RAM too?).

    There’s more to try out first, but please just start here and see what you might find.
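
For causes 1 and 2 in the list above, a quick sanity check is to confirm that the init on the server still starts with the xz magic bytes. A minimal sketch, assuming the init is an xz-compressed file; the function name and the path in the usage note are assumptions, not something confirmed in this thread. It only inspects the first six bytes, so it catches a file compressed in the wrong format or replaced by an error page, but not a truncated download; `xz -t` tests the whole stream.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE begins with the xz magic
# bytes FD 37 7A 58 5A 00. Returns 2 if FILE is not readable.
has_xz_magic() {
    [ -r "$1" ] || return 2
    # Dump the first six bytes as hex and strip spaces and newlines.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    [ "$magic" = "fd377a585a00" ]
}
```

For example, `has_xz_magic /var/www/html/fog/service/ipxe/init.xz && echo "header looks like xz"`, where the path is an assumed install location.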

