FOG/Linux can't image: NFS and rpcbind issues



  • Server
    • FOG Version: 1.4.2
    • OS: CentOS 6 and 7
    Client
    • Service Version: latest
    • OS: Windows 10
    Description

    I'm having an issue where I can't upload or download any images. I have narrowed it down to the rpcbind service, which keeps crashing. It doesn't matter whether the OS is CentOS 7 or 6; the issue is the same.

    I have dug through the logs and can't find much, and the things I have found don't seem to fix the issue.
    The service dies on the server with a very generic error:

    CentOS 6 = rpcbind dead but pid file exists
    
    CentOS 7 = ● rpcbind.service - RPC bind service
       Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; indirect; vendor preset: enabled)
       Active: failed (Result: signal) since Mon 2017-06-05 14:27:50 CDT; 1h 5min ago
      Process: 1138 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS)
     Main PID: 1139 (code=killed, signal=ABRT)
    
    Jun 05 14:15:23 fog. systemd[1]: Starting RPC bind service...
    Jun 05 14:15:23 fog. systemd[1]: Started RPC bind service.
    Jun 05 14:27:50 fog. systemd[1]: rpcbind.service: main process exited, code=killed, status=6/ABRT
    Jun 05 14:27:50 fog. systemd[1]: Unit rpcbind.service entered failed state.
    Jun 05 14:27:50 fog. systemd[1]: rpcbind.service failed.
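    On CentOS 6, "dead but pid file exists" means the init script found rpcbind's PID file but no live process with that PID, i.e. the daemon died without cleaning up after itself. The same check can be reproduced in a few lines of Python, for example as part of a watchdog. This is a minimal sketch for POSIX systems only; the default path /var/run/rpcbind.pid is an assumption, so confirm it against /etc/init.d/rpcbind on your box.

```python
import os


def pid_file_status(pid_path: str = "/var/run/rpcbind.pid") -> str:
    """Classify a daemon the way the CentOS 6 init script does.

    Returns "running", "dead but pid file exists", or "stopped" (no PID file).
    The default path is an assumption; check your init script for the real one.
    """
    try:
        with open(pid_path) as f:
            pid = int(f.read().split()[0])
    except FileNotFoundError:
        return "stopped"
    except (ValueError, IndexError):
        # An empty or garbled PID file is treated like a stale one.
        return "dead but pid file exists"
    if pid <= 0:
        return "dead but pid file exists"
    try:
        # Signal 0 only performs existence/permission checks; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "dead but pid file exists"
    except PermissionError:
        # The process exists but is owned by another user (e.g. checked as non-root).
        return "running"
    return "running"
```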
    
    

    I even went as far as reformatting the server and setting up FOG from scratch again. I have several servers out there, and I think I have narrowed the issue down to something from a yum update that is breaking FOG; I assume it is something in the NFS server or rpcbind.

    These are the errors I see on the client. This is in capture (upload) mode, but it doesn't matter whether it's upload or deploy; the errors are the same.

    This is what I see on the server:

    [root@fog ~]# service rpcbind restart
    Stopping rpcbind:                                          [FAILED]
    Starting rpcbind:                                          [  OK  ]
    [root@fog ~]# service rpcbind status
    rpcbind (pid  8011) is running...
    [root@fog ~]# service rpcbind status
    rpcbind (pid  8011) is running...
    

    After a little while, and some trial and error with FOG clients, I get the errors listed above.

    Here is a picture of the client side: (screenshot not preserved)


  • Developer

    Checking the CentOS repo just now, I saw that a new package has been available since 13 June (search for rpcbind here). Anyone keen to test? @neodawg


  • Moderator

    @Sebastian-Roth Great find!! I still have that test system set up. I'll update DHCP and attempt to image a VM this morning. There was no time earlier this week to confirm that NFS fails once a client tries to image. I will have a few minutes today to test. Just thinking: if it's a memory leak, the issue may not show up right away. I guess we will find out.


  • Developer

    @george1421 I just found some more reports, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1448124
    Seems like they are working on it. Let’s hope they are able to fix this fairly soon! See here: https://bugzilla.redhat.com/show_bug.cgi?id=1457172



  • I heard back from the local tech with the CentOS 7 box, and downgrading rpcbind fixed the issue on that server as well. I guess we will have to wait until Red Hat/CentOS fixes the rpcbind package.



  • @Sebastian-Roth and @george1421

    Yes, the service will start and run, but shortly after a client tries to connect the service will die.

    The link you posted sounds exactly like what is happening; however, I didn't test different versions of the NFS protocol, i.e. v3 vs. v4.

    On one of the FOG servers, the tech was actually able to image one computer successfully, but the next computer failed because the rpcbind service had died.


  • Developer

    This sounds similar… https://bugzilla.redhat.com/show_bug.cgi?id=1457963

    @george1421 Maybe you need to have a client mount the share to run into the same issue?!?


  • Moderator

    I can’t duplicate your issue. Understand I’m not saying that you don’t have an issue. I just can’t duplicate it.

    I set up a new 1.4.2 FOG server on a fresh install of CentOS 7. NFS started as it should.

    Details of the build:
    VM built on ESXi 6.5
    CentOS 7 x64 (build 1611)

    Installed CentOS minimal
    yum upgrade -y
    set SELinux to permissive
    systemctl disable firewalld
    reboot
    git clone https://github.com/FOGProject/fogproject.git /opt/fogproject
    cd /opt/fogproject/bin
    ./installfog.sh
    (install completed without issue)

    Installed version of rpcbind: rpcbind-0.2.0-38.el7_3.x86_64
    Installed version of open-vm-tools: open-vm-tools-10.0.5-4.el7_3.x86_64

    Even after a reboot, the nfs and rpcbind services are still happy.



  • Moderator

    @neodawg Sorry, I was working on another issue. That is very strange.

    If I get a chance tonight, I'll spin up a new FOG server with a full upgrade. It's possible something was pushed out over the weekend that is causing NFS to fail.



  • So, for fun or insanity, I downgraded rpcbind.x86_64 0:0.2.0-13.el6_9 to rpcbind.x86_64 0:0.2.0-13.el6 on the CentOS 6 box I am local to, and I think that may have fixed the issue. The image is currently uploading in debug mode; I'm going to try pushing it out as soon as it's done.

    The other thing I haven't ruled out is having open-vm-tools installed. I don't know why this would break things, but I read something about it on a forum somewhere.

    I ran yum downgrade rpcbind on the CentOS 7 box, and the services are still running, but the local tech hasn't tried imaging again yet.

    Resolving Dependencies
    --> Running transaction check
    ---> Package rpcbind.x86_64 0:0.2.0-38.el7 will be a downgrade
    ---> Package rpcbind.x86_64 0:0.2.0-38.el7_3 will be erased
    --> Finished Dependency Resolution
    

    EDIT: It appears this process fixed the CentOS 6 box; it was able to deploy the same image immediately after capturing it.
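    Since downgrading fixed both boxes, it may be useful to flag which servers still carry one of the builds reported in this thread before running yum update on the others. A minimal Python sketch that takes the output of rpm -q rpcbind (for example rpcbind-0.2.0-38.el7_3.x86_64). The suspect list holds only the two builds named in this thread; it is not an official advisory list, and parse_nvra/is_suspect are hypothetical helper names.

```python
from typing import Tuple

# Builds reported in this thread as dying with SIGABRT after clients connect.
SUSPECT_BUILDS = {
    ("0.2.0", "38.el7_3"),  # CentOS 7; fixed here by downgrading to 0.2.0-38.el7
    ("0.2.0", "13.el6_9"),  # CentOS 6; fixed here by downgrading to 0.2.0-13.el6
}

# Only the architecture suffixes this sketch knows how to strip.
KNOWN_ARCHES = {"x86_64", "i686", "noarch"}


def parse_nvra(nvra: str) -> Tuple[str, str, str]:
    """Split 'rpcbind-0.2.0-38.el7_3.x86_64' into ('rpcbind', '0.2.0', '38.el7_3')."""
    nvra = nvra.strip()
    head, _, arch = nvra.rpartition(".")
    if arch in KNOWN_ARCHES:
        nvra = head  # drop the trailing ".x86_64"
    # Name may contain dashes, so split from the right: name-version-release.
    name, version, release = nvra.rsplit("-", 2)
    return name, version, release


def is_suspect(nvra: str) -> bool:
    """True if the installed rpcbind is one of the builds reported as crashing."""
    name, version, release = parse_nvra(nvra)
    return name == "rpcbind" and (version, release) in SUSPECT_BUILDS
```

    Newer builds, such as the package mentioned above as available since 13 June, are not flagged. That does not mean they are fixed, only that nobody in this thread has reported them crashing yet.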



  • @george1421

    Sorry, it's not the NFS process, it's the rpcbind process. NFS stays active until I restart it, and that restart will fail if I don't restart rpcbind first.
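    The restart order described above (rpcbind first, then NFS) can be scripted. A minimal Python sketch, assuming the CentOS 7 systemd unit names rpcbind and nfs-server (CentOS 6 would use service rpcbind restart and service nfs restart instead). recovery_commands is a hypothetical helper that only builds the command list, so it can be reviewed or tested before anything is actually run.

```python
import subprocess
from typing import List


def recovery_commands(rpcbind_active: bool) -> List[List[str]]:
    """Commands to bring NFS back, restarting rpcbind before nfs-server.

    Unit names assume CentOS 7 with systemd.
    """
    cmds: List[List[str]] = []
    if not rpcbind_active:
        cmds.append(["systemctl", "restart", "rpcbind"])
        # The NFS daemons register their RPC programs with rpcbind,
        # so nfs-server is restarted only after rpcbind is back.
        cmds.append(["systemctl", "restart", "nfs-server"])
    return cmds


def unit_is_active(unit: str) -> bool:
    """'systemctl is-active --quiet' exits with status 0 only for an active unit."""
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


# Example (run as root on the FOG server):
#   for cmd in recovery_commands(unit_is_active("rpcbind")):
#       subprocess.run(cmd, check=True)
```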



  • @george1421

    No, I don't see much that helps; I have tried Googling almost all of these errors:

    Jun  5 13:39:55 fog systemd: Stopped NFS server and services.
    Jun  5 13:39:55 fog systemd: Stopping NFS Mount Daemon...
    Jun  5 13:39:55 fog systemd: Stopping NFSv4 ID-name mapping service...
    Jun  5 13:39:55 fog systemd: Stopped NFSv4 ID-name mapping service.
    Jun  5 13:39:55 fog rpc.mountd[16751]: Caught signal 15, un-registering and exiting.
    Jun  5 13:39:55 fog systemd: Stopped NFS Mount Daemon.
    Jun  5 13:39:57 fog systemd: Starting Preprocess NFS configuration...
    Jun  5 13:39:57 fog systemd: Started Preprocess NFS configuration.
    Jun  5 13:39:57 fog systemd: Starting NFSv4 ID-name mapping service...
    Jun  5 13:39:57 fog systemd: Starting NFS Mount Daemon...
    Jun  5 13:39:57 fog systemd: Started NFSv4 ID-name mapping service.
    Jun  5 13:39:57 fog rpc.mountd[22526]: Version 1.3.0 starting
    Jun  5 13:39:57 fog systemd: Started NFS Mount Daemon.
    Jun  5 13:39:57 fog systemd: Starting NFS server and services...
    Jun  5 13:39:57 fog kernel: NFSD: starting 90-second grace period (net ffffffff81aa0e80)
    Jun  5 13:39:57 fog systemd: Started NFS server and services.
    Jun  5 13:39:57 fog systemd: Starting Notify NFS peers of a restart...
    Jun  5 13:39:57 fog sm-notify[22545]: Version 1.3.0 starting
    Jun  5 13:39:57 fog sm-notify[22545]: Already notifying clients; Exiting!
    Jun  5 13:39:57 fog systemd: Started Notify NFS peers of a restart.
    Jun  5 13:42:50 fog systemd: rpcbind.service: main process exited, code=killed, status=6/ABRT
    Jun  5 13:42:50 fog systemd: Unit rpcbind.service entered failed state.
    Jun  5 13:42:50 fog systemd: rpcbind.service failed.
    Jun  5 13:43:29 fog xinetd[19177]: START: tftp pid=23930 from=172.20.33.226
    Jun  5 13:43:29 fog in.tftpd[23931]: tftp: client does not accept options
    Jun  5 13:43:29 fog in.tftpd[23932]: Client 172.20.33.226 finished undionly.kpxe
    Jun  5 13:43:36 fog in.tftpd[23935]: Client 172.20.33.226 finished default.ipxe
    Jun  5 13:43:46 fog systemd: Starting RPC bind service...
    Jun  5 13:43:46 fog systemd: Started RPC bind service.
    Jun  5 13:45:37 fog in.tftpd[24294]: tftp: client does not accept options
    Jun  5 13:45:37 fog in.tftpd[24295]: Client 172.20.36.5 finished undionly.kpxe
    Jun  5 13:45:44 fog in.tftpd[24315]: Client 172.20.37.161 finished default.ipxe
    Jun  5 13:57:50 fog systemd: rpcbind.service: main process exited, code=killed, status=6/ABRT
    Jun  5 13:57:50 fog systemd: Unit rpcbind.service entered failed state.
    Jun  5 13:57:50 fog systemd: rpcbind.service failed.
    Jun  5 13:58:08 fog systemd: Starting RPC bind service...
    Jun  5 13:58:08 fog systemd: Started RPC bind service.
    Jun  5 14:00:21 fog in.tftpd[28625]: tftp: client does not accept options
    Jun  5 14:00:21 fog in.tftpd[28626]: Client 172.20.33.226 finished undionly.kpxe
    Jun  5 14:00:28 fog in.tftpd[28671]: Client 172.20.33.226 finished default.ipxe
    Jun  5 14:01:01 fog systemd: Started Session 3 of user root.
    Jun  5 14:01:01 fog systemd: Starting Session 3 of user root.
    Jun  5 14:05:07 fog yum[29343]: Installed: perl-Encode-Detect-1.01-13.el7.x86_64
    Jun  5 14:05:07 fog yum[29343]: Installed: perl-IO-Tty-1.10-11.el7.x86_64
    Jun  5 14:05:28 fog kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
    Jun  5 14:12:50 fog systemd: rpcbind.service: main process exited, code=killed, status=6/ABRT
    Jun  5 14:12:50 fog systemd: Unit rpcbind.service entered failed state.
    Jun  5 14:12:50 fog systemd: rpcbind.service failed.
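    The ABRT lines in this log land exactly 15 minutes apart (13:42:50, 13:57:50, 14:12:50, and 14:27:50 in the systemctl status output earlier in the thread), even though rpcbind was restarted at 13:43:46 and 13:58:08 in between. A minimal Python sketch that pulls the crash timestamps out of a syslog or journal excerpt and computes the gaps. It assumes the "Mon DD HH:MM:SS" prefix used in /var/log/messages, which carries no year, so it is only meant for gaps within a single log window.

```python
import re
from datetime import datetime
from typing import List

# Matches lines such as:
# "Jun  5 13:42:50 fog systemd: rpcbind.service: main process exited, code=killed, status=6/ABRT"
CRASH_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r".*rpcbind\.service: main process exited, code=killed, status=6/ABRT"
)


def crash_times(log_text: str) -> List[datetime]:
    """Timestamps of rpcbind SIGABRT exits found in a syslog excerpt."""
    times = []
    for line in log_text.splitlines():
        m = CRASH_RE.match(line.strip())
        if m:
            # Collapse the double space in "Jun  5"; the year defaults to 1900,
            # which does not affect the gaps between entries.
            ts = " ".join(m.group("ts").split())
            times.append(datetime.strptime(ts, "%b %d %H:%M:%S"))
    return times


def gaps_seconds(times: List[datetime]) -> List[int]:
    """Seconds between consecutive crashes."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
```

    Run over the excerpt above, gaps_seconds(crash_times(...)) returns [900, 900].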
    


  • @george1421

    [root@fog ~]# cat /etc/exports
    /images *(ro,sync,no_wdelay,no_subtree_check,insecure_locks,no_root_squash,insecure,fsid=0)
    /images/dev *(rw,async,no_wdelay,no_subtree_check,no_root_squash,insecure,fsid=1)
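    When comparing /etc/exports between a working and a failing server, it helps to parse each entry into its path, client, and options. A minimal Python sketch that handles only the simple one-client-per-line form shown above (no quoting, line continuations, or multiple clients per path); check_fsids is a hypothetical helper that flags fsid values reused across exports. Note that fsid=0 marks /images as the NFSv4 root export (see exports(5)).

```python
from typing import Dict, List


def parse_exports(text: str) -> Dict[str, dict]:
    """Parse simple '/path client(opt1,opt2,key=value)' lines."""
    exports: Dict[str, dict] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        path, spec = line.split(None, 1)
        client, _, opts = spec.partition("(")
        options: Dict[str, object] = {}
        for opt in opts.rstrip(")").split(","):
            key, sep, value = opt.partition("=")
            # Flags such as 'rw' become True; 'fsid=0' keeps its string value.
            options[key] = value if sep else True
        exports[path] = {"client": client, "options": options}
    return exports


def check_fsids(exports: Dict[str, dict]) -> List[str]:
    """Paths whose fsid was already used by an earlier export."""
    seen, dupes = set(), []
    for path, entry in exports.items():
        fsid = entry["options"].get("fsid")
        if fsid is None:
            continue
        if fsid in seen:
            dupes.append(path)
        seen.add(fsid)
    return dupes
```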
    

  • Moderator

    @george1421 Is there anything in /var/log/messages that might indicate why rpcbind is failing? This is really strange.


  • Moderator

    @neodawg OK, that is exactly the same process I followed last week.

    But the NFS service is dying…

    What does your /etc/exports file look like?



  • I should mention that I also have several other FOG servers running CentOS 7 with FOG versions from 1.3.x to 1.4.1, and they are working fine, but I have not run yum update -y on them recently.



  • Yep, I did the same thing, except I ran yum update -y on the server.

    The server I am working on is running on VMware.

    The firewall and SELinux are both stopped and disabled, and I rebooted.


  • Moderator

    Let's focus on CentOS 7.

    I just set up a CentOS 7 system last Friday, starting from the DVD, and it worked just fine.

    Have you applied the latest CentOS 7 updates?

    Have you set SELinux to permissive and disabled the firewalld service?
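    To answer the SELinux question from the config file rather than from memory, a minimal Python sketch that reads the SELINUX= setting from /etc/selinux/config. That file controls the mode applied at boot; the running mode is what getenforce reports, and the two can differ until the next reboot, since setenforce only changes the running mode. selinux_boot_mode is a hypothetical helper name.

```python
from typing import Optional

VALID_MODES = {"enforcing", "permissive", "disabled"}


def selinux_boot_mode(config_text: str) -> Optional[str]:
    """Return the SELINUX= value from /etc/selinux/config text, or None if absent or invalid."""
    mode = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Exact key match, so SELINUXTYPE=targeted is ignored.
        if key.strip() == "SELINUX":
            value = value.strip().strip('"').lower()
            # Later assignments win, as when the file is sourced by a shell.
            mode = value if value in VALID_MODES else None
    return mode


def read_boot_mode(path: str = "/etc/selinux/config") -> Optional[str]:
    with open(path) as f:
        return selinux_boot_mode(f.read())
```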

