Fog/Linux can’t image, nfs and rpcbind issues
-
No, I don’t see much in there that is any help to me. I have tried Googling most of these errors:
Jun 5 13:39:55 fog systemd: Stopped NFS server and services.
Jun 5 13:39:55 fog systemd: Stopping NFS Mount Daemon...
Jun 5 13:39:55 fog systemd: Stopping NFSv4 ID-name mapping service...
Jun 5 13:39:55 fog systemd: Stopped NFSv4 ID-name mapping service.
Jun 5 13:39:55 fog rpc.mountd[16751]: Caught signal 15, un-registering and exiting.
Jun 5 13:39:55 fog systemd: Stopped NFS Mount Daemon.
Jun 5 13:39:57 fog systemd: Starting Preprocess NFS configuration...
Jun 5 13:39:57 fog systemd: Started Preprocess NFS configuration.
Jun 5 13:39:57 fog systemd: Starting NFSv4 ID-name mapping service...
Jun 5 13:39:57 fog systemd: Starting NFS Mount Daemon...
Jun 5 13:39:57 fog systemd: Started NFSv4 ID-name mapping service.
Jun 5 13:39:57 fog rpc.mountd[22526]: Version 1.3.0 starting
Jun 5 13:39:57 fog systemd: Started NFS Mount Daemon.
Jun 5 13:39:57 fog systemd: Starting NFS server and services...
Jun 5 13:39:57 fog kernel: NFSD: starting 90-second grace period (net ffffffff81aa0e80)
Jun 5 13:39:57 fog systemd: Started NFS server and services.
Jun 5 13:39:57 fog systemd: Starting Notify NFS peers of a restart...
Jun 5 13:39:57 fog sm-notify[22545]: Version 1.3.0 starting
Jun 5 13:39:57 fog sm-notify[22545]: Already notifying clients; Exiting!
Jun 5 13:39:57 fog systemd: Started Notify NFS peers of a restart.
Jun 5 13:42:50 fog systemd: rpcbind.service: main process exited, code=killed, status=6/ABRT
Jun 5 13:42:50 fog systemd: Unit rpcbind.service entered failed state.
Jun 5 13:42:50 fog systemd: rpcbind.service failed.
Jun 5 13:43:29 fog xinetd[19177]: START: tftp pid=23930 from=172.20.33.226
Jun 5 13:43:29 fog in.tftpd[23931]: tftp: client does not accept options
Jun 5 13:43:29 fog in.tftpd[23932]: Client 172.20.33.226 finished undionly.kpxe
Jun 5 13:43:36 fog in.tftpd[23935]: Client 172.20.33.226 finished default.ipxe
Jun 5 13:43:46 fog systemd: Starting RPC bind service...
Jun 5 13:43:46 fog systemd: Started RPC bind service.
Jun 5 13:45:37 fog in.tftpd[24294]: tftp: client does not accept options
Jun 5 13:45:37 fog in.tftpd[24295]: Client 172.20.36.5 finished undionly.kpxe
Jun 5 13:45:44 fog in.tftpd[24315]: Client 172.20.37.161 finished default.ipxe
Jun 5 13:57:50 fog systemd: rpcbind.service: main process exited, code=killed, status=6/ABRT
Jun 5 13:57:50 fog systemd: Unit rpcbind.service entered failed state.
Jun 5 13:57:50 fog systemd: rpcbind.service failed.
Jun 5 13:58:08 fog systemd: Starting RPC bind service...
Jun 5 13:58:08 fog systemd: Started RPC bind service.
Jun 5 14:00:21 fog in.tftpd[28625]: tftp: client does not accept options
Jun 5 14:00:21 fog in.tftpd[28626]: Client 172.20.33.226 finished undionly.kpxe
Jun 5 14:00:28 fog in.tftpd[28671]: Client 172.20.33.226 finished default.ipxe
Jun 5 14:01:01 fog systemd: Started Session 3 of user root.
Jun 5 14:01:01 fog systemd: Starting Session 3 of user root.
Jun 5 14:05:07 fog yum[29343]: Installed: perl-Encode-Detect-1.01-13.el7.x86_64
Jun 5 14:05:07 fog yum[29343]: Installed: perl-IO-Tty-1.10-11.el7.x86_64
Jun 5 14:05:28 fog kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Jun 5 14:12:50 fog systemd: rpcbind.service: main process exited, code=killed, status=6/ABRT
Jun 5 14:12:50 fog systemd: Unit rpcbind.service entered failed state.
Jun 5 14:12:50 fog systemd: rpcbind.service failed.
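For anyone who wants to pull just the rpcbind aborts out of a log like this one, a small filter along these lines might help. It is only a sketch: the function names are made up for illustration, and the default path /var/log/messages is an assumption about where the syslog output lands on the box.

```shell
#!/bin/sh
# Print every rpcbind abort line found in a syslog-style file.
# The default path is an assumption; pass another file as $1 if needed.
rpcbind_aborts() {
    grep -F 'rpcbind.service: main process exited' "${1:-/var/log/messages}"
}

# Count the aborts instead of listing them.
rpcbind_abort_count() {
    rpcbind_aborts "$@" | wc -l | tr -d ' '
}

# Example (not executed here):
#   rpcbind_aborts /var/log/messages
```

Comparing the timestamps it prints with the tftp lines from clients may show whether each abort follows an imaging attempt.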
-
Sorry, it’s not the NFS process that dies, it’s the rpcbind process. NFS stays active until I restart it, and that restart will fail if I don’t restart rpcbind first.
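The restart order described here (rpcbind first, then NFS) could be sketched roughly like this. The unit names are assumptions: rpcbind.service is the name in the log, and nfs-server.service is the usual CentOS 7 unit for "NFS server and services"; a CentOS 6 box would use its init scripts instead. The SYSTEMCTL override is a hypothetical knob added only so the sequence can be dry-run.

```shell
#!/bin/sh
# Restart rpcbind first, then NFS. Stop early if rpcbind does not restart,
# since the NFS restart would fail without it.
# SYSTEMCTL is a hypothetical override (e.g. SYSTEMCTL=echo for a dry run).
restart_rpcbind_then_nfs() {
    ctl="${SYSTEMCTL:-systemctl}"
    $ctl restart rpcbind.service || return 1
    $ctl restart nfs-server.service
}

# Example (not executed here; needs root on the FOG server):
#   restart_rpcbind_then_nfs
```

This only works around the crash until rpcbind dies again; it does not fix it.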
-
So for fun or insanity, I downgraded rpcbind.x86_64 0:0.2.0-13.el6_9 to rpcbind.x86_64 0:0.2.0-13.el6 on the CentOS 6 box I am local to, and I think that may have fixed the issue. The image is currently uploading in debug mode; I’m going to try pushing it out as soon as it’s done.
The other thing I haven’t ruled out is having open-vm-tools installed. I don’t know why this would break things, but I read something somewhere on some forum about it.
I did yum downgrade rpcbind on the CentOS 7 box and the services are still running, but the local tech hasn’t tried imaging again yet.
Resolving Dependencies
--> Running transaction check
---> Package rpcbind.x86_64 0:0.2.0-38.el7 will be a downgrade
---> Package rpcbind.x86_64 0:0.2.0-38.el7_3 will be erased
--> Finished Dependency Resolution
EDIT: It appears this process fixed the CentOS 6 box. It was able to deploy the same image immediately after capturing it.
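A quick way to check whether a server is on one of the two rpcbind builds I downgraded away from might look like this. It is only a sketch: the function name is made up, and the version list is just the two builds from my boxes (0.2.0-13.el6_9 and 0.2.0-38.el7_3), not a confirmed list of bad releases.

```shell
#!/bin/sh
# Return 0 if the package string names one of the two rpcbind builds
# that were downgraded in this thread, 1 otherwise.
is_suspect_rpcbind() {
    case "$1" in
        *0.2.0-13.el6_9*|*0.2.0-38.el7_3*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here; rpm -q prints name-version-release.arch):
#   is_suspect_rpcbind "$(rpm -q rpcbind)" && echo "consider: yum downgrade rpcbind"
```

Keep in mind that a later yum upgrade would presumably pull the newer build back in, so something like yum upgrade --exclude=rpcbind may be needed until a fixed package ships.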
-
@neodawg Sorry I was working on another issue. That is very strange.
If I get a chance tonight I’ll spin up a new FOG server with a full upgrade. It’s possible something was pushed out over the weekend that is causing nfs to fail.
-
@neodawg Do you have ipv6 disabled on this box?
ref: http://www.linuxquestions.org/questions/linux-software-2/cenos-7-3-rpcbind-4175596401/
-
I can’t duplicate your issue. Understand I’m not saying that you don’t have an issue. I just can’t duplicate it.
I set up a new 1.4.2 fog server on a fresh install of CentOS 7. NFS started as it should.
Details of the build
VM build on ESXi 6.5
CentOS 7 x64 v1611
I installed CentOS minimal
yum upgrade -y
set selinux permissive
systemctl disable firewalld
reboot
git clone https://github.com/FOGProject/fogproject.git /opt/fogproject
cd /opt/fogproject/bin
./installfog.sh
(install completed without issue)
Installed version of rpcbind: rpcbind-0.2.0-38.el7_3.x86_64
Installed version of open-vm-tools: open-vm-tools-10.0.5-4.el7_3.x86_64
Even after a reboot, the nfs and rpcbind services are still happy.
-
This sounds similar… https://bugzilla.redhat.com/show_bug.cgi?id=1457963
@george1421 Maybe you need to have a client mount the share to run into the same issue?!?
-
@Sebastian-Roth and @george1421
Yes, the service will start and run, but shortly after a client tries to connect the service will die.
The link you listed sounds exactly like what is happening; however, I didn’t test different versions of the NFS protocol, i.e. v3 vs. v4.
On one of the fog servers the tech was able to image one computer successfully, and then it failed on the next computer because the rpcbind service had died.
-
I heard back from the local tech with the CentOS 7 box, and downgrading rpcbind fixed the issue on that server as well. I guess we will have to wait until Red Hat/CentOS fixes the rpcbind package.
-
@george1421 I just found some more reports, e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1448124
Seems like they are working on it. Let’s hope they are able to fix this fairly soon! See here: https://bugzilla.redhat.com/show_bug.cgi?id=1457172
-
@Sebastian-Roth Great find!! I still have that test system set up. I’ll update dhcp and attempt to image a VM this morning. There was no time earlier this week to confirm that nfs fails once a client tries to image. I will have a few minutes today to test. Just thinking, if it’s a memory leak the issue may not show up right away. I guess we will find out.
-