NFS problems after upgrade to trunk
-
@george1421 said:
I didn’t read all of the posts here, but could you do a
showmount -e 127.0.0.1
This will show us what you have NFS shared on your FOG server.
That shows the exports are proper, and they do actually work from the storage node cross-site, but a client on the local site can’t connect.
-
@Wayne-Workman said:
@John-Sartoris refused is different than denied… Have you checked for IP conflicts?
I haven’t specifically checked, but this server is configured the same way as the rest in our environment: DHCP with a reservation. This is the only server that should get this address, and access ports are not configured in this VLAN. I have not had any issues connecting to the server except via NFS. Ping and SSH work from the debug client.
-
@John-Sartoris I’m thinking it is probably something network related.
-
I understand and agree with the assessment, but nothing has changed on the LAN in weeks other than the FOG server updates. Firewalls are disabled, with the traffic allowed through for good measure. I even just tried adding iptables rules without success.
-
@John-Sartoris can you quickly throw together a CentOS 7 VM and install fog trunk to test?
-
@Wayne-Workman I’m in a meeting right now (yeah, it’s a bit boring) so I can’t test. But from a debug (boot) session, is the showmount command installed in FOS? It would be interesting to know from the client perspective if the FOG server is showing its mount information.
[edit] The other thing would be to try a manual NFS mount from the FOS client to the FOG server. If it maps, then there is a parameter set up incorrectly (somewhere) in the FOG GUI. The mount command would be something like
mount <fog_server_ip>:/images /img
(or whatever the local directory is called on the FOG client) [/edit]
-
I’m trying to manually mount from the FOG Client now, and I’m receiving a connection refused.
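For reference, the attempt from the debug prompt looks roughly like this (the real server IP is redacted here):
mkdir -p /img
mount <fog_server_ip>:/images /img
The mount command is what comes back with the connection refused.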
@Wayne-Workman
Configuring another VM would be possible, but it’s quite a heavy bit of work for what sure seems to be a firewall config or NFS ACL issue that happened during the OS or FOG upgrade.
-
@John-Sartoris said:
Configuring another VM would be possible, but it’s quite a heavy bit of work for what sure seems to be a firewall config or NFS ACL issue that happened during the OS or FOG upgrade.
That’s just the thing, though: you’ve disabled UFW, and NFS has no protections. Ubuntu does not come pre-loaded with Security-Enhanced Linux, either, like other distributions do: https://wiki.ubuntu.com/SELinux
-
Do NFS connections get logged on the server? I’ve tried several suggestions I’ve found, but I don’t see any messages for successful or failed connections. I’m sure I’ve seen connection attempts in a log before when testing VMware ESXi at home; I just can’t recall where.
Looks like the export option “no_acl” deals with file system ACLs, not anything network related. And the export is to *, so in theory anyone should be able to connect.
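Is it something along these lines I should be checking? (guessing here)
grep rpc.mountd /var/log/syslog        # mountd is supposed to log mount/unmount requests to syslog
rpcdebug -m nfsd -s all                # verbose kernel NFS server debugging; rpcdebug -m nfsd -c all turns it back off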
-
Just out of curiosity, do the /etc/exports files match between the master fog server and the storage node? Can you post the export files here?
Other random thoughts:
Are you running NFSv4 on either of your servers? NFSv4 has additional security requirements.
Does the fog user have the same user and group IDs on all FOG servers? (this may not be mandatory)
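For that last one, a quick check on each server would be:
id fog        # compare the uid and gid between the master and the storage node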
-
As posted earlier, config matches on both servers.
/etc/exports
/images *(ro,sync,no_wdelay,no_subtree_check,insecure_locks,no_root_squash,insecure,fsid=0)
/images/dev *(rw,async,no_wdelay,no_subtree_check,no_root_squash,insecure,fsid=1)
nfs-kernel-server 1.2.5-3ubuntu3.2 does appear to support NFSv4; however, this is the version running on both servers. It may have been the original version on the working storage node, while the master may have been upgraded from v3 to v4.
http://packages.ubuntu.com/precise-updates/nfs-kernel-server
Also, NFS does appear to be reaching the server from the client.
# netstat -t -u -c | grep 10.2.ccc.bbb
tcp        0      0 fog-01:ssh   10.2.ccc.bbb:41650   ESTABLISHED
tcp        0      0 fog-01:nfs   10.2.ccc.bbb:692     ESTABLISHED
-
Thank you for posting the export file. Understand that from my perspective this is a system that was set up by hand (no offense intended), so we must look in every corner.
OK, so this is an NFSv4 setup. I know the access controls are greater for NFSv4 than for v3. Let me do a little google-fu and see what I can find.
[edit] Just for clarity, I’m from the RHEL world, so Ubuntu is just enough different to be maddening at times. There appear to be two files that should be compared between the working server and the master fog server: /etc/default/nfs-common and /etc/default/nfs-kernel-server; these hold the settings for the NFS server. Also ensure that the rpc.idmapd process is running. [/edit]
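Something like this would show the differences, assuming you can ssh from the master to the storage node (the hostname is a placeholder):
diff /etc/default/nfs-common <(ssh <storage_node> cat /etc/default/nfs-common)
diff /etc/default/nfs-kernel-server <(ssh <storage_node> cat /etc/default/nfs-kernel-server)
pgrep -l rpc.idmapd        # should print a line if idmapd is running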
-
I’ve tried to disable NFSv4 as per http://andy.delcambre.com/2007/06/25/disabling-nfsv4-on-ubuntu.html and the comments in “/etc/default/nfs-kernel-server”, however the problem still exists.
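Roughly what I changed, going from memory of that article and the comments in the file, so treat the exact variable values as my reading rather than gospel. In /etc/default/nfs-kernel-server:
RPCNFSDCOUNT="8 --no-nfs-version 4"
RPCMOUNTDOPTS="--manage-gids --no-nfs-version 4"
then restarted with: service nfs-kernel-server restart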
Just wanted to say I really appreciate all the help you have both been.
I’m out for the day. I’ll pick this up again in the morning.
-
@John-Sartoris I exhausted what I know about it.
If I were in your shoes, I’d rebuild the server. This thread has been open for 8+ hours when a server rebuild would fix it in under 4. It’s just my opinion, but again if it were me, I’d cut my losses and rebuild.
-
While I’m almost where Wayne is, I think you should compare these files, /etc/default/nfs-common and /etc/default/nfs-kernel-server, between your working storage node and your troublesome master server. Since you think the storage node was already running NFSv4, someone may have already adjusted that server properly. You are so close now, I’d hate for you to give up.
Right now you have the luxury of having one system that does work and one that doesn’t. You just need to find the differences in the setup.
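A couple more quick comparisons that might surface the difference (run on both boxes and compare the output):
rpcinfo -p localhost        # which RPC services and NFS versions are registered
exportfs -v                 # the live export list with the options actually in effect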
-
I understand what you are telling me about rebuilding. Is there any reason I should switch from Ubuntu to CentOS like you suggest? Ever since I switched from Gentoo, probably 10 years ago, I’ve been using Ubuntu. It’s what I know; I could probably pick up the intricacies of CentOS vs Ubuntu easily enough, but if there is no reason, then why?
As for my current state, the storage node stopped working, and I haven’t been able to get it back. I’ve upgraded both to Ubuntu 14.04, and after a GRUB-related issue on the master node, both are back to the same state for NFS. I have, however, verified that I have NFSv4 disabled and am working only in v3. I found the log file showing connections to rpc.mountd.
/var/log/syslog shows a successful connection being made.
Feb 9 13:30:42 lk-fog-01 rpc.mountd[1360]: authenticated mount request from 10.2.ccc.bbb:911 for /images (/images)
It reports just the same as the storage node, which completes its mount and can browse without issue.
Feb 9 13:35:27 lk-fog-01 rpc.mountd[1360]: authenticated unmount request from 10.1.yyy.xxx:895 for /images (/images)
-
@John-Sartoris There is nothing wrong with Ubuntu. Working with it is sometimes harder and sometimes easier than RHEL; it’s all perspective. I would not change unless you have no choice.
Have you confirmed that the rpc.idmapd process is running on your FOG master server?
Thinking a bit more: is root able to NFS mount between the servers? (I would assume the result is the same if you tried to NFS mount the images folder on the storage node from the FOG master server, i.e. take the FOS client out of the picture.)
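Something like this from the master, as root (the IP and mount point are placeholders):
mkdir -p /mnt/test
mount -t nfs -o ro <storage_node_ip>:/images /mnt/test
ls /mnt/test && umount /mnt/test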
-
Well, it’s working. I’m not entirely sure what happened, but I can tell you that what made me try the full deploy again was the “fog.mount” command in debug mode. I was looking at all the commands available for ideas to try and saw it. I ran it and it completed back to a prompt; then “mount” listed /images as connected. I still was unable to mount manually, but…
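For clarity, the sequence at the debug prompt was just:
fog.mount        # completed and dropped back to the prompt
mount            # /images now listed as mounted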
One thing I did do was reset the fog user password on the master server. I tried to download a kernel (I’m not even using it) and received a complaint that the FTP password was wrong and that it should be a long encrypted string. Then, when trying to create a deploy task, the FTP password was again wrong, and it should be the normal short password I started with. I haven’t tried to download another kernel to test for the error.
Edit - Correction, the master node is working, but the storage node still doesn’t mount…
From debug mode, mount is returning “operation not supported” for storage node:/Images
-
@John-Sartoris said:
Edit - Correction, the master node is working, but the storage node still doesn’t mount…
Am I mistaken, or was the problem exactly the opposite before?
-
I did have a problem a while back where, when I upgraded from 1.2.0 to one of the trunk releases, I hit an error and the installer aborted. I fixed the issue and reran the installer; this is how I got the fog user (in the OS) out of sync with the database. On the second run, the installer didn’t touch the password saved in the database, but it did update the fog (linux) user password to something new.
I ended up looking in the database at the storage node entry to get the fog password, then set the linux fog user to the same password. That got things back in sync for me. I first discovered the issue when I uploaded an image: the FTP process moves it from /images/dev to the right location, and I saw FTP login errors that told me where the problem was in my case.
For your fog storage node, you might want to resync what the database says the password should be vs what the OS thinks the fog password should be.
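If it helps, the check I did looked something like this; the database, table, and column layout are from memory, so treat the names as assumptions and check your own schema first:
mysql -u root -p fog -e 'SELECT * FROM nfsGroupMembers\G'        # what the FOG database thinks the node’s user/password are
passwd fog                                                       # then set the OS-level fog user to match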
Just saying, this random “it’s working / no it’s not” behavior is very difficult to track down. You are never sure what actually fixed the problem.