NFS problems after upgrade to trunk

John Sartoris

As posted earlier, config matches on both servers.

/etc/exports

/images *(ro,sync,no_wdelay,no_subtree_check,insecure_locks,no_root_squash,insecure,fsid=0)
/images/dev *(rw,async,no_wdelay,no_subtree_check,no_root_squash,insecure,fsid=1)

nfs-kernel-server 1.2.5-3ubuntu3.2 does appear to be NFSv4; however, this is the version running on both servers. It may have been original on the working storage node, while the master may have been upgraded from 3 to 4.

http://packages.ubuntu.com/precise-updates/nfs-kernel-server

Also, NFS does appear to be reaching the server from the client.

# netstat -t -u -c | grep 10.2.ccc.bbb
tcp        0      0 fog-01:ssh  10.2.ccc.bbb:41650       ESTABLISHED
tcp        0      0 fog-01:nfs  10.2.ccc.bbb:692         ESTABLISHED

george1421

Thank you for posting the export file. Understand that from my perspective this is a system that was set up by hand (no offense intended), so we must look in every corner.

OK, so this is an NFSv4 setup. I know the access controls are stricter for NFSv4 than for v3. Let me do a little google-fu and see what I can find.

[edit] Just for clarity, I’m from the RHEL world, so Ubuntu is just different enough to be maddening at times. There appear to be two files that should be compared between the working server and the master FOG server: /etc/default/nfs-common and /etc/default/nfs-kernel-server. These hold the settings for the NFS server. Also ensure that the rpc.idmapd process is running. [/edit]
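
A rough sketch of that comparison, assuming each server's copy of the two files has first been pulled into its own local directory (for example with scp; the directory names below are hypothetical). It ignores comments and blank lines so only the effective settings are compared:

```shell
# Print only the lines of a file that actually set something
# (comments and blank lines are dropped).
effective_settings() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" || true
}

# $1 and $2 are directories holding each server's copy of
# /etc/default/nfs-common and /etc/default/nfs-kernel-server.
compare_nfs_defaults() {
    rc=0
    for f in nfs-common nfs-kernel-server; do
        a=$(effective_settings "$1/$f")
        b=$(effective_settings "$2/$f")
        if [ "$a" != "$b" ]; then
            echo "DIFFERENT: $f"
            rc=1
        else
            echo "same: $f"
        fi
    done
    return $rc
}

# Example (hypothetical directory names):
#   compare_nfs_defaults ./storage-node ./master
```

For any file reported as DIFFERENT, a plain diff of the two copies shows the exact lines. Whether rpc.idmapd is running can be checked on each server with something like "pgrep rpc.idmapd".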

John Sartoris

@george1421

I’ve tried to disable NFSv4 as per http://andy.delcambre.com/2007/06/25/disabling-nfsv4-on-ubuntu.html and the comments in “/etc/default/nfs-kernel-server” however the problem still exists.

@Wayne-Workman

Just wanted to say I really appreciate all the help you have both been.

I’m out for the day. I’ll pick this up again in the morning.

Wayne Workman

@John-Sartoris I exhausted what I know about it.

If I were in your shoes, I’d rebuild the server. This thread has been open for 8+ hours when a server rebuild would fix it in under 4. It’s just my opinion, but again if it were me, I’d cut my losses and rebuild.

george1421

While I’m almost where Wayne is, I think you should compare these files /etc/default/nfs-common and /etc/default/nfs-kernel-server between your working storage node and your troublesome master server. Since you think the storage node was already running nfsv4 someone may have already adjusted this server properly. You are so close now I’d hate for you to give up.

Right now you have the luxury of having one system that does work and one that doesn’t. You just need to find the differences in the setup.

John Sartoris

@george1421 @Wayne-Workman

I understand what you are telling me about rebuilding. Is there any reason I should switch from Ubuntu to CentOS like you suggest? Ever since I switched from Gentoo, probably 10 years ago, I’ve been using Ubuntu. It’s what I know. I could probably pick up the intricacies of CentOS vs Ubuntu easily enough, but if there is no reason, then why?

As for my current state, the storage node stopped working, and I haven’t been able to get it back. I’ve upgraded both to Ubuntu 14.04, and after a GRUB-related issue on the master node, both are back to the same state for NFS. I have, however, verified that I have NFSv4 disabled and am working only in v3. I found the log file showing connections to rpc.mountd.

/var/log/syslog shows a successful connection being made.

Feb  9 13:30:42 lk-fog-01 rpc.mountd[1360]: authenticated mount request from 10.2.ccc.bbb:911 for /images (/images)

It reports just the same as the storage node, which completes and can browse without issue.

Feb  9 13:35:27 lk-fog-01 rpc.mountd[1360]: authenticated unmount request from 10.1.yyy.xxx:895 for /images (/images)

george1421

@John-Sartoris There is nothing wrong with Ubuntu. Working with it is sometimes harder and sometimes easier than RHEL. It’s all perspective. I would not change unless you have to.

Have you confirmed that the rpc.idmapd process is running on your FOG master server?

Thinking a bit more, root is not able to nfs mount between the servers. I would assume this is the same if you tried to nfs mount from the FOG master server to the images folder on the storage node (i.e. just take the FOS client out of the picture).
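
One way to run that server-to-server test is to force NFSv3 over TCP explicitly, so the result does not depend on what the client negotiates. A minimal sketch, assuming a hypothetical hostname (storage-node) and mount point (/mnt/nfstest), neither of which comes from this thread. The helper only prints the command so it can be reviewed before being run as root:

```shell
# Print (do not run) a mount command that forces NFSv3 over TCP.
# The host, export and mount point are supplied by the caller;
# the names used in the example below are placeholders.
nfs3_mount_cmd() {
    host=$1
    export_path=$2
    mount_point=$3
    echo "mount -t nfs -o vers=3,proto=tcp ${host}:${export_path} ${mount_point}"
}

# Example (hypothetical names):
#   nfs3_mount_cmd storage-node /images /mnt/nfstest
# then run the printed command as root after creating the mount point.
```

Swapping proto=tcp for proto=udp in the printed command would show whether only one of the two transports is being served.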

John Sartoris

@george1421 @Wayne-Workman

Well, it’s working. I’m not entirely sure what happened, but I can tell you what I noticed that made me try the full deploy again was the “fog.mount” command in debug mode. I was looking at all the commands available looking for ideas to try and I saw that. I tried it and it completed back to a prompt, then “mount” listed /images as connected. I still was unable to mount manually but…

One thing I did do was reset the fog user password on the master server. I tried to download a kernel (I’m not even using it) and received a complaint that the ftp password was wrong and that it should be a long crypted string. Then, when trying to create a deploy task, the ftp password was again wrong, and it should be the normal short password I started with. I haven’t tried to download another kernel to test for the error.

Edit - Correction, the master node is working, but the storage node still doesn’t mount…

From debug mode mount is returning “operation not supported” to storage node:/Images

Wayne Workman

@John-Sartoris said:

Edit - Correction, the master node is working, but the storage node still doesn’t mount…

Am I mistaken, or was the problem exactly the opposite before?

george1421

I did have a problem a while back when I upgraded from 1.2.0 to one of the trunk releases: I had an error, so the installer aborted. I fixed the issue and reran the installer, and this is how I got the fog user (in the OS) out of sync with the database. On the second run, the installer didn’t touch the password saved in the database, but it did update the fog (Linux) user password to something new.

I ended up looking in the database at the storage node to get the fog password, then set the Linux fog user to the same password. That got things back in sync for me. But I first discovered that when I uploaded an image, the ftp process moves it from /images/dev to the right location. I saw ftp login errors that told me where the issue was in my case.

For your fog storage node, you might want to resync what the database says the password should be vs what the OS thinks the fog password should be.

Just saying, this random “it’s working / no, it’s not” is very difficult to track down. You are never sure what actually fixed the problem.

John Sartoris

@Wayne-Workman said:

@John-Sartoris said:

Edit - Correction, the master node is working, but the storage node still doesn’t mount…

Am I mistaken, or was the problem exactly the opposite before?

You are half right :). The nodes have swapped, but before it was a “connection refused”; now it’s “operation not supported”.

John Sartoris

@george1421 @Wayne-Workman

Both nodes are back online now. I had a leftover “-T” in /etc/default/nfs-kernel-server on the storage node from my attempts to disable NFSv4. What I was reading said NFSv3 doesn’t use TCP and is UDP only. Well, however FOG connects, it needs TCP with v3.
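
For anyone hitting the same thing, a small sketch of how such a leftover could be spotted. In rpc.nfsd, -T (long form --no-tcp) stops the server from accepting TCP connections. The function name here is made up for illustration, and it only looks at uncommented lines of the file it is given:

```shell
# Succeeds (exit 0) if an uncommented line of the given file
# passes -T or --no-tcp, the rpc.nfsd options that disable TCP.
tcp_disabled_in() {
    grep -v '^[[:space:]]*#' "$1" |
        grep -Eq -e '(^|[^[:alnum:]-])(-T|--no-tcp)([^[:alnum:]-]|$)'
}

# Example:
#   if tcp_disabled_in /etc/default/nfs-kernel-server; then
#       echo "TCP is disabled for nfsd; remove -T / --no-tcp and restart NFS"
#   fi
```

After restarting the NFS server, "rpcinfo -p" on that machine should list the nfs service with tcp in the protocol column if TCP is being served again.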

Here is where I ended up. I don’t know if RPCNFSDCOUNT needs to be increased from the default 8 to 48, but I read that this increases the number of threads for nfs workers.

/etc/default/nfs-kernel-server

# Number of servers to start up
# To disable nfsv4 on the server, specify '--no-nfs-version 4' here
RPCNFSDCOUNT='48 --no-nfs-version 4'

# Runtime priority of server (see nice(1))
RPCNFSDPRIORITY=0

# Options for rpc.mountd.
# If you have a port-based firewall, you might want to set up
# a fixed port here using the --port option. For more information,
# see rpc.mountd(8) or http://wiki.debian.org/SecuringNFS
# To disable NFSv4 on the server, specify '--no-nfs-version 4' here
RPCMOUNTDOPTS='--manage-gids --no-nfs-version 4'

# Do you want to start the svcgssd daemon? It is only required for Kerberos
# exports. Valid alternatives are "yes" and "no"; the default is "no".
NEED_SVCGSSD="no"

# Options for rpc.svcgssd.
RPCSVCGSSDOPTS=""

# Options for rpc.nfsd.
RPCNFSDOPTS=""

/etc/idmapd.conf

[General]

Verbosity = 0
Pipefs-Directory = /run/rpc_pipefs
# set your own domain here, if id differs from FQDN minus hostname
 Domain = localdomain

[Mapping]

Nobody-User = nobody
Nobody-Group = nogroup

[Translation]
Method = nsswitch

/etc/default/nfs-common

# If you do not set values for the NEED_ options, they will be attempted
# autodetected; this should be sufficient for most people. Valid alternatives
# for the NEED_ options are "yes" and "no".

# Do you want to start the statd daemon? It is not needed for NFSv4.
NEED_STATD=

# Options for rpc.statd.
#   Should rpc.statd listen on a specific port? This is especially useful
#   when you have a port-based firewall. To use a fixed port, set this
#   this variable to a statd argument like: "--port 4000 --outgoing-port 4001".
#   For more information, see rpc.statd(8) or http://wiki.debian.org/SecuringNFS
STATDOPTS=

# Do you want to start the gssd daemon? It is required for Kerberos mounts.
NEED_GSSD=
