[SOLVED] NFS problems after upgrade to trunk

  • Friday I spent some time upgrading our old but well-working fog server, prepping for our Windows 10 rollout. I upgraded Ubuntu from 10.04 to 12.04, recovered from the Apache 2.2 to 2.4 upgrade, installed the new FOG package, and finally got “everything” online, only to log in and find another update had been released. I know it’s trunk, so that’s going to happen. That’s what I want. I installed 6237 and started testing.

    Problem is I can’t deploy an image right now. I’m getting the error “Could not mount images folder (/bin/”. As far as I can see the NFS exports are still correct. I found something about creating .mntcheck files in /images and /images/dev, and I’ve done that.

    Please help, I’m not seeing anything in logs to point me in a direction.

  • @george1421 @Wayne-Workman

    Both nodes are back online now. I had a leftover “-T” in /etc/default/nfs-kernel-server on the storage node from my attempts to disable NFSv4. What I was reading said NFSv3 doesn’t use TCP and is UDP only. Well, however fog connects, it needs TCP with v3.

    Here is where I ended up. I don’t know if RPCNFSDCOUNT needs to be increased to 48 from the default of 8, but I read that this increases the number of NFS worker threads.


    # Number of servers to start up
    # To disable nfsv4 on the server, specify '--no-nfs-version 4' here
    RPCNFSDCOUNT='48 --no-nfs-version 4'
    # Runtime priority of server (see nice(1))
    # Options for rpc.mountd.
    # If you have a port-based firewall, you might want to set up
    # a fixed port here using the --port option. For more information,
    # see rpc.mountd(8) or
    # To disable NFSv4 on the server, specify '--no-nfs-version 4' here
    RPCMOUNTDOPTS='--manage-gids --no-nfs-version 4'
    # Do you want to start the svcgssd daemon? It is only required for Kerberos
    # exports. Valid alternatives are "yes" and "no"; the default is "no".
    # Options for rpc.svcgssd.
    # Options for rpc.nfsd.


    Verbosity = 0
    Pipefs-Directory = /run/rpc_pipefs
    # set your own domain here, if id differs from FQDN minus hostname
    Domain = localdomain
    Nobody-User = nobody
    Nobody-Group = nogroup
    Method = nsswitch


    # If you do not set values for the NEED_ options, they will be attempted
    # autodetected; this should be sufficient for most people. Valid alternatives
    # for the NEED_ options are "yes" and "no".
    # Do you want to start the statd daemon? It is not needed for NFSv4.
    # Options for rpc.statd.
    #   Should rpc.statd listen on a specific port? This is especially useful
    #   when you have a port-based firewall. To use a fixed port, set this
    #   this variable to a statd argument like: "--port 4000 --outgoing-port 4001".
    #   For more information, see rpc.statd(8) or
    # Do you want to start the gssd daemon? It is required for Kerberos mounts.
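
    To confirm what the server actually offers after these changes, `rpcinfo -p` lists the registered program/version/transport combinations. The sketch below parses sample output for the NFS program (the two lines are assumed output after disabling v4; run the real command on your server):

```shell
# Sample lines as `rpcinfo -p localhost` would print them for the NFS
# program (100003) after NFSv4 has been disabled -- assumed output,
# replace with the real command:  rpcinfo -p localhost
rpcinfo_sample='   100003  3   tcp   2049  nfs
   100003  3   udp   2049  nfs'

# Print one version/transport pair per line (e.g. "v3/tcp").
echo "$rpcinfo_sample" | awk '$1 == 100003 && $5 == "nfs" {print "v" $2 "/" $3}'
```

    If “v3/tcp” is missing from the real output, a leftover flag like the stray “-T” above is a likely culprit, since FOG’s v3 mounts need TCP.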

  • @Wayne-Workman said:

    @John-Sartoris said:

    Edit - Correction, the master node is working, but the storage node still doesn’t mount…

    Am I mistaken, or was the problem exactly the opposite before?

    You are half right :). The nodes have swapped, but before it was a “connection refused”; now it’s “operation not supported”.

  • Moderator

    I did have a problem a while back where, when I upgraded from 1.2.0 to one of the trunk releases, the installer hit an error and aborted. I fixed the issue and reran the installer; this is how I got the fog user (in the OS) out of sync with the database. On the second run, the installer didn’t touch the password saved in the database, but it did update the fog (Linux) user’s password to something new.

    I ended up looking in the database on the storage node to get the fog password, then set the Linux fog user to the same password. That got things back in sync for me. I first discovered it when I uploaded an image: the FTP process moves it from /images/dev to the right location, and I saw FTP login errors that told me where the issue was in my case.

    For your fog storage node, you might want to resync what the database says the password should be vs what the OS thinks the fog password should be.

    Just saying, this random “it’s working / no it’s not” behavior is very difficult to track down. You are never sure what actually fixed the problem.
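
    The resync described above can be sketched roughly as follows. The database table and column names in the comments are assumptions (verify them against your FOG schema), and the password value is a stand-in for whatever the query returns:

```shell
# Hypothetical password, standing in for the value read from the database,
# e.g. (table/column names are assumptions -- check your schema first):
#   mysql -N -B fog -e "SELECT ngmPassword FROM nfsGroupMembers;"
PW='examplePassword123'

# chpasswd reads "user:password" lines on stdin; build the line for the fog user.
line="fog:${PW}"
echo "$line"
# On the real server you would apply and verify it with:
#   echo "$line" | chpasswd
#   curl -s -u "fog:${PW}" ftp://localhost/ >/dev/null && echo "FTP login OK"
```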

  • @John-Sartoris said:

    Edit - Correction, the master node is working, but the storage node still doesn’t mount…

    Am I mistaken, or was the problem exactly the opposite before?

  • @george1421 @Wayne-Workman

    Well, it’s working. I’m not entirely sure what happened, but I can tell you that what made me try the full deploy again was the “fog.mount” command in debug mode. I was looking at all the available commands for ideas to try and saw it. I ran it and it completed back to a prompt, and then “mount” listed /images as connected. I still was unable to mount manually, but…

    One thing I did do was reset the fog user password on the master server. I tried to download a kernel (I’m not even using it) and received a complaint that the FTP password was wrong and should be a long encrypted string. Then, when trying to create a deploy task, the FTP password was again wrong and should be the normal short password I started with. I haven’t tried to download another kernel to test for the error.

    Edit - Correction, the master node is working, but the storage node still doesn’t mount…

    From debug mode, mount is returning “operation not supported” for storage node:/Images

  • Moderator

    @John-Sartoris There is nothing wrong with Ubuntu. Working with it is sometimes harder and sometimes easier than RHEL. It’s all perspective. I would not change unless you have to.

    Have you confirmed that the rpc.idmapd process is running on your FOG master server?

    Thinking a bit more, root is not able to NFS mount between the servers. I would assume this is the same if you tried to NFS mount from the fog master server to the images folder on the storage node (i.e. just take the FOS client out of the picture).

  • @george1421 @Wayne-Workman

    I understand what you are telling me about rebuilding. Any reason I should switch from Ubuntu to CentOS like you suggest? Ever since I switched from Gentoo, probably 10 years ago, I’ve been using Ubuntu. It’s what I know. I could probably pick up the intricacies of CentOS vs Ubuntu easily enough, but if there is no reason, then why?

    As for my current state, the storage node stopped working, and I haven’t been able to get it back. I’ve upgraded both to Ubuntu 14.04, and after a GRUB-related issue on the master node, both are back to the same state for NFS. I have, however, verified that NFSv4 is disabled and I am working only with v3. I found the log file showing connections to rpc.mountd.
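
    A quick way to double-check that v4 really is off is /proc/fs/nfsd/versions, where a leading minus marks a disabled version. The sketch below works on a sample value (assumed; substitute the real file’s contents on the server):

```shell
# Assumed contents of /proc/fs/nfsd/versions after disabling v4; on the
# server itself use:  versions=$(cat /proc/fs/nfsd/versions)
versions='+2 +3 -4'

# "-4" means NFSv4 is registered but disabled.
case "$versions" in
  *-4*) echo "NFSv4 disabled" ;;
  *)    echo "NFSv4 still enabled" ;;
esac
```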

    /var/log/syslog shows a successful connection being made.

    Feb  9 13:30:42 lk-fog-01 rpc.mountd[1360]: authenticated mount request from 10.2.ccc.bbb:911 for /images (/images)

    This reports just the same as on the storage node, which completes the mount and can browse without issue.

    Feb  9 13:35:27 lk-fog-01 rpc.mountd[1360]: authenticated unmount request from for /images (/images)
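
    When scanning syslog for these rpc.mountd lines, a small sed filter pulls out just the client and export, which makes comparing the two servers’ logs easier. The log line below is copied from the post above; the sed pattern is a sketch:

```shell
# The mountd syslog line quoted above.
log='Feb  9 13:30:42 lk-fog-01 rpc.mountd[1360]: authenticated mount request from 10.2.ccc.bbb:911 for /images (/images)'

# Extract "client=<addr> export=<path>" from an authenticated (un)mount line.
echo "$log" | sed -n 's/.*request from \([^ ]*\) for \([^ ]*\) .*/client=\1 export=\2/p'
```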
  • Moderator

    While I’m almost where Wayne is, I think you should compare the files /etc/default/nfs-common and /etc/default/nfs-kernel-server between your working storage node and your troublesome master server. Since you think the storage node was already running NFSv4, someone may have already adjusted that server properly. You are so close now I’d hate for you to give up.

    Right now you have the luxury of having one system that does work and one that doesn’t. You just need to find the differences in the setup.

  • @John-Sartoris I’ve exhausted what I know about it.

    If I were in your shoes, I’d rebuild the server. This thread has been open for 8+ hours, when a server rebuild would fix it in under 4. It’s just my opinion, but again, if it were me, I’d cut my losses and rebuild.

  • @george1421

    I’ve tried to disable NFSv4 as per the comments in “/etc/default/nfs-kernel-server”, however the problem still exists.


    Just wanted to say I really appreciate all the help you have both been.

    I’m out for the day. I’ll pick this up again in the morning.

  • Moderator

    Thank you for posting the export file. Understand, from my perspective this is a system that was set up by hand (no offense intended), so we must look in every corner.

    OK, so this is an NFSv4 setup. I know the access controls are stricter for NFSv4 than for v3. Let me do a little google-fu and see what I can find.

    [edit] Just for clarity, I’m from the RHEL world, so Ubuntu is just enough different to be maddening at times. There appear to be two files that should be compared between the working server and the master fog server: /etc/default/nfs-common and /etc/default/nfs-kernel-server; these hold the settings for the NFS server. Also ensure that the rpc.idmapd process is running. [/edit]

  • @george1421

    As posted earlier, config matches on both servers.


    /images *(ro,sync,no_wdelay,no_subtree_check,insecure_locks,no_root_squash,insecure,fsid=0)
    /images/dev *(rw,async,no_wdelay,no_subtree_check,no_root_squash,insecure,fsid=1)
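
    One detail worth flagging in these exports: fsid=0 is what marks the NFSv4 pseudo-root, so its presence suggests the export file was written with v4 in mind even if the client ends up negotiating v3. A small sketch for splitting the option list out, which makes diffing the two servers’ exports easier:

```shell
# The first export line quoted above.
line='/images *(ro,sync,no_wdelay,no_subtree_check,insecure_locks,no_root_squash,insecure,fsid=0)'

# Print one export option per line, for easy side-by-side comparison.
echo "$line" | sed 's/.*(\(.*\))/\1/' | tr ',' '\n'
```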

    nfs-kernel-server 1.2.5-3ubuntu3.2 does appear to support NFSv4; however, this is the version running on both servers. It may have been original on the working storage node, while the master may have been upgraded from 3 to 4.

    Also, NFS does appear to be reaching the server from the client.

    # netstat -t -u -c | grep 10.2.ccc.bbb
    tcp        0      0 fog-01:ssh  10.2.ccc.bbb:41650       ESTABLISHED
    tcp        0      0 fog-01:nfs  10.2.ccc.bbb:692         ESTABLISHED
  • Moderator

    Just out of curiosity, do the /etc/export files match between the master fog server and the storage node? Can you post the export files here?

    Other random thoughts:

    Are you running NFSv4 on either of your servers? NFSv4 has additional security requirements.

    Does the fog user have the same group and user IDs on all fog servers? (This may not be mandatory.)

  • @Wayne-Workman

    Do NFS connections get logged on the server? I’ve tried several suggestions I’ve found, but I don’t see any messages for successful or failed connections. I’m sure I’ve seen connection attempts in a log before when testing VMware ESXi at home; I just can’t recall where.

    Looks like the export option “no_acl” deals with filesystem ACLs, not anything network-related. And the export is to *, so in theory anyone should be able to connect.

  • @John-Sartoris said:

    Configuring another VM would be possible, but quite a bit of work for what sure seems to be a firewall config or NFS ACL issue that happened during the OS or FOG upgrade.

    That’s just the thing, though: you’ve disabled UFW, and NFS has no protections. Ubuntu does not come preloaded with Security-Enhanced Linux either, like some other distributions do.

  • @george1421

    I’m trying to manually mount from the FOG client now, and I’m receiving a connection refused error.

    Configuring another VM would be possible, but quite a bit of work for what sure seems to be a firewall config or NFS ACL issue that happened during the OS or FOG upgrade.

  • Moderator

    @Wayne-Workman I’m in a meeting right now (yeah, it’s a bit boring) so I can’t test. But from a debug (boot) session, is the showmount command installed in FOS? It would be interesting to know, from the client’s perspective, whether the FOG server is showing its mount information.

    [edit] The other thing would be to try a manual NFS mount from the FOS client to the FOG server. If it mounts, then there is a parameter set up incorrectly (somewhere) in the FOG GUI. The mount command would be something like mount <fog_server_ip>:/images /img (or whatever the local directory is called on the FOS client). [/edit]
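
    That manual test could look roughly like this from a FOS debug console. The server address is a placeholder and the mount options are assumptions (vers=3 to match the rest of the thread, nolock because the debug environment typically runs without a lock daemon); the sketch only assembles and prints the command:

```shell
# Placeholder FOG server address (RFC 5737 documentation range) -- substitute yours.
server='192.0.2.10'

# Assemble the manual mount command; vers=3 and nolock are assumed options.
cmd="mount -t nfs -o vers=3,nolock ${server}:/images /img"
echo "$cmd"
# In a real debug session you would first run:
#   mkdir -p /img
#   showmount -e "$server"
# and then execute the mount command itself.
```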

  • @John-Sartoris can you quickly throw together a CentOS 7 VM and install fog trunk to test?

  • @Wayne-Workman

    I understand and agree with the assessment, but nothing has changed on the LAN in weeks other than the fog server updates. Firewalls are disabled, with traffic allowed for good measure. I even just tried adding iptables rules, without success.