NFS problems after upgrade to trunk
-
Ok, I’ve tested mounting the master node’s exports on the storage node in a temp folder and received a timeout. I repeated the test in reverse, mounting the storage node’s export on the master node in a temp folder, without issue. Different mount locations, of course.
I also tried mounting from the master node back onto the master node and still got the timeout, so it’s not a firewall issue; it shouldn’t be anyway, because I disabled ufw as per the troubleshooting wiki.
showmount shows the exports. Any suggestions? I don’t work with NFS much.
# showmount -e 10.2.yyy.xxx
Export list for 10.2.yyy.xxx:
/images/dev *
/images     *
# mount 10.2.xxx.yyy:/images masterimages
mount.nfs: Connection timed out
-
@John-Sartoris It does sound a lot like a firewall issue, though. Re-disable UFW just to see? It will only take a moment.
Also, I suppose we should compare the master and the storage node now. What OS is each node using? What version of FOG? Are both updated to the same position (i.e. updates run at the same time)?
Finally, just to back up a bit, try some simple network troubleshooting. From the storage node, try pinging the master. Try FTP-ing into the master, try SSH-ing into the master. And check for IP conflicts on the master.
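Roughly, something like this from the storage node would cover those checks (substitute your master’s real address for the 10.2.yyy.xxx placeholder, and any valid account for the hypothetical fog user):
ping -c 4 10.2.yyy.xxx         # basic reachability
ssh fog@10.2.yyy.xxx           # confirm SSH answers
ftp 10.2.yyy.xxx               # confirm FTP answers
arp -a | grep 10.2.yyy.xxx     # a single MAC entry here helps rule out an obvious IP conflict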
-
I’ve re-disabled ufw and added allow rules for good measure. Ping works, SSH works, FTP works. Both are running “Ubuntu 12.04.5 LTS”. Storage was installed with 12.04, 6+ months ago after a HD failure. Master was upgraded Friday from 10.04. I ran “apt-get update” and “apt-get upgrade” on both after the 12.04 upgrade. However, I just tried it again on the master and the one package it had an update for was “nfs-kernel-server”.
Master had “1.2.0-4ubuntu4.2” while storage had “1.2.5-3ubuntu3.2”; both now have “1.2.5-3ubuntu3.2”. The problem still existed at first, but after 5 minutes and a manual “sudo service nfs-kernel-server restart” post-upgrade, NFS seems to be better: I can now mount the master from storage.
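For reference, this is roughly what I ran on each box (the dpkg line is just how I checked the versions):
dpkg -l nfs-kernel-server                      # shows the installed version
sudo apt-get update && sudo apt-get upgrade    # pulled in the newer nfs-kernel-server on the master
sudo service nfs-kernel-server restart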
However, FOG deploy still has the same issue when trying to use the local master node. If I point to the cross-WAN storage node, deployment works.
-
@John-Sartoris Ok. I guess the next thing to check is probably the simplest, but I just didn’t think about it. Check permissions.
ls -lahRt /images
Do that on both the master and storage node. Compare results.
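If the listings are long, a rough way to compare them is to dump each to a file and diff; the file names and the scp step here are just an example:
ls -lahRt /images > /tmp/images-listing.txt    # run on each node
scp 10.2.yyy.xxx:/tmp/images-listing.txt /tmp/images-listing-master.txt
diff /tmp/images-listing.txt /tmp/images-listing-master.txt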
To set permissions on the images directory recursively, it’s just:
chmod -R 777 /images
We recommend 777 for troubleshooting.
-
Ok, I didn’t expect that, considering both sites were working fine on Thursday. I’ll try making the master match storage and see what happens.
Storage - working
/images/Win10BaseR1:
total 18G
-rwxr-xr-x  1 fog  fog   18G Feb 4 15:06 sys.img.000
drwxr-xr-x  2 fog  fog  4.0K Feb 4 15:03 .
-rwxr-xr-x  1 fog  fog  299M Feb 4 15:03 rec.img.000
-rwxr-xr-x  1 fog  fog     0 Feb 4 14:33 d1.original.swapuuids
-rwxr-xr-x  1 fog  fog   259 Feb 4 14:33 d1.original.partitions
-rwxr-xr-x  1 fog  fog    15 Feb 4 14:33 d1.original.fstypes
-rwxr-xr-x  1 fog  fog     2 Feb 4 14:33 d1.fixed_size_partitions
drwxrwxrwx 18 root root 4.0K Feb 4 12:22 ..
Master - not working
/images/Win10BaseR1:
total 18G
drwxrwxrwx 18 fog  root 4.0K Feb 4 15:02 ..
-rwxrwxrwx  1 root root  18G Feb 4 15:02 sys.img.000
drwxrwxrwx  2 root root 4.0K Feb 4 14:33 .
-rwxrwxrwx  1 root root 299M Feb 4 14:33 rec.img.000
-rwxrwxrwx  1 root root   15 Feb 4 14:32 d1.original.fstypes
-rwxrwxrwx  1 root root    0 Feb 4 14:32 d1.original.swapuuids
-rwxrwxrwx  1 root root  259 Feb 4 14:32 d1.original.partitions
-rwxrwxrwx  1 root root    2 Feb 4 14:32 d1.fixed_size_partitions
-
@John-Sartoris also,
chown -R fog:root /images
may help as well. But you still need 777 perms from the earlier command.
-
I was already at 777 on the master, but after resetting owner:group and perms for good measure and restarting nfs-kernel-server, still no luck on the deploy.
-
You mentioned creating .mntcheck files. Are you sure they are properly in place?
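If they’re missing, they can be recreated by hand, roughly like this (assuming the default /images layout):
touch /images/.mntcheck /images/dev/.mntcheck
chmod 777 /images/.mntcheck /images/dev/.mntcheck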
-
@John-Sartoris said:
I can now mount the master from storage.
However, FOG deploy still has the same issue when trying to use the local master node. If I point to the cross-WAN storage node, deployment works.
Make sure you have the right IP for the main server. Make sure the image path and FTP path are correct inside storage management. If all looks good, click save anyway just to push the settings. Sometimes the auto-fill feature in web browsers really screws with this area.
It’s just not typical for NFS to break in this manner; that’s why I’m asking you to check these things.
-
Same places as the working node.
/images:
total 72
drwxrwxrwx 18 root root 4096 Feb 4 12:22 .
drwxr-xr-x 26 root root 4096 Feb 8 09:01 ..
-rwxrwxrwx  1 root root    0 Jul 29  2014 .mntcheck
/images/dev:
total 8
drwxrwxrwx  2 root root 4096 Jul 29  2014 .
drwxrwxrwx 18 root root 4096 Feb 4 12:22 ..
-rwxrwxrwx  1 root root    0 Jul 29  2014 .mntcheck
-
@John-Sartoris Are you using the location plugin?
-
@Wayne-Workman said:
@John-Sartoris Are you using the location plugin?
Yes, I am using the location plugin. Is there a known issue?
I completely understand. When it doesn’t make sense, double-check the things that wouldn’t make sense…
IP addresses and paths are correct. I re-saved them, and I also double-checked and re-saved the node choice in the location plugin.
Still no luck. Is there any way I can get more detail on the client machine to see exactly what is erroring? It just says “An error has been detected!” and doesn’t specify.
-
@John-Sartoris Do a debug download. On the task confirmation page, the debug option is a checkbox. This boots the target host into a shell. On the target host, there is a variable dump initially on the screen; it can be quite valuable to see what it says.
Additionally, I would recommend removing the location plugin entirely (via plugin management) and then reinstalling it and re-setting it up for your various locations.
-
From the client in debug…
mount: mounting 10.2.yyy.xxx:/images on images failed: Connection refused
Sounds to me like it’s still a permissions or ACL-type issue.
-
@John-Sartoris “Refused” is different from “denied”… Have you checked for IP conflicts?
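“Refused” usually means nothing answered on the NFS ports at all, so it’s also worth confirming the services are actually listening on the master; roughly:
rpcinfo -p 10.2.yyy.xxx                  # should list portmapper, mountd and nfs
sudo exportfs -v                         # run on the master; shows what is actually being exported
sudo service nfs-kernel-server status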
-
I didn’t read all of the posts here, but could you do a
showmount -e 127.0.0.1
This will show us what you have NFS shared on your FOG server. You could then do the same command but use your FOG server’s external IP address instead of the loopback interface. It could be a firewall issue.
-
@george1421 said:
I didn’t read all of the posts here, but could you do a
showmount -e 127.0.0.1
This will show us what you have NFS shared on your FOG server.
That shows the exports are proper, and they do actually work from the storage node cross-site, but a client on the local site can’t connect.
-
@Wayne-Workman said:
@John-Sartoris “Refused” is different from “denied”… Have you checked for IP conflicts?
I haven’t specifically checked, but this server is configured the same way as the rest in our environment: DHCP with a reservation. This is the only server that should get this address. Access ports are not configured in this VLAN. I have not had any issues connecting to the server except via NFS. Ping and SSH work from the debug client.
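To rule it out properly, something like this from another machine on that VLAN should do it (same placeholder address as before):
arping -c 4 10.2.yyy.xxx       # replies from more than one MAC would indicate a conflict
arp -a | grep 10.2.yyy.xxx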
-
@John-Sartoris I’m thinking it is probably something network related.
-
I understand and agree with the assessment, but nothing has changed on the LAN in weeks other than the FOG server updates. Firewalls are disabled, with allow rules added for good measure. I even just tried adding iptables rules without success.
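The rules I tried were along these lines (standard NFSv3 ports; rpc.mountd’s port can vary unless it is pinned in the config):
sudo iptables -I INPUT -p tcp --dport 111  -j ACCEPT    # portmapper / rpcbind
sudo iptables -I INPUT -p udp --dport 111  -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 2049 -j ACCEPT    # nfsd
sudo iptables -I INPUT -p udp --dport 2049 -j ACCEPT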