NFS versus SunRPC
-
We have just upgraded our Fog server to an Ubuntu 18.04 VM running the latest version of Fog, but we are having problems mounting the images from the Fog server. The error is always a connection timeout, so the mount fails.
Testing with a “Debug” job instead of an upload/download, and issuing a mount command from the debug prompt, shows that the Fog debug mount command uses port 111 (sunrpc) and fails. BUT the same mount command in a MacOS Terminal uses port 2049 (nfs) and correctly mounts the Fog server's /images directory.
So why does Fog use sunrpc and fail but other computers use NFS and work?
I’ve gone through the “Troubleshoot NFS” guide many times but still can’t work this out. All help appreciated.
-
I would make sure that the Ubuntu firewall is disabled and SELinux is set to permissive.
Beyond that, do you have any firewalls/routers between the target computer and the FOG server?
-
SELinux is not part of a standard Ubuntu install, so it is not involved here. There is no firewall running either. All computers are on the same network too, so there “should be” no network difference between the Mac and the Linux computers. I'm not sure why the standard Fog debug mount, and I assume the upload and download tasks too, use sunrpc and not NFS.
Thanks
-
Some more information:
Doing some tcpdumps and watching traffic flow, this is what APPEARS to be happening.
The client sends a call to port 111 on the server (rpcbind), which responds. Seven packets are exchanged between client and server, which appears to be a ‘conversation’ with the portmapper (rpcbind) telling the client where to find the NFS server (port 2049). Then there are some ARP packets between the client and server establishing their MAC addresses. Then nothing. No traffic.
It is as if the server has told the client (running the FOG debug session) to call back on NFS (port 2049) to complete the mount, but this never happens, so the client's mount command “times out”.
-
@DJ said in NFS versus SunRPC:
There are then some ARP packets between the Client and Server establishing their mac addresses. Then nothing. No traffic.
Would you mind uploading that packet capture somewhere and sending me a PM with the download link? I’ve dug through many dumps and found a lot of things just by looking at the packets.
-
@Sebastian-Roth Thank you for the offer. The problem is at a school where I volunteer on Fridays, so I will be there tomorrow to have a look. I'm not sure where I can upload a tcpdump file at the moment, but I will let you know. Thanks again.
-
@DJ Sent you a PM, see the speech bubble in the upper right corner.
-
We exchanged a few emails but there is still no solution. Looking at the PCAPs again, I still can’t make any sense of this.
One thing is sure: the Mac is doing a lot more shouting, which our FOS client does not. Comparing my nfs1.pcap with your test2.pcap, I see that neither does the NFS NULL Call and neither uses the NFS GETPORT Call … NFS. I don’t know NFS well enough to say why, but it is working nicely on my side!
This NFS mount code is working in many thousands of installations, and I would be very surprised if it turned out to be a FOS client problem. But we will see.
Please send the full iptables output as requested just to make sure!
Please also run the command
rpcinfo -p localhost
on your FOG server and post the output.
-
Thanks Sebastian,
rpcinfo -p localhost
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp  50037  mountd
    100005    1   tcp  42203  mountd
    100005    2   udp  52143  mountd
    100005    2   tcp  55967  mountd
    100005    3   udp  56692  mountd
    100005    3   tcp  51853  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049
    100003    3   udp   2049  nfs
    100227    3   udp   2049
    100021    1   udp  42980  nlockmgr
    100021    3   udp  42980  nlockmgr
    100021    4   udp  42980  nlockmgr
    100021    1   tcp  40509  nlockmgr
    100021    3   tcp  40509  nlockmgr
    100021    4   tcp  40509  nlockmgr
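The portmapper answers a GETPORT lookup by matching (program, version, protocol) against a table like the one above. A minimal Python sketch, assuming the whitespace-separated layout shown here; `lookup_port` is a hypothetical helper for reading the table, not part of rpcinfo. For MOUNT (100005) v3 over TCP it yields 51853, the port that shows up in the capture.

```python
from typing import Optional

# rpcinfo -p output from the FOG server, as posted above.
RPCINFO = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp  50037  mountd
    100005    1   tcp  42203  mountd
    100005    2   udp  52143  mountd
    100005    2   tcp  55967  mountd
    100005    3   udp  56692  mountd
    100005    3   tcp  51853  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049
    100003    3   udp   2049  nfs
    100227    3   udp   2049
    100021    1   udp  42980  nlockmgr
    100021    3   udp  42980  nlockmgr
    100021    4   udp  42980  nlockmgr
    100021    1   tcp  40509  nlockmgr
    100021    3   tcp  40509  nlockmgr
    100021    4   tcp  40509  nlockmgr
"""

MOUNT_PROG = 100005  # MOUNT protocol (served by mountd)
NFS_PROG = 100003    # NFS protocol (served by nfsd)


def lookup_port(table: str, program: int, vers: int, proto: str) -> Optional[int]:
    """Return the port registered for (program, vers, proto), or None."""
    for line in table.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a registration line.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        if (int(fields[0]), int(fields[1]), fields[2]) == (program, vers, proto):
            return int(fields[3])
    return None


print(lookup_port(RPCINFO, MOUNT_PROG, 3, "tcp"))  # mountd v3/TCP
print(lookup_port(RPCINFO, NFS_PROG, 3, "tcp"))    # nfsd v3/TCP
```

Note that the MOUNT lookup never returns 2049: mountd registers its own (here, high) ports, separate from nfsd.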
iptables -L -n -v
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
What I have discovered in the dump from our client: there is a GETPORT CALL for program MOUNT (100005), which the server answers with GETPORT REPLY Port: 51853. That is a port served by mountd, not NFS. This explains why our client just stops, but I'm not sure what mountd and the client are then meant to do. Obviously nothing is happening at the moment, but is this because mountd doesn’t pass back to portmap to get the NFS port, or should the client actually ask for an NFS port rather than a mountd port?
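For an NFSv3 mount, asking the portmapper for MOUNT is the expected first step: the client gets the export's file handle from mountd and only afterwards talks to nfsd on 2049. The sketch below XDR-encodes a portmapper v2 GETPORT call for MOUNT v3/TCP and decodes a reply, following the ONC RPC (RFC 5531) and portmapper (RFC 1833) wire formats as I understand them. The helper names are hypothetical, the reply is hand-built with the port from the capture rather than taken from it, and the TCP record mark only applies if the call goes over TCP.

```python
import struct

# Portmapper v2 (RFC 1833) and RPC message constants.
PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
MOUNT_PROG, MOUNT_V3 = 100005, 3
IPPROTO_TCP = 6
CALL, REPLY = 0, 1   # RPC msg_type
RPC_VERSION = 2
AUTH_NONE = 0        # null credential / verifier flavor


def getport_call(xid: int, prog: int, vers: int, prot: int) -> bytes:
    """XDR-encode a GETPORT call asking which port serves (prog, vers, prot)."""
    header = struct.pack(">6I", xid, CALL, RPC_VERSION,
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    null_auth = struct.pack(">2I", AUTH_NONE, 0)  # flavor + zero-length body
    mapping = struct.pack(">4I", prog, vers, prot, 0)  # port is ignored in a call
    return header + null_auth + null_auth + mapping  # credential, then verifier


def with_record_mark(msg: bytes) -> bytes:
    """Prefix the TCP record-marking header: last-fragment bit | length."""
    return struct.pack(">I", 0x80000000 | len(msg)) + msg


def parse_getport_reply(msg: bytes) -> int:
    """Return the port from an accepted, successful GETPORT reply."""
    _xid, mtype, reply_stat, _flavor, verf_len = struct.unpack_from(">5I", msg, 0)
    if mtype != REPLY or reply_stat != 0:  # 0 = MSG_ACCEPTED
        raise ValueError("not an accepted RPC reply")
    offset = 20 + ((verf_len + 3) & ~3)  # skip the 4-byte-padded verifier body
    accept_stat, port = struct.unpack_from(">2I", msg, offset)
    if accept_stat != 0:  # 0 = SUCCESS
        raise ValueError(f"accept_stat={accept_stat}")
    return port


call = with_record_mark(getport_call(0x1234, MOUNT_PROG, MOUNT_V3, IPPROTO_TCP))
# Hand-built reply: xid, REPLY, MSG_ACCEPTED, null verifier, SUCCESS, port.
reply = struct.pack(">7I", 0x1234, REPLY, 0, AUTH_NONE, 0, 0, 51853)
print(len(call), parse_getport_reply(reply))
```

Nothing in this exchange refers to port 2049; the client's next step is a connection to whatever port the reply carries.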
Given that there are thousands of clients out there without this problem, the answer must lie in our FOG server, but where? Is there a config file in the OS (Ubuntu 18.04) to set up portmapper? Or does the problem lie with mountd?
Thanks
David…
-
This is not dead. We have been working on this in short sessions but couldn’t figure out the root cause yet. To me it still seems like something is blocking/dropping packets at some point: we see some of the NFS traffic on the FOG server, but some of it is missing.
-
@Sebastian-Roth Thanks Sebastian, I’ll try checking source ports and see if there is any consistency. I find it very strange that the mount command fails to connect to the FogServer on port 51853, but telnetting from the same client to that port works. The difference is the source port, though. However, there is no record of the firewall (Shorewall) on the host running the FogVM blocking any incoming packets on port 51853. All very strange. Something else I will try is taking the FogVM off the VLAN to make testing easier.
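Why telnet gets through while mount does not: a plain TCP client that never calls bind() is given an ephemeral source port by the OS, whereas NFS clients by default bind a privileged port below 1024 (the `resvport` behaviour in the Linux nfs(5) man page), as seen in the 800/900 range here. This Python sketch is a local-only demonstration, not tied to FOG; `source_port_of_plain_connect` is a hypothetical helper. Binding a port below 1024 yourself would need root, so that half is left out.

```python
import socket


def source_port_of_plain_connect() -> int:
    """Connect to a throwaway local listener without calling bind() on the
    client, and return the source port the OS picked for it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            conn.close()
            return client.getsockname()[1]


port = source_port_of_plain_connect()
print(port, "reserved (<1024)" if port < 1024 else "ephemeral (>=1024)")
```

On Linux the result falls in `net.ipv4.ip_local_port_range` (32768 to 60999 by default), well clear of any filter keyed on ports below 1024.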
-
@Sebastian-Roth After taking the FogServer and the client off the VLAN, so we could test through a hub and monitor packet activity, we have discovered that the problem is in our switch. The switch is ‘swallowing’ packets that originate from a low source port. The FogServer tells the FogClient to call mountd on one of a small range of high ports (e.g. 51853), and the FogClient uses a low source port in the 800/900 range. This call from the FogClient to the FogServer is blocked by the switch.
Why does the FogClient use a low source port? Looking at MacOS and Ubuntu 18.04, this appears to be standard practice, but both MacOS and Ubuntu also send calls to NFS (2049), which succeed and allow the server's shares to be mounted on the client. The FogClient doesn’t do this, relying on the call that the switch blocks.
-
Why does a switch block such packets?? It’s not a switch’s business to do so!!!
Maybe it's because we use Buildroot, which might be a bit more minimalistic in some places. On the other hand, it could be that Ubuntu uses a patched version of NFS mount that makes the special call on port 2049. Not sure though.
-
@Sebastian-Roth Many thanks Sebastian for your great assistance. This problem is now SOLVED!!
For some reason D-Link considers blocking low source port numbers in their switches a good way to stop DoS attacks. The switch [DGS-3620] can block TCP SYN packets with a source port < 1024 in its DoS Attack settings.
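The rule above can be modeled in a few lines to show why the mount call died while telnet to the same port worked. This is a hypothetical Python model of the rule as described in this thread, not D-Link's implementation: reading "TCP SYN" as a SYN without ACK (a connection opener) is my assumption, and the port numbers are illustrative values from the ranges mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpSegment:
    src_port: int
    dst_port: int
    syn: bool
    ack: bool = False


def dropped_by_low_src_port_rule(seg: TcpSegment, threshold: int = 1024) -> bool:
    """Hypothetical model: drop connection-opening SYNs (SYN without ACK)
    whose source port is below `threshold`."""
    return seg.syn and not seg.ack and seg.src_port < threshold


# Illustrative traffic (ports assumed from the ranges mentioned in the thread):
mount_to_mountd = TcpSegment(src_port=876, dst_port=51853, syn=True)     # reserved port
telnet_to_mountd = TcpSegment(src_port=45012, dst_port=51853, syn=True)  # ephemeral port

for name, seg in [("mount", mount_to_mountd), ("telnet", telnet_to_mountd)]:
    verdict = "dropped" if dropped_by_low_src_port_rule(seg) else "forwarded"
    print(f"{name}: {verdict}")
```

Both segments target the same destination port; only the source port decides the verdict, which matches what the hub capture showed.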
Network-based “security” is becoming more prevalent, so might FOG be forced to stop using low source ports?
-
@DJ Amazing we found this after such a long time of intense digging in packet dumps and testing. Thanks for posting the information on the switch so people can find this here in the forums.
I have thought about this, but I don’t really see why we should “fix” NFS mount to use high source ports. I might be wrong here, but from what I have seen in the packet dumps this seems to be general behavior, and changing it would need a fix deep down in the NFS client library code, something we would need to maintain over a long time since I don’t see this change making it into the official code line. What do you think?