NFSv4 Errors, But Not in NFSv3

mount, nfs, onc-rpc, rhel

I'm working on an NFS solution for RHEL6.5 clients (all VMs) with RHEL6.5 and RHEL7 hosts. Currently, the RHEL7 host with RHEL6.5 clients works fine. The trouble is with the RHEL6.5 host.

These problems might be down to aspects of the server I can't control, as the server has been having issues lately that it didn't last year. If you think that's the issue, please suggest ways I can prove this to my superiors, and begin the process of getting a new machine.

The solution was initially being built around NFSv4, which was going swell. The RHEL6.5 host, however, is not as keen as the RHEL7 host. Mounts succeed, but file access (e.g. cp, less) does not work: the commands simply hang in the terminal. Tailing the client's /var/log/messages shows state manager: lease expired failed on NFSv4 server nfs_master with error 10018. Per the standard, that error code is NFS4ERR_RESOURCE, documented here.

I tried to resolve the resource issue by increasing the number of nfsd processes, both from the command line and by setting the appropriate config in /etc/sysconfig/nfs. It didn't help. The issue also occurs if the exported directory is mounted on the NFS server itself.
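To rule out the thread-count change silently not taking effect, one option is to compare the configured value with what the kernel reports. This is only a sketch, assuming the stock RHEL 6 layout where /etc/sysconfig/nfs sets RPCNFSDCOUNT and the running count is exposed in /proc/fs/nfsd/threads; the helper name configured_nfsd_threads is made up for illustration.

```shell
#!/bin/bash
# Print the RPCNFSDCOUNT value from a sysconfig-style file.
# Commented-out assignments are ignored; the last active one wins.
# Prints nothing if the variable is not set.
configured_nfsd_threads() {
    local file="${1:-/etc/sysconfig/nfs}"
    sed -n 's/^[[:space:]]*RPCNFSDCOUNT=//p' "$file" | tail -n 1 | tr -d "\"'"
}

# On an NFS server, show the configured and the running thread counts.
if [ -r /proc/fs/nfsd/threads ] && [ -r /etc/sysconfig/nfs ]; then
    echo "configured: $(configured_nfsd_threads)"
    echo "running:    $(cat /proc/fs/nfsd/threads)"
fi
```

If the two numbers differ, the service was probably not restarted after the config edit. If they match and the hang persists, the thread count is likely not the resource the error refers to.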

Neither the host's nor the client's logs show a second error, 10022, which I assume is also an NFSv4 error code. It is only visible when running tcpdump on the interface that the NFS traffic goes over:

IP test-host.nfs > test_client-1.3297002672: reply ok 52 getattr ERROR: unk 10022

If this error code is indeed an NFSv4 one, then it is NFS4ERR_STALE_CLIENTID, documented here.
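Since tcpdump prints the status as a bare number ("unk 10022"), a small lookup helper can save a trip to the spec. The sketch below covers only a few state- and lease-related codes, with values taken from the NFSv4 status code list in RFC 3530 / RFC 7530; the function name nfs4_errname is hypothetical.

```shell
#!/bin/bash
# Map a numeric NFSv4 status code to its symbolic name.
# Deliberately incomplete: only a handful of codes are listed.
nfs4_errname() {
    case "$1" in
        10008) echo NFS4ERR_DELAY ;;
        10011) echo NFS4ERR_EXPIRED ;;
        10013) echo NFS4ERR_GRACE ;;
        10018) echo NFS4ERR_RESOURCE ;;
        10022) echo NFS4ERR_STALE_CLIENTID ;;
        10023) echo NFS4ERR_STALE_STATEID ;;
        *)     echo "unknown ($1)" ;;
    esac
}

nfs4_errname 10018   # code from /var/log/messages
nfs4_errname 10022   # code from the tcpdump capture
```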

When the mount command is changed to set nfsvers=3, actions like cp succeed and generate no errors on either the client or the host. The first attempt takes a little longer, maybe 5 seconds, then subsequent actions are much faster.

At any one time there will be at most four clients mounting the export and reading from it, potentially from the same file.

So, my questions are:

  1. What are the server-side resources being referred to by the NFS4ERR_RESOURCE description?
  2. How do I resolve NFS4ERR_RESOURCE and NFS4ERR_STALE_CLIENTID errors?
  3. Why is NFSv3 functioning as expected, but not NFSv4?

nfs-utils version and release (for both clients and RHEL6.5 host): 1.2.3.39.el6

mount commands:

  • mount -n -t nfs -o ro,noexec,timeo=10,retrans=3,retry=0,soft,rsize=32768,intr,noatime
  • mount -n -t nfs -o nfsvers=3,ro,noexec,timeo=10,retrans=3,retry=0,soft,rsize=32768,intr,noatime

EDIT:
Our resolution for this issue was to fall back to the NFSv3 protocol. Everything works just fine. I won't answer this question with a "just fall back to NFSv3", but this issue is probably too niche to ever see an answer.

Best Answer

Try -fstype=nfs4,rw,intr,hard,proto=tcp,port=2049,acl as a test and make sure 2049/tcp is open to the client on the server. If there's a firewall in the way it needs to pass 2049/tcp as well.
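A quick way to check the 2049/tcp part from a client, without any extra tools, is bash's /dev/tcp redirection. This is a minimal sketch, not a full diagnosis: the function name tcp_port_open is invented here, and a successful connect only shows that the port is reachable, not that NFSv4 itself is healthy.

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.
# The port defaults to 2049, the NFS port named in the answer.
tcp_port_open() {
    local host="$1" port="${2:-2049}"
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, using the server name that appears in the client log from the question: `tcp_port_open nfs_master 2049 && echo reachable || echo blocked`.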
