Stale NFS File Handle: why does fsid resolve it?

filesystems mount nfs

Problem statement (note that this problem has been solved, but there is a question about why the solution works)

The NFS server is Ubuntu 16.04.4 LTS. The clients are a mix of Ubuntu 16.04.4 LTS and CentOS 6.10 and 7.

NFS server has been working fine for months, and one particular export was serving several clients for their backups. The NFS server directory looked like this:

/mnt/backups/client1
/mnt/backups/client2
/mnt/backups/client3
/mnt/backups/client4

The /etc/exports contained:

/mnt/backups 1.2.3.0/24(rw,sync,no_subtree_check)

Each client only mounts the NFS export during its backup, and unmounts it when the backup is done.

This was working fine; however, it was decided that the clients should not be able to see each other's directories under /mnt/backups. Each client uses the same backup uid/gid, so a decision was made to separate the directories through the /etc/exports file instead.

To that end, the NFS service was stopped and /etc/exports was modified to contain:

/mnt/backups/client1 1.2.3.21(rw,sync,no_subtree_check)
/mnt/backups/client2 1.2.3.22(rw,sync,no_subtree_check)
/mnt/backups/client3 1.2.3.23(rw,sync,no_subtree_check)
/mnt/backups/client4 1.2.3.24(rw,sync,no_subtree_check)

Recall that the clients only mount the NFS server when they run their backups (at 4 am). On the server the NFS service was restarted and the exports were checked with exportfs; everything looked good.
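For reference, a minimal sketch of those server-side steps; the service name and flags below are assumed for Ubuntu 16.04 rather than quoted from the original setup:

sudo systemctl restart nfs-kernel-server   # restart the NFS server after editing /etc/exports
sudo exportfs -v                           # list the active exports and the options applied to each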

OK, testing client1:

mount nfserver:/mnt/backups/client1 /mnt/client1

The mount works fine; however, any action on /mnt/client1 results in:

cannot open directory /mnt/client1/: Stale file handle

Actions taken to resolve the problem (none of which worked):

- Restarting NFS on the server.
- Restarting the client.
- Running lsof | grep /mnt on the client and the server to see if any programs were holding files open (see the sketch after this list).
- Checking permissions on the server and the client.

Again, switching /etc/exports back to the old single-export file and mounting from the client works. Switching back to the "new" method does not.
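A rough sketch of those client-side checks (hostname and paths taken from the example above; the exact invocations are assumed):

sudo umount /mnt/client1                                  # drop the stale mount
sudo lsof | grep /mnt                                     # look for processes holding files open under /mnt
sudo mount nfserver:/mnt/backups/client1 /mnt/client1     # remount and retest
ls /mnt/client1                                           # still fails with "Stale file handle" at this point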

After much gnashing of teeth, reading man pages, and STFW only to find answers like "restart NFS", I recalled having this problem years ago and that fsid had something to do with the solution. After reading the man pages again, the following was added to the NFS server's /etc/exports file:

/mnt/backups/client1 1.2.3.21(fsid=101,rw,sync,no_subtree_check)
/mnt/backups/client2 1.2.3.22(fsid=102,rw,sync,no_subtree_check)
/mnt/backups/client3 1.2.3.23(fsid=103,rw,sync,no_subtree_check)
/mnt/backups/client4 1.2.3.24(fsid=104,rw,sync,no_subtree_check)

Again, after this change the only thing run on the server was exportfs -ra.
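For completeness, a sketch of re-exporting and verifying on the server (assumed commands; the verbose listing should now show the fsid option on each export):

sudo exportfs -ra    # re-read /etc/exports and re-export everything
sudo exportfs -v     # confirm each export now carries its own fsid=... option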

Now all clients can mount their NFS exports, and everything works.

Why is that a solution?

Should we use fsid on every export?

Reading a man page like this one does not clearly explain why fsid is a solution. My guess was that the stale mount came from some kind of cached NFS file handle on the client end (or perhaps the server side), but for that to persist across a reboot would seem strange.

Best Answer

In short, the fsid is the way the client and the server identify an export after it's mounted.

As the man page states, the fsid will be derived from the underlying filesystem, if not specified.

All four exports live on the same underlying filesystem and therefore get the same derived fsid, so it's possible that when client1 asks about files from its mount, the server thinks it is trying to access client4's export (assuming the server keeps only the latest occurrence of a given fsid).
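One quick way to see that on the server is to check which filesystem each exported directory lives on (a sketch using the paths from the question; df is the only command assumed here):

df /mnt/backups/client1 /mnt/backups/client2 /mnt/backups/client3 /mnt/backups/client4
# all four rows should show the same source device; with no explicit fsid,
# the export identifier is derived from that single filesystem, so it is
# identical for all four exports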

There are a few ways to validate this hypothesis: for instance, check whether one (and only one) of the four clients works, or keep only the client1 export (removing the other three) and confirm that client1 works then.

See also this answer for a way to query the fsid from a client using the mountpoint -d command, which you could run on the four clients to confirm that the four mounts share the same fsid.
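A sketch of that check, run from each client against its own mount point (the paths are assumed from the setup above):

mountpoint -d /mnt/client1    # prints the device number for the mounted filesystem
# repeat on the other clients; identical values across all four mounts
# would confirm that they share the same fsid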

Why is that a solution?

Because with distinct fsids, the exports look distinct to the NFS server, so it can correctly match client accesses to their corresponding exports.

Should we use fsid on every export?

Yes, I think that's good practice: it ensures you stay in control, and changes in the underlying storage devices or exports will not affect your clients.

(In my case, I recall adopting it because some of my NFS servers with disks on a SAN would sometimes scan disks in a different order, so after a reboot /dev/sdh would suddenly become /dev/sdj. Mounting by label ensured the filesystem ended up at the correct location, but the fsid would change and clients would get lost. This was before the ubiquity of UUIDs, which apparently are now supported and are of course a much better solution, since they don't break when disks are scanned in a different order. Still, specifying the fsid explicitly is not a bad idea; it lets you keep full control.)
