“Stale file handle” on certain directories occurring immediately after NFS mount; no file handles open

nfs

For some time I've been experiencing a strange issue with NFS where a seemingly random subset of directories (always the same ones) under / consistently show up with stale file handles immediately after NFS mount.

I've been able to work around the problem by explicitly exporting each of the affected directories, but I'd like a more complete fix so that I don't have to keep adding directories to the export table.

Below, I mount the filesystem, show that there are no open file handles, run ls, and rerun lsof. Empty lines were added between commands for clarity:

# mount -t nfs -o vers=4,noac,hard,intr 192.168.0.2:/ /nfs -vvv
mount.nfs: trying text-based options 'vers=4,noac,hard,intr,addr=192.168.0.2,clientaddr=192.168.0.4'
192.168.0.2:/ on /nfs type nfs (rw,vers=4,noac,hard,intr)

# lsof | grep /nfs

# ls -lh /nfs
ls: cannot access /nfs/usr: Stale file handle
ls: cannot access /nfs/root: Stale file handle
ls: cannot access /nfs/etc: Stale file handle
ls: cannot access /nfs/home: Stale file handle
lrwxrwxrwx   1 root root       7 Mar 27  2017 bin -> usr/bin
drwxr-xr-x   6 root root     16K Jan  1  1970 boot
drwxr-xr-x 438 i336 users    36K Feb 28 12:12 data
drwxr-xr-x   2 root root    4.0K Mar 14  2016 dev
d?????????   ? ?    ?          ?            ? etc
d?????????   ? ?    ?          ?            ? home
lrwxrwxrwx   1 root root       7 Mar 27  2017 lib -> usr/lib
lrwxrwxrwx   1 root root       7 Mar 27  2017 lib64 -> usr/lib
drwxr-xr-x  15 root root    4.0K Oct 15 15:51 mnt
drwxr-xr-x   2 root root    4.0K Aug  9  2017 nfs
drwxr-xr-x  14 root root    4.0K Jan 28 17:00 opt
dr-xr-xr-x   2 root root    4.0K Mar 14  2016 proc
d?????????   ? ?    ?          ?            ? root
drwxr-xr-x   2 root root    4.0K Mar 14  2016 run
lrwxrwxrwx   1 root root       7 Mar 27  2017 sbin -> usr/bin
drwxr-xr-x   6 root root    4.0K Jun 22  2016 srv
dr-xr-xr-x   2 root root    4.0K Mar 14  2016 sys
drwxrwxrwt   2 root root    4.0K Dec 10  2016 tmp
d?????????   ? ?    ?          ?            ? usr
drwxr-xr-x  15 root root    4.0K May 24  2017 var

# lsof | grep /nfs

#

The subdirectories in question are not mountpoints; they seem completely normal:

$ ls -dlh /usr /root /etc /home
drwxr-xr-x 123 root root  12K Mar  3 13:34 /etc
drwxr-xr-x   7 root root 4.0K Jul 28  2017 /home
drwxrwxrwx  32 root root 4.0K Mar  3 13:55 /root
drwxr-xr-x  15 root root 4.0K Feb 24 17:48 /usr

There are no related errors in syslog about these directories. The only info that does show up mentions a different set of directories:

... rpc.mountd[10080]: Cannot export /proc, possibly unsupported filesystem or fsid= required
... rpc.mountd[10080]: Cannot export /dev, possibly unsupported filesystem or fsid= required
... rpc.mountd[10080]: Cannot export /sys, possibly unsupported filesystem or fsid= required
... rpc.mountd[10080]: Cannot export /tmp, possibly unsupported filesystem or fsid= required
... rpc.mountd[10080]: Cannot export /run, possibly unsupported filesystem or fsid= required

Here's what /etc/exports currently looks like:

/ *(rw,subtree_check,no_root_squash,nohide,crossmnt,fsid=0,sync)

The server side is running Arch Linux and currently on kernel 4.10.3.

The client side is Slackware 14.1 with kernel 4.1.6.

Best Answer

Your exports file looks abnormal for NFS 4:

/ *(rw,subtree_check,no_root_squash,nohide,crossmnt,fsid=0,sync)

Instead, I think you need to follow the Arch Linux instructions for that fsid=0 line. It declares a special export, ‘the so-called NFS root.’

Then declare your own exports on subsequent lines, as the instructions show. You can export the server’s root file system (not to be confused with the NFS root), as shown in this old Gentoo post.
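As a rough sketch of that layout, an /etc/exports along the following lines might work. The paths /srv/nfs and /srv/nfs/rootfs and the 192.168.0.0/24 client range are assumptions chosen for illustration (the range is only inferred from the addresses in the mount output above); they come from neither the question nor the linked instructions, so adjust them to your setup.

```
# /etc/exports (sketch; paths and client range are hypothetical)

# The NFSv4 root: a dedicated directory, not the server's real /.
/srv/nfs         192.168.0.0/24(rw,sync,fsid=0,crossmnt,no_subtree_check)

# The server's real root file system, made visible under the NFS root
# with a bind mount, e.g.:  mount --bind / /srv/nfs/rootfs
/srv/nfs/rootfs  192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)
```

With this arrangement, client paths are resolved relative to the NFS root, so the client would mount 192.168.0.2:/rootfs rather than 192.168.0.2:/. After editing the file, exportfs -ra reloads the export table and exportfs -v shows what is actually being exported.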
