Preventing broken NFS connection from freezing the client system

nfs

We have an NFSv4 share that makes one volume available to a number of servers (the NFS server and all clients run Debian 8). We have recently had issues where network outages froze the client systems.

Our NFS mount options were minimal: just rw, so the defaults (hard, fg, etc.) applied.

I'm now experimenting with these options, but am not getting the behaviour I expect:
rw,soft,bg,retrans=6,timeo=150

(I've increased the retrans to offset some of the soft risk)
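One way to confirm which options the kernel actually applied to a mount is to read them back from /proc/mounts. The following is only a minimal sketch: the function name nfs_mount_options is made up for illustration, and it assumes a Linux client where /proc/mounts is available.

```shell
# List NFS mounts together with the options the kernel actually applied.
# Reads /proc/mounts by default; another file in the same format can be
# passed as the first argument (hypothetical helper, not a standard tool).
nfs_mount_options() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 ": " $4 }' "${1:-/proc/mounts}"
}

# Usage:
#   nfs_mount_options
```

If soft, timeo or retrans do not appear in the output as expected, the mount is probably not using the options that were passed.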

The procedure I'm following to test is:

Boot machine
cd to /mnt/mountpoint
Verify NFS connection ok
cd /
Kill the network: ifdown eth0
cd to /mnt/mountpoint
ls
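The last two steps can be wrapped so that a hung mount does not take the test shell with it. This is a rough sketch, assuming the timeout command from GNU coreutils is installed; the helper name probe and the 30-second default are arbitrary choices for illustration. It sends SIGKILL because, per the man page excerpt quoted in the answer, that is the only signal that interrupts a pending NFS operation on newer kernels.

```shell
# probe DIR [SECONDS]: list DIR, but give up with SIGKILL after SECONDS.
# (hypothetical helper; the default limit of 30 seconds is an assumption)
probe() {
    dir=$1
    limit=${2:-30}
    if timeout -s KILL "$limit" ls "$dir" >/dev/null 2>&1; then
        echo "ok: $dir"
    else
        echo "failed or timed out: $dir"
        return 1
    fi
}

# Usage:
#   probe /mnt/mountpoint 30
```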

At this point the command line freezes, and I can't interrupt it. After some time the message 'nfs: server [servername] not responding, timed out' appears, and it seems to repeat once a minute (indefinitely).

What I would like/expect to happen is for the operation to fail and return control.

Please could someone tell me where I'm going wrong with these settings?

(PS: I also tried mounting with autofs, but saw similar behaviour)

Thank you

Best Answer

intr should let you regain control when you hit ^C, though usually not immediately.

   intr           If an NFS file operation has a major timeout and it is hard mounted, then allow signals to interrupt the
                  file operation and cause it to return EINTR to the calling program.  The default is to not allow file
                  operations to be interrupted.

As you say, expectations are the problem here. Network problems can be temporary, but failing an operation is permanent. So most operations default to simply blocking until the operation completes.

This is the standard answer, but looking at a current man page I see this:

                  The intr / nointr mount option is deprecated after kernel
                  2.6.25.  Only SIGKILL can interrupt a pending NFS operation
                  on these kernels, and if specified, this mount option is
                  ignored to provide backwards compatibility with older
                  kernels.

So this doesn't appear to me to be an NFSv3/NFSv4 issue, but rather a decision about how intr works. You should be able to send SIGKILL to the process, though that may not be of much use to you.

I was unable to find the discussion about why the option was removed. Can you kill -KILL your process?
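To find candidates for kill -KILL, one option is to look for processes in uninterruptible sleep, which ps reports as state D. This is a sketch only: the helper name stuck_pids is made up, and state D is not specific to NFS (ordinary disk waits show up the same way), so the list needs a manual check before anything is killed.

```shell
# Print the PIDs of processes in state D, given the output of
# `ps -eo pid,stat,comm` on stdin (the header line is skipped).
# State D also covers non-NFS waits, so review the list by hand.
stuck_pids() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1 }'
}

# Usage:
#   ps -eo pid,stat,comm | stuck_pids
#   kill -KILL <pid>
```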

Related Solutions

Stop broken NFS mounts from locking a directory

Normally when mounting NFS it's a good idea to have flags set similar to this:

bg,intr,soft

   bg      If  the  first  NFS  mount  attempt times out, retry the mount in the 
           background.  After a mount operation is backgrounded, all subsequent mounts
           on the same NFS  server  will  be  backgrounded immediately, without first
           attempting the mount.  A missing mount point is treated as a timeout, to
           allow for nested NFS mounts.
   soft    If  an  NFS  file operation has a major timeout then report an I/O error
           to the calling program.  The default is to continue retrying NFS file
           operations indefinitely.
   intr    If an NFS file operation has a major timeout and it is hard mounted,
           then allow signals to interrupt the file operation and cause it to return
           EINTR to the calling program.  The default is to not allow file operations
           to be interrupted.

You can in addition set:

timeo=5,retrans=5,actimeo=10,retry=5

which should allow the NFS mount to time out and make the directory inaccessible if the NFS server drops the connection, rather than waiting in retries.
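When reading these values, keep in mind that nfs(5) gives timeo in tenths of a second, so timeo=5 is half a second per attempt. A small sketch of the conversion (the function name timeo_to_seconds is invented for illustration):

```shell
# Convert an NFS timeo value (tenths of a second) to seconds.
timeo_to_seconds() {
    printf '%d.%d\n' "$(( $1 / 10 ))" "$(( $1 % 10 ))"
}

timeo_to_seconds 5     # prints 0.5
timeo_to_seconds 150   # prints 15.0
```

The other options use different units: actimeo is in seconds and retry is in minutes.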

Take a look at the nfs(5) man page for more information about NFS mount options.

NFS connection refused

Horribly, I read about five tutorials, but none of them mentioned that the rpcbind service is needed.

For Debian

sudo service rpcbind start

does the trick.
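To check whether rpcbind is answering afterwards, the output of rpcinfo -p can be searched for the portmapper entry, which is the name rpcbind is listed under. A minimal sketch, with has_rpc_service as a made-up helper name; it assumes the usual five-column rpcinfo -p layout (program, vers, proto, port, service).

```shell
# Succeed if SERVICE appears in the service column of `rpcinfo -p`
# output read from stdin (hypothetical helper, not part of rpcbind).
has_rpc_service() {
    awk -v svc="$1" '$5 == svc { found = 1 } END { exit !found }'
}

# Usage:
#   rpcinfo -p | has_rpc_service portmapper && echo "rpcbind is answering"
```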
