Preventing broken NFS connection from freezing the client system

We have an NFSv4 share that exports a volume to a number of servers (the NFS server and all clients run Debian 8). Recently, network outages have been freezing the client systems.

Our NFS mount options were minimal, just rw, so the defaults (hard, fg, etc.) applied.

I'm now experimenting with these options, but am not getting the behaviour I expect:
rw,soft,bg,retrans=6,timeo=150

(I've increased retrans to offset some of the risk that comes with soft.)
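
To make the option string above concrete, here is a small Python sketch that splits it into bare flags and key=value settings. The helper names parse_nfs_options and timeo_seconds are hypothetical, not part of any NFS tool. The one fact it encodes is from nfs(5): timeo is given in tenths of a second, so timeo=150 means the client waits 15 seconds for a reply before its first retransmission.

```python
def parse_nfs_options(opts: str) -> dict:
    """Split a comma-separated NFS mount option string (hypothetical helper).

    Bare flags such as rw, soft and bg map to True; key=value options
    keep their value as a string.
    """
    parsed = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def timeo_seconds(parsed: dict) -> float:
    """Convert timeo from deciseconds (tenths of a second) to seconds."""
    return int(parsed["timeo"]) / 10


if __name__ == "__main__":
    opts = parse_nfs_options("rw,soft,bg,retrans=6,timeo=150")
    print(opts)
    print(timeo_seconds(opts))  # prints 15.0
```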

The test procedure I'm following is:

  • Boot the machine
  • cd to /mnt/mountpoint
  • Verify the NFS connection is OK
  • cd /
  • Take the network down: ifdown eth0
  • cd to /mnt/mountpoint
  • ls

At this point the command line freezes and I can't interrupt it. After some time the message 'nfs: server [servername] not responding, timed out' appears, and it seems to repeat once a minute, indefinitely.

What I would like (and expect) is for the operation to fail and return control.

Please could someone tell me where I'm going wrong with these settings?

(PS: I also tried mounting with autofs, but saw similar behaviour)

Thank you

Best Answer

The intr option should let you regain control when you press ^C, though usually not immediately.

   intr           If an NFS file operation has a major timeout and it is hard mounted, then allow signals to interrupt the
                  file operation and cause it to return EINTR to the calling program. The default is to not allow file
                  operations to be interrupted.

As you suspect, expectations are the problem here. Network problems can be temporary, but a failed operation is permanent as far as the calling program is concerned. So by default (a hard mount) NFS operations simply block until they complete.

This is the standard answer, but looking at a current man page I see this:

                  The intr / nointr mount option is deprecated after
                  kernel 2.6.25. Only SIGKILL can interrupt a pending NFS
                  operation on these kernels, and if specified, this mount
                  option is ignored to provide backwards compatibility
                  with older kernels.

So this doesn't appear to me to be an NFSv3/NFSv4 issue, but a kernel decision about how intr works. You should be able to kill the process with SIGKILL, although that may not be very useful to you.

I was unable to find the discussion about why the option was removed. Can you kill -KILL your process?
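
If SIGKILL is the only signal these kernels honour for a pending NFS operation, one workaround is to run the risky file access in a child process and send it SIGKILL after a deadline, so the calling program gets control back. The Python sketch below assumes Python 3 on Linux; run_with_deadline is a hypothetical helper, and the command you would actually probe (for example ls /mnt/mountpoint) is only an illustration. Since a hung NFS mount can't easily be reproduced in a demo, a sleeping child process stands in for the stuck ls.

```python
import signal
import subprocess
import sys


def run_with_deadline(cmd, timeout):
    """Run cmd as a child process; SIGKILL it if it exceeds timeout seconds.

    Returns the child's exit code, or None if it had to be killed.
    Hypothetical helper for illustration only.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Equivalent to `kill -KILL <pid>`; per nfs(5), SIGKILL is the signal
        # that can interrupt a pending NFS operation after kernel 2.6.25.
        proc.send_signal(signal.SIGKILL)
        # Reap the child. If the kernel does not act on SIGKILL promptly,
        # this wait blocks until it does.
        proc.wait()
        return None


if __name__ == "__main__":
    # Against the real mount you might probe ["ls", "/mnt/mountpoint"].
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(hung, timeout=1))  # prints None
    print(run_with_deadline([sys.executable, "-c", "pass"], timeout=5))  # prints 0
```

The parent never touches the mount itself, so only the child can get stuck; the trade-off is that you learn "it timed out" rather than a specific error code.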
