Linux – How to prevent writing to CIFS from stalling for minutes on end

cifs linux

When mounting a CIFS filesystem from a NetApp filer and copying files of several gigabytes to it, the copying process will frequently hang for minutes on end. The kernel writes messages to the syslog such as these:

Nov 15 14:03:15 myclient kernel: [173570.048387] CIFS VFS: sends on sock ffff88003a2d4000 stuck for 15 seconds
Nov 15 14:03:15 myclient kernel: [173570.049115] CIFS VFS: Error -11 sending data on socket to server
Nov 15 19:01:22 myclient kernel: [191466.594088] CIFS VFS: Server myfileserver has not responded in 120 seconds. Reconnecting...

The last message may in fact repeat before writing resumes.
While the process is hanging, it cannot be killed; even attempts to reboot the machine will hang.

The server is a NetApp; I don't know its specifications yet.
The clients are two Ubuntu 14.04 LTS machines, one of them virtual (the problem occurs on both). Their kernels are version 3.5.0-54-generic and 3.13.0-68-generic, respectively.

I have three questions.

  1. If you have ever seen this problem, on which version of Linux?
  2. How can this problem occur in the first place? Shouldn't the CIFS filesystem support be smarter than to hang uninterruptibly?
  3. Which mount options are guaranteed to eliminate this problem?

My fstab entry looks like this (anonymized):

//myfileserver/path/to/mydirectory /mnt/mydirectory cifs credentials=mycredentialsfile,rw,sec=ntlmv2,forceuid,forcegid,file_mode=0644,dir_mode=0755,noserverino,nounix,user,noauto 0 0

Adding cache=none does not fix the problem. Adding directio doesn't either: man mount.cifs claims it is a supported option, but it isn't. What does appear to fix the problem is adding wsize=4096 or wsize=8192: thus far, my tests have shown no stalling with those options. (With wsize=16384, the stalling still occurs.)
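
For reference, this is a sketch of the entry with the smaller write size added. It is the fstab line above with wsize=8192 appended and nothing else changed. The value 8192 is only what my tests have tolerated so far, not a known-correct setting:

```
//myfileserver/path/to/mydirectory /mnt/mydirectory cifs credentials=mycredentialsfile,rw,sec=ntlmv2,forceuid,forcegid,file_mode=0644,dir_mode=0755,noserverino,nounix,user,noauto,wsize=8192 0 0
```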

Rather than just going by trial and error, I'd like to understand what is going on and eliminate the problem with 100% certainty. Can you tell me why this is happening or what to do?

(Several questions on Ask Ubuntu, Unix & Linux, and ServerFault look like this problem, but most of them describe something else: they complain about stalling when reading files or when the filesystem is idle. In my case that never happens; the stalling only occurs when writing files.)

Best Answer

By default, CIFS mounts on kernels of this age use SMB protocol version 1.0, which, besides being obsolete, is inefficient and does not recover well from sleep for several reasons.

Depending on your server, you can add the mount option vers=2.1 at the least, or vers=3.0.

I would advise checking the documentation, or asking the vendor, to find out which versions of the SMB protocol the server supports. Failing that, try vers=3.0 and consult the output of the mount command to see which version was negotiated.
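
As a rough sketch of that check, the shell function below prints each CIFS mount point together with the vers= value from its option list. The name cifs_versions is made up for this example. It assumes your kernel lists vers= among the options in /proc/mounts, which I believe recent kernels do; older ones may not, in which case it prints "(not shown)".

```shell
# Print "<mount point> <SMB version>" for every CIFS mount.
# Reads /proc/mounts by default; another mounts-style file can be passed
# as the first argument. cifs_versions is a hypothetical helper name.
cifs_versions() {
    awk '$3 == "cifs" {
        n = split($4, opts, ",")          # field 4 holds the mount options
        v = "(not shown)"                 # kernel may not report vers= at all
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) v = substr(opts[i], 6)
        print $2, v
    }' "${1:-/proc/mounts}"
}
```

Running cifs_versions with no arguments after mounting the share shows what the client is actually using, which is a quick way to confirm that the vers= option took effect.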

Switching to a more recent version of the protocol should solve some or all of your stalling issues and give you better transfer speeds.

Please see the related question CIFS randomly losing connection to Windows share for more details.

Please note that the stalling will improve but won't go away entirely when copying large files. That behaviour is by design: the data goes into buffers first, and the filesystem then waits for the server to confirm that the copy has completed successfully.
