Hang during ‘touch’ operation on NFS mount

I have two NFS clients that mount shares from an Openfiler 2.99 NFS server at 192.0.2.3:

  • 192.0.2.1 mounts 192.0.2.3:/mnt/nfs01/volnfs01/share01 with rw,noatime,nodiratime,hard,rsize=32768,wsize=32768,noacl,nocto,tcp,nfsvers=3
  • 192.0.2.1 mounts 192.0.2.3:/mnt/nfs01/volnfs01/share02 with rw,noatime,nodiratime,hard,rsize=32768,wsize=32768,nfsvers=3,tcp,noacl,nocto
  • 192.0.2.2 mounts 192.0.2.3:/mnt/nfs01/volnfs01/share02 with rw,noatime,nodiratime,hard,rsize=32768,wsize=32768,nfsvers=3,tcp,noacl,nocto

touch broken

My problem is with 192.0.2.2's NFS mount. When I touch a file on that mount, the process hangs indefinitely. I ran strace touch /mnt/share02/this and got this far…

rt_sigaction(SIGRTMIN, {0x3b71c05ae0, [], SA_RESTORER|SA_SIGINFO, 0x3b71c0f500}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x3b71c05b70, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3b71c0f500}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0)                                  = 0xafb000
brk(0xb1c000)                           = 0xb1c000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158576, ...}) = 0
mmap(NULL, 99158576, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fce244c0000
close(3)                                = 0
open("/mnt/share02/this", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666
                                                                    ^^^ stops touching
                                                                     |
                                                                     |

When I check ps -elf from another terminal, I see the process in "D" state…

[mpenning@host192_0_2_2 ~]$ ps -elf | awk '$2=="D"'
0 D mpenning  8157  8032  0  80   0 - 26293 rpc_wa 09:59 pts/2    00:00:00 touch /mnt/share02/this
[mpenning@host192_0_2_2 ~]$

showmount doesn't report a problem, though…

[mpenning@host192_0_2_2 ~]$ showmount -e 192.0.2.3
Export list for 192.0.2.3:
/mnt/nfs01/volnfs01/share01 192.0.2.2/255.255.255.255,192.0.2.1/255.255.255.255
/mnt/nfs01/volnfs01/share02 192.0.2.2/255.255.255.255,192.0.2.1/255.255.255.255
[mpenning@host192_0_2_2 ~]$

Status of the various NFS services…

[mpenning@host192_0_2_2 ~]$ service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 9168) is running...
nfsd (pid 9232 9231 9230 9229 9228 9227 9226 9225) is running...
rpc.rquotad (pid 9164) is running...
[mpenning@host192_0_2_2 ~]$ service rpcbind status
rpcbind (pid  9088) is running...
[mpenning@host192_0_2_2 ~]$ service nfslock status
rpc.statd (pid  9256) is running...
[mpenning@host192_0_2_2 ~]$

Network configuration (default gw isn't required since this is a dedicated layer2 NFS vlan):

[mpenning@host192_0_2_2 ~]$ sudo cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
NM_CONTROLLED=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.2
NETMASK=255.255.255.0
DNS2=none
TYPE=Ethernet
GATEWAY=
DNS1=none
IPV6INIT=no
USERCTL=no
MTU=9000
[mpenning@host192_0_2_2 ~]$

This looks pretty nasty. I have done the following on 192.0.2.2:

  • Restarted all NFS services
  • Rebooted the machine (init 6)
  • Pinged 192.0.2.3 to make sure it still has connectivity to the server
  • Checked dmesg
  • Checked showmount -e 192.0.2.3

This feels like a permissions problem, but I don't know where to go from here…

How can I fix this problem so I can read / write to any file on 192.0.2.2's mount of 192.0.2.3:/mnt/nfs01/volnfs01/share02?


touch works

If I execute the same touch command from 192.0.2.1, everything is fine…

rt_sigaction(SIGRTMIN, {0xb096e0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0xb09b80, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sys="Linux", node="host192_0_2_1.localdomain.local", ...}) = 0
brk(0)                                  = 0x8d4d000
brk(0x8d6e000)                          = 0x8d6e000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=99158544, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7574000
close(3)                                = 0
open("/mnt/share02/this", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_LARGEFILE, 0666) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
utimensat(0, NULL, NULL, 0)             = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?

/etc/exports from 192.0.2.3

[root@T1-Netfile01 backups]# head /etc/exports

# PLEASE DO NOT MODIFY THIS CONFIGURATION FILE!
#       This configuration file was autogenerated
#       by Openfiler. Any manual changes will be overwritten
#       Generated at: Fri Nov 8 9:35:39 CST 2013

/mnt/nfs01/volnfs01/share02 192.0.2.1/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)

/mnt/nfs01/volnfs01/share01 192.0.2.1/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)

[root@T1-Netfile01 backups]#

Best Answer

What happens if you change the order of the IPs in the /etc/exports file? Put the 192.0.2.2 entry first and the 192.0.2.1 entry second.

I would also confirm what the server is actually exporting, using the command:

$ showmount -e 192.0.2.3

/etc/exports can be very particular about the formatting!
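One formatting detail that can be checked mechanically: the exports in the question use the address/netmask form (192.0.2.2/255.255.255.255), and exports(5) also accepts a contiguous prefix length instead of the dotted netmask. As a rough sketch, assuming bash, the helper below converts one form to the other. The name mask_to_prefix is hypothetical (it is not part of nfs-utils), and it does not verify that the mask is contiguous.

```shell
# Hypothetical helper, not part of nfs-utils: convert a dotted-decimal
# netmask (e.g. 255.255.255.255) to a prefix length (e.g. 32).
# Note: it only sums the bits per octet; it does not check that the
# mask is contiguous.
mask_to_prefix() {
    local mask="$1" prefix=0 octet
    local IFS=.               # split the mask on dots, inside this function only
    for octet in $mask; do
        case $octet in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0)   ;;
            *)   echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# Usage:
#   echo "192.0.2.2/$(mask_to_prefix 255.255.255.255)"
```

Keep in mind that Openfiler regenerates /etc/exports, so a hand-edited file may be overwritten.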

Other things to try

  1. I typically specify my hosts in the /etc/exports like this:

    /cobbler/isos   192.168.1.0/24(rw,no_root_squash)
    

    So for you with a single host IP:

    /mnt/nfs01/volnfs01/share02 192.0.2.1/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)
    /mnt/nfs01/volnfs01/share01 192.0.2.1/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)
    
  2. NFS-related services

    Make sure that nfslock and the other related services (rpcbind, for example) are all running on 192.0.2.2.

  3. If you're using jumbo frames, be sure that ping -s <jumbo_mtu> 192.0.2.3 works from 192.0.2.2
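For the jumbo-frame check, note that ping -s sets the ICMP payload size, not the packet size, so the largest payload that fits in a single packet is the MTU minus 28 bytes (20 for an IPv4 header without options, 8 for the ICMP header). Here is a minimal sketch, assuming bash and the Linux iputils ping; the function names are my own and purely illustrative. The -M do option sets the Don't Fragment bit, so the ping fails outright if some hop cannot carry the full-size frame.

```shell
# ICMP echo payload that exactly fills one IPv4 packet of the given MTU:
# MTU - 20 bytes (IPv4 header, no options) - 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three full-size pings with fragmentation prohibited (-M do).
# Assumes Linux iputils ping; the MTU defaults to 9000, matching the
# MTU=9000 line in the question's ifcfg-eth1.
check_jumbo_path() {
    local target="$1" mtu="${2:-9000}"
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$target"
}

# Usage (run on 192.0.2.2):
#   check_jumbo_path 192.0.2.3 9000
```

If small pings succeed but this one fails, the server or a switch port in between is probably not configured for the same MTU as the client.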
