I have an Ubuntu 12.04 NFS server with a high load average (above 10) even when nothing is running.

In detail, the storage is provided by an iSCSI device, on which I have 5 logical volumes (LVM) and some ext4 partitions. Even with all services stopped and no exports (that is, no client traffic), the load stays at 10. Running iostat shows that one particular mapped device (/dev/dm-1) is constantly being written to (if I interpret the output correctly):

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               0.00         0.00         0.00          0          0
sdb             342.00         0.00         9.21          0          9
dm-0              0.00         0.00         0.00          0          0
dm-1            615.00         0.00        11.71          0         11

(sdb is where the iSCSI device appears; the dm-n devices are the logical volumes.) I have stopped almost all other running services, and I can say with good confidence that the load goes up the moment I start the NFS server and goes down when I stop it. What is going on? How can I see what is writing to the disk? (I tried lsof, but it shows no process.)

Additions

Adding information as asked.

uptime says:

18:27:15 up 1 day,  9:59,  2 users,  load average: 14.22, 12.42, 11.55

vmstat says:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  1    960 865924  51604 23204424    0    0    43     4    7   20  0  2 86 12

mpstat says:

Linux 3.2.0-26-generic-pae (leitrim)    12/25/2012  _i686_  (8 CPU)

06:33:53 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
06:33:53 PM  all    0.08    0.04    0.96   11.62    0.00    0.88    0.00    0.00   86.43

dstat -cdD sdb -ng 60 3 says:

----total-cpu-usage---- --dsk/sdb-- -net/total- ---paging--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out 
  0   1  86  12   0   1| 324k 8729k|   0     0 |   1B    8B
  0   1  87  12   0   1|  17k 8953k|8064k 9652k|   0     0 
  0   1  86  12   0   1|1229B 9081k|8010k 9796k|   0    68B
  0   1  89  10   0   1|3209B 8364k|7703k 9014k|   0     0 

rpcinfo says:


  program version netid     address                service    owner
    100000    4    tcp6      ::.0.111               portmapper superuser
    100000    3    tcp6      ::.0.111               portmapper superuser
    100000    4    udp6      ::.0.111               portmapper superuser
    100000    3    udp6      ::.0.111               portmapper superuser
    100000    4    tcp          portmapper superuser
    100000    3    tcp          portmapper superuser
    100000    2    tcp          portmapper superuser
    100000    4    udp          portmapper superuser
    100000    3    udp          portmapper superuser
    100000    2    udp          portmapper superuser
    100000    4    local     /run/rpcbind.sock      portmapper superuser
    100000    3    local     /run/rpcbind.sock      portmapper superuser
    100024    1    udp         status     116
    100024    1    tcp        status     116
    100024    1    udp6      ::.137.98              status     116
    100024    1    tcp6      ::.175.197             status     116
    100021    1    udp         nlockmgr   superuser
    100021    3    udp         nlockmgr   superuser
    100021    4    udp         nlockmgr   superuser
    100021    1    tcp         nlockmgr   superuser
    100021    3    tcp         nlockmgr   superuser
    100021    4    tcp         nlockmgr   superuser
    100021    1    udp6      ::.206.206             nlockmgr   superuser
    100021    3    udp6      ::.206.206             nlockmgr   superuser
    100021    4    udp6      ::.206.206             nlockmgr   superuser
    100021    1    tcp6      ::.132.23              nlockmgr   superuser
    100021    3    tcp6      ::.132.23              nlockmgr   superuser
    100021    4    tcp6      ::.132.23              nlockmgr   superuser
    100003    2    tcp            nfs        superuser
    100003    3    tcp            nfs        superuser
    100227    2    tcp            -          superuser
    100227    3    tcp            -          superuser
    100003    2    udp            nfs        superuser
    100003    3    udp            nfs        superuser
    100227    2    udp            -          superuser
    100227    3    udp            -          superuser
    100003    2    tcp6      ::.8.1                 nfs        superuser
    100003    3    tcp6      ::.8.1                 nfs        superuser
    100227    2    tcp6      ::.8.1                 -          superuser
    100227    3    tcp6      ::.8.1                 -          superuser
    100003    2    udp6      ::.8.1                 nfs        superuser
    100003    3    udp6      ::.8.1                 nfs        superuser
    100227    2    udp6      ::.8.1                 -          superuser
    100227    3    udp6      ::.8.1                 -          superuser
    100005    1    udp        mountd     superuser
    100005    1    tcp        mountd     superuser
    100005    1    udp6      ::.165.76              mountd     superuser
    100005    1    tcp6      ::.141.19              mountd     superuser
    100005    2    udp         mountd     superuser
    100005    2    tcp         mountd     superuser
    100005    2    udp6      ::.233.222             mountd     superuser
    100005    2    tcp6      ::.211.16              mountd     superuser
    100005    3    udp         mountd     superuser
    100005    3    tcp         mountd     superuser
    100005    3    udp6      ::.152.158             mountd     superuser
    100005    3    tcp6      ::.201.200             mountd     superuser

Best Answer

After some more investigation I can answer my own questions, at least partially:

  1. The writes to one of the devices were caused by a runaway process on one of the NFS clients. I found it by shutting down one client at a time. It would be nice to have a command that shows which client is writing to the NFS server, but if one exists, I could not find it.
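
A rough substitute for shutting clients down one by one might be to capture traffic on the NFS port and total the bytes each client sends to the server. The Python sketch below is only an illustration and was not part of the investigation above. It assumes NFS over TCP on port 2049, IPv4 clients, and capture lines in the quiet tcpdump style ("IP src.port > dst.port: tcp N"). The function name bytes_per_client and the interface name eth0 are hypothetical.

```python
import re
from collections import Counter

# Assumed input: lines in the style printed by "tcpdump -nn -q", for example
#   18:27:15.000001 IP 10.0.0.5.756 > 10.0.0.1.2049: tcp 1448
# Lines that do not match this shape are silently skipped.
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"\d+\.\d+\.\d+\.\d+\.(?P<dport>\d+): tcp (?P<size>\d+)"
)


def bytes_per_client(lines, nfs_port=2049):
    """Total TCP payload bytes sent to nfs_port, grouped by source IP.

    Returns a list of (client_ip, total_bytes) pairs, busiest client first.
    """
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        # Count only packets addressed to the NFS port (client -> server).
        if match and int(match.group("dport")) == nfs_port:
            totals[match.group("src")] += int(match.group("size"))
    return totals.most_common()
```

One way to use it would be to save a short capture, for instance with "tcpdump -nn -q -i eth0 tcp dst port 2049 > capture.txt", and then pass the lines of capture.txt to bytes_per_client. This counts all bytes sent to the server, not only NFS WRITE payloads, so it points at the busiest client rather than proving which one is writing.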
