md RAID1, ext3 and 4K sectors: slow directory operations

ext3, md, mdadm, raid1, software-raid

I recently moved from a hardware RAID1 enclosure to using two eSATA drives with md. Everything seems to be working fine, except that directory traversals/listings sometimes crawl (on the order of tens of seconds). I am using an ext3 filesystem with the block size set to 4K.
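For context, the array and filesystem were set up roughly along these lines (a sketch from memory; the exact flags are assumptions rather than a transcript, and the mount point is a placeholder):

mdadm --create /dev/md127 --level=1 --raid-devices=2 --metadata=1.2 /dev/sda1 /dev/sdb1
mkfs.ext3 -b 4096 /dev/md127                       # 4K filesystem block size
mount -o noatime,nodiratime /dev/md127 /mnt/raid   # mount options as noted in the edits below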

Here is the output from the commands that seem most relevant:

mdadm --detail:

/dev/md127:
        Version : 1.2
  Creation Time : Sat Nov 16 09:46:52 2013
     Raid Level : raid1
     Array Size : 976630336 (931.39 GiB 1000.07 GB)
  Used Dev Size : 976630336 (931.39 GiB 1000.07 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Tue Nov 19 01:07:59 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Events : 19691

    Number   Major   Minor   RaidDevice State
       2       8       17        0      active sync   /dev/sdb1
       1       8        1        1      active sync   /dev/sda1

fdisk -l /dev/sd{a,b}:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xb410a639

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  1953525167   976761560   83  Linux

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x261c8b44

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048  1953525167   976761560   83  Linux

time dumpe2fs /dev/md127 |grep size:

dumpe2fs 1.42.7 (21-Jan-2013)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Block size:               4096
Fragment size:            4096
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal size:             128M

real    2m14.242s
user    0m2.286s
sys     0m0.352s

The way I understand it, I've got 4K physical sectors on these drives (recent WD Reds), but the partitions/filesystems appear to be properly aligned. Since it looks like I'm using md metadata version 1.2, I think I'm also fine there (based on "mdadm raid1 and what chunksize (or blocksize) on 4k drives?"). The one thing I haven't found an answer for online is whether having an inode size of 256 would cause problems. Not all operations are slow; the buffer cache seems to do a great job of keeping things zippy (as it should).
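For anyone who wants to double-check the alignment claim, the arithmetic from the fdisk output above works out, and the kernel can confirm it (a quick sketch; nothing here modifies anything):

# partition start: 2048 sectors * 512 bytes = 1048576 bytes, an exact multiple of 4096
fdisk -l /dev/sda | grep '^/dev/sda1'
blockdev --getalignoff /dev/sda1     # prints 0 for an aligned partition
blockdev --getalignoff /dev/md127    # and for the md device on top of it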

My kernel version is 3.11.2.

EDIT: new info, 2013-11-19

mdadm --examine /dev/sd{a,b}1 | grep -i offset
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
    Data Offset : 262144 sectors
   Super Offset : 8 sectors

Also, I am mounting the filesystem with noatime,nodiratime. I'm not really willing to mess with the journaling much, since if I care enough to run RAID1, weakening the journal might be self-defeating. I am tempted to turn on directory indexing.
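For reference, the relevant fstab line / remount looks something like this (the mount point is a placeholder of mine, not the real one):

# /etc/fstab entry (mount point assumed)
/dev/md127   /mnt/raid   ext3   noatime,nodiratime   0   2

# or applied in place:
mount -o remount,noatime,nodiratime /mnt/raid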

EDIT 2013-11-20

Yesterday I tried turning on directory indexing for ext3 and ran e2fsck -D -f to see if that would help. Unfortunately, it hasn't. I am starting to suspect it may be a hardware issue (or is md RAID1 over eSATA just a really bad idea?). I'm thinking of taking each of the drives offline and seeing how they perform on their own; a rough sketch of both is below.
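The directory-indexing change was essentially the following, and the single-drive test I have in mind would go something like the rest of this sketch (device names as above; the mdadm/dd lines are a plan, not something I have run yet):

# enable directory indexing and rebuild the indexes (run with the filesystem unmounted)
tune2fs -O dir_index /dev/md127
e2fsck -D -f /dev/md127

# planned single-drive test: drop one mirror, read from it directly, then re-add it
mdadm /dev/md127 --fail /dev/sdb1 --remove /dev/sdb1
hdparm -tT /dev/sdb                              # raw/cached sequential read timing
dd if=/dev/sdb1 of=/dev/null bs=1M count=1024    # simple streaming read
mdadm /dev/md127 --add /dev/sdb1                 # re-add; md will resync the mirror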

EDIT 2013-11-21

iostat -kx 10 |grep -P "(sda|sdb|Device)":

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.37     1.17    0.06    0.11     1.80     5.10    84.44     0.03  165.91   64.66  221.40 100.61   1.64
sdb              13.72     1.17    2.46    0.11   110.89     5.10    90.34     0.08   32.02    6.46  628.90   9.94   2.55
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

I truncated the output past this point since it was all zeros.

I really feel like this should be irrespective of ext4 vs. ext3, because it isn't just a little slower: it can take on the order of tens of seconds to over a minute to tab-autocomplete or run an ls.

EDIT: Likely a hardware issue, will close question when confirmed

The more I think about it, the more I wonder if it's my eSATA card. I'm currently using this one: http://www.amazon.com/StarTech-PEXESAT32-Express-eSATA-Controller/dp/B003GSGMPU
However, I just checked dmesg and it's littered with messages like these:

[363802.847117] ata1.00: status: { DRDY }
[363802.847121] ata1: hard resetting link
[363804.979044] ata2: softreset failed (SRST command error)
[363804.979047] ata2: reset failed (errno=-5), retrying in 8 secs
[363804.979059] ata1: softreset failed (SRST command error)
[363804.979064] ata1: reset failed (errno=-5), retrying in 8 secs
[363812.847047] ata1: hard resetting link
[363812.847061] ata2: hard resetting link
[363814.979063] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[363814.979106] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
....
[364598.751086] ata2.00: status: { DRDY }
[364598.751091] ata2: hard resetting link
[364600.883031] ata2: softreset failed (SRST command error)
[364600.883038] ata2: reset failed (errno=-5), retrying in 8 secs
[364608.751043] ata2: hard resetting link
[364610.883050] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[364610.884328] ata2.00: configured for UDMA/100
[364610.884336] ata2.00: device reported invalid CHS sector 0
[364610.884342] ata2: EH complete

I am also going to buy shorter shielded eSATA cables as I'm wondering if there is some interference going on.

Best Answer

THIS ENDED UP BEING A HARDWARE ISSUE

Switching to the new shielded cables did not help, but replacing the old card with this one (http://www.amazon.com/gp/product/B000NTM9SY) did get rid of the error messages and the strange behavior. I will post an update if anything changes.

IMPORTANT NOTE FOR SATA ENCLOSURES:

Even after doing the above, any drive operation would be incredibly slow (stalling for 10-30 seconds) whenever the drives had been idle for a while. The enclosure I'm using has an eSATA interface but is powered by USB. I determined this was because it didn't have enough power to spin the drives up, so I tried a couple of things (the hdparm invocations are collected in a sketch after this list):

  • Using an external higher-current USB power source (in case the ports were only supplying the 500 mA minimum)
  • Disabling spin-down with hdparm -S 0 /dev/sdX (this alleviated the problem greatly, but did not resolve it completely)
  • Disabling advanced power management with hdparm -B 255 /dev/sdX (again, this did not fully resolve it)
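Concretely, the hdparm side of that looks like this (/dev/sdX stands for each member drive; the read-back lines just confirm the settings took):

hdparm -S 0 /dev/sdX      # disable the spindown timer
hdparm -B 255 /dev/sdX    # disable Advanced Power Management
hdparm -B /dev/sdX        # read back the APM level
hdparm -C /dev/sdX        # check the current power state (active/idle vs. standby)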

Eventually, I discovered that Western Digital has a jumper setting for Reduced Power Spinup (RPS), designed especially for this use case!

The drives I am using are WD Red WD10JFCX 1TB IntelliPower 2.5" drives; WD's jumper diagram is here: http://support.wdc.com/images/kb/scrp_connect.jpg

Note that I am still running with all the power-management and spin-down features disabled (still -B 255 and -S 0 via hdparm).
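One practical note: these hdparm settings typically do not survive a power cycle, so something has to reapply them at boot. A minimal sketch, assuming you simply re-run the commands from a boot script (the exact mechanism, whether rc.local, a systemd unit, or your distro's hdparm config, is up to you):

# reapply at boot (mechanism is distro-dependent; this is just the idea)
hdparm -B 255 -S 0 /dev/sda
hdparm -B 255 -S 0 /dev/sdb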

Final Verdict

Unfortunately, the RPS jumper did not solve all of my problems; it just reduced their magnitude and frequency. I believe the issues were ultimately due to the enclosure not being able to provide enough power (even when using an AC-to-USB adapter). I eventually bought this enclosure:

http://www.amazon.com/MiniPro-eSATA-6Gbps-External-Enclosure/dp/B003XEZ33Y

and everything has been working flawlessly for the last three weeks.
