Slow USB 3 write speed

ioperformanceusb-drive

Writing to my USB 3 thumb drive (SanDisk Extreme SDCZ80-064G-FFP) is very slow on Linux: 1 GB takes longer than 200s. Using Windows (dual-boot on the same computer), the same 1 GB file can be copied in about 8s. The stick is formatted in FAT (it came pre-preformatted and I didn't change it) and I would like to keep it this way since I am using it on Windows as well,

How can I fix it? What steps can I perform to diagnose what is causing this?

I am running Manjaro/Arch with kernel version 4.5.4-1.

Edit:
First of all: I realized that the drive is formatted in FAT (not in NTFS, as I originally stated in the question) when I tried to mount it with -o big_writes. Sorry for the mistake!

I am adding the outputs of the commands mentioned in the comments. I don't see a problem with any of these.

Output of journalctl -f when I connect the drive, mount it and write some data:

Mai 23 20:32:37 manjaro kernel: usb 2-6: USB disconnect, device number 7
Mai 23 20:32:39 manjaro dbus[608]: [system] Activating via systemd: service name='org.freedesktop.Avahi' unit='dbus-org.freedesktop.Avahi.service'
Mai 23 20:32:39 manjaro dbus[608]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.Avahi.service': Unit dbus-org.freedesktop.Avahi.service not found.
Mai 23 20:32:41 manjaro kernel: usb 2-6: new SuperSpeed USB device number 8 using xhci_hcd
Mai 23 20:32:41 manjaro kernel: usb-storage 2-6:1.0: USB Mass Storage device detected
Mai 23 20:32:41 manjaro kernel: scsi host12: usb-storage 2-6:1.0
Mai 23 20:32:41 manjaro mtp-probe[3627]: checking bus 2, device 8: "/sys/devices/pci0000:00/0000:00:14.0/usb2/2-6"
Mai 23 20:32:41 manjaro mtp-probe[3627]: bus: 2, device: 8 was not an MTP device
Mai 23 20:32:42 manjaro kernel: scsi 12:0:0:0: Direct-Access     SanDisk  Extreme          0001 PQ: 0 ANSI: 6
Mai 23 20:32:42 manjaro kernel: sd 12:0:0:0: [sdc] 122544516 512-byte logical blocks: (62.7 GB/58.4 GiB)
Mai 23 20:32:42 manjaro kernel: sd 12:0:0:0: [sdc] Write Protect is off
Mai 23 20:32:42 manjaro kernel: sd 12:0:0:0: [sdc] Mode Sense: 53 00 00 08
Mai 23 20:32:42 manjaro kernel: sd 12:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Mai 23 20:32:42 manjaro kernel:  sdc: sdc1
Mai 23 20:32:42 manjaro kernel: sd 12:0:0:0: [sdc] Attached SCSI removable disk
Mai 23 20:32:43 manjaro dbus[608]: [system] Activating via systemd: service name='org.freedesktop.Avahi' unit='dbus-org.freedesktop.Avahi.service'
Mai 23 20:32:43 manjaro dbus[608]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.Avahi.service': Unit dbus-org.freedesktop.Avahi.service not found.
Mai 23 20:32:52 manjaro sudo[3667]: user : TTY=pts/1 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/mount /dev/sdc1 /mnt/
Mai 23 20:32:52 manjaro sudo[3667]: pam_unix(sudo:session): session opened for user root by (uid=0)
Mai 23 20:32:52 manjaro sudo[3667]: pam_unix(sudo:session): session closed for user root
Mai 23 20:33:11 manjaro sudo[3676]: user : TTY=pts/1 ; PWD=/home/user ; USER=root ; COMMAND=/usr/bin/dd bs=1M count=1024 if=/dev/zero of=/mnt/test conv=fdatasync status=progress
Mai 23 20:33:11 manjaro sudo[3676]: pam_unix(sudo:session): session opened for user root by (uid=0)
Mai 23 20:35:01 manjaro anacron[2235]: Job `cron.daily' started
Mai 23 20:35:03 manjaro anacron[2235]: Job `cron.daily' terminated
Mai 23 20:35:45 manjaro sudo[3676]: pam_unix(sudo:session): session closed for user root

Ouput of dmesg:

[ 2507.302345] usb 2-6: new SuperSpeed USB device number 8 using xhci_hcd
[ 2507.317395] usb-storage 2-6:1.0: USB Mass Storage device detected
[ 2507.317758] scsi host12: usb-storage 2-6:1.0
[ 2508.319922] scsi 12:0:0:0: Direct-Access     SanDisk  Extreme          0001 PQ: 0 ANSI: 6
[ 2508.333123] sd 12:0:0:0: [sdc] 122544516 512-byte logical blocks: (62.7 GB/58.4 GiB)
[ 2508.333353] sd 12:0:0:0: [sdc] Write Protect is off
[ 2508.333362] sd 12:0:0:0: [sdc] Mode Sense: 53 00 00 08
[ 2508.333634] sd 12:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2508.346488]  sdc: sdc1
[ 2508.347918] sd 12:0:0:0: [sdc] Attached SCSI removable disk

Commands used for mounting and writing:

$ sudo mount /dev/sdc1 /mnt/
$ sudo dd bs=1M count=1024 if=/dev/zero of=/mnt/test conv=fdatasync status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1,1 GB, 1,0 GiB) copied, 154,158 s, 7,0 MB/s

Output of lsusb -t:

/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 5000M
    |__ Port 6: Dev 8, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
    |__ Port 9: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
    |__ Port 10: Dev 3, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
    |__ Port 10: Dev 3, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M

Edit 2: I tried two other kernels: 4.4.11 and 4.6.0. Writing is still slow with both of them. Additionally, the problem seems to be related to the drive since I get higher transfer speeds (90 MB/s) for an external USB 3 hard disk.

Edit 3: I did some benchmarking on an Ubuntu 16.04 Live System. The results are way better (though still not very good):

ubuntu@ubuntu:/media/ubuntu/274D-D62C$ dd bs=1M count=1024 if=/dev/zero of=./test conv=fdatasync status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.2623 s, 87.6 MB/s
ubuntu@ubuntu:/media/ubuntu/274D-D62C$ dd bs=1M count=1024 if=/dev/zero of=./test conv=fdatasync status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 64.5742 s, 16.6 MB/s
ubuntu@ubuntu:/media/ubuntu/274D-D62C$ dd bs=1M count=1024 if=/dev/zero of=./test conv=fdatasync status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 83.6521 s, 12.8 MB/s
ubuntu@ubuntu:/media/ubuntu/274D-D62C$ dd bs=1M count=1024 if=/dev/zero of=./test conv=fdatasync status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 21.842 s, 49.2 MB/s
ubuntu@ubuntu:/media/ubuntu/274D-D62C$ dd bs=1M count=1024 if=/dev/zero of=./test conv=fdatasync status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 16.3149 s, 65.8 MB/s

Edit 4: I just tried it once more using a recent Arch/Manjaro kernel (4.11.1) and the result is a bit better: I get transfer speeds of about 10 MB/s (so ~100s for 1GB). This is, however, still much slower than Windows.

Edit 5: I finally found the time to come back to this issue. Running the current Manjaro kernel Linux janmanjaro 5.4.74-1-MANJARO #1 SMP PREEMPT Sun Nov 1 13:43:13 UTC 2020 x86_64 GNU/Linux, I get 15 – 30 MB/s (usually closer to 15 MB/s). Mounting with sudo mount -o async,flush /dev/sdc1 /mnt/ (as suggested in Extremely slow speed when writing to USB FAT32 drive in Linux) improves the speed by roughly 5 MB/s. This is till ridiculously slower than Windows and also much slower than Ubuntu.

I also ran sudo fio --name TEST --eta-newline=5s --filename=test.img --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reportin as suggested by @Mikko Rantalainen. The results are:

Run status group 0 (all jobs):
   READ: bw=3444KiB/s (3527kB/s), 3444KiB/s-3444KiB/s (3527kB/s-3527kB/s), io=202MiB (212MB), run=60001-60001msec
   WRITE: bw=3432KiB/s (3514kB/s), 3432KiB/s-3432KiB/s (3514kB/s-3514kB/s), io=201MiB (211MB), run=60001-60001msec

I have not run this on Ubuntu yet.

Best Answer

Do you really need FAT? In my experience, Linux implementation of FAT driver seems to access the device with a sequence that causes some devices to have really poor performance. I have seen 10x performance improvement from some flash devices when formatted as ext4 instead.

Considering that the link you provided in comments (http://www.beginninglinux.com/home/machine-related/sandisk-extreme-64-usb-3-speed-test-benchmark-review) was testing read and write speeds without partitions, it might be that your flash device has poor performance when accessed by linux FAT driver. If that's true, it should also have bad random IO performance in general.


You can test the device performance with fio. Change to some directory inside the device you want to test and run following command to test performance with mixed random 4k read/write performance:

fio --name TEST --eta-newline=5s --filename=test.img --rw=randrw --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting

That creates 500 MB test file called test.img and then starts to read and write in random 4 KB blocks with queue depth of 1. Increase the size in case your device has a huge internal cache. This test has maximum runtime of 60 seconds. Consider this the worst case scenario with your device. A pretty fast flash device (Intel SSD 910) gets following results here:

...
read : io=1041.4MB, bw=17773KB/s, iops=4443, runt= 60001msec
...
write: io=1038.5MB, bw=17723KB/s, iops=4430, runt= 60001msec
...

To test the best case you can increase the blocksize and do 2 paraller processes with iodepth=32:

fio --name TEST --eta-newline=5s --filename=test.img --rw=randrw --size=500m --io_size=10g --blocksize=512k --ioengine=libaio --iodepth=32 --direct=1 --numjobs=2 --runtime=60 --group_reporting

A pretty fast flash device (Intel SSD 910) gets following results here:

...
read : io=10892MB, bw=457088KB/s, iops=892, runt= 24401msec
...
write: io=9588.0MB, bw=402365KB/s, iops=785, runt= 24401msec
...

Of course, if you really want high scores, you should do sequential reading with high iodepth. This should be near the numbers you see in the device spec sheets (that is, never in real life):

fio --name TEST --eta-newline=5s --filename=test.img --rw=read --size=500m --io_size=10g --blocksize=512k --ioengine=libaio --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting

A pretty fast flash device (Intel SSD 910) gets following results here:

...
read : io=10240MB, bw=1468.8MB/s, iops=2937, runt=  6972msec
...

Note that this falls short from spec sheet (around 2 GB/s) because I have configured I/O scheduler lowest possible latency (deadline I/O scheduler with 1 > ../queue/iosched/fifo_batch and 50 > ../queue/iosched/read_expire).

Note the huge difference between worst case scenario (around 17 MB/s) and best case scenario (around 1470 MB/s).

Related Question