Why is a filesystem-intensive script not faster on a RAM disk?


I have a script which creates a lot of files and directories. The script does black box tests for a program which works with a lot of files and directories. The test count grew and the tests were taking too long (over 2 seconds). I thought I would run the tests on a RAM disk.

I ran the tests in /dev/shm. Strangely, they did not run any faster. The average run time was about the same as on a normal hard disk. I also tried a FUSE-based RAM disk written in Perl. The website is gone, but I found it in the Internet Archive. The average run time on the FUSE RAM disk is even slower, perhaps because of the suboptimal implementation of the Perl code.

Here is a simplified version of my script:

#! /bin/sh

# build the fixture tree in the current directory
preparedir() {
  mkdir foo
  mkdir bar
  touch bar/file
  mkdir bar/baz
  echo qux > bar/baz/file
}

systemundertest() {
  # here is the black box program that I am testing
  # I do not know what it does exactly
  # but it must be reading the files
  # since it behaves differently based on them
  find "$1" -type f -execdir cat '{}' \; > /dev/null
}

# run the program on one tree and compare it with a fresh reference tree
singletest() {
  mkdir actual
  (cd actual; preparedir)
  systemundertest actual
  mkdir expected
  (cd expected; preparedir)
  diff -qr actual expected
}

# run one test per directory name read from stdin
manytests() {
  while read -r dirname; do
    rm -rf "$dirname"
    mkdir "$dirname"
    (cd "$dirname"; singletest)
  done
}

seq 100 | manytests

The real script does a bit more error checking, collects the results and prints a summary. The find is a dummy for the actual program I am testing.

I wonder why my filesystem-intensive script does not run faster on a memory-backed filesystem. Is it because the Linux kernel handles the filesystem cache so efficiently that it practically is a memory-backed filesystem?

Best Answer

Generally speaking, all operations happen in RAM first, because file systems are cached. There are exceptions to this rule, but they usually arise from quite specific requirements. Hence, until the cache has to be flushed, you won't be able to tell the difference.
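One way to watch the caching described above is to look at how much written data is still waiting in memory. The following is only a rough sketch for Linux: it assumes /proc/meminfo is available, and the function names dirty_kb and show_dirty_growth are made up for this illustration.

```sh
#!/bin/sh
# Sketch (Linux only): show how much written data is still held in the
# page cache and not yet flushed to disk. Helper names are hypothetical.

# Print the "Dirty" value in kB from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument exists only so the
# parsing can be checked against a sample file.
dirty_kb() {
  awk '/^Dirty:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Write 16 MiB into the given directory and report Dirty before and after.
show_dirty_growth() {
  dir=${1:-.}
  scratch="$dir/dirty-demo.$$"
  before=$(dirty_kb)
  # dd returns as soon as the data is in the cache, not when it is on disk
  dd if=/dev/zero of="$scratch" bs=1048576 count=16 2>/dev/null
  after=$(dirty_kb)
  echo "Dirty before: ${before} kB, after: ${after} kB"
  rm -f "$scratch"
}
```

Calling show_dirty_growth with a directory on a disk-backed file system should usually show the Dirty figure growing, because the write has only reached RAM so far. In /dev/shm it should barely move, since tmpfs has no backing disk to flush to. Other activity on the machine changes the figure as well, so treat the numbers as an indication only.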

Performance also depends a lot on the exact file system: some target fast access to huge numbers of small files, some are efficient at real-time data transfers to and from big files (multimedia capturing/streaming), some emphasise data coherency, and others are designed to have a small memory/code footprint.

Back to your use case: a single loop pass spawns about 20 new processes, most of which just create one directory or file (note that () creates a sub-shell and that find spawns cat for every single match). The bottleneck is therefore process creation, not the file system. (If your system uses ASLR and lacks a good, fast source of entropy, its randomness pool gets depleted quite quickly too.) The same goes for the FUSE file system written in Perl: it is not the right tool for the job.
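To check this on your own machine, you could time process creation and file creation separately. This is a minimal sketch, not a rigorous benchmark; the function names spawn_only and files_only and the counts used below are invented for the illustration.

```sh
#!/bin/sh
# Sketch: separate the cost of creating processes from the cost of the
# filesystem work itself. Function names are hypothetical.

# Start N external programs that touch no files.
# Usage: spawn_only N
spawn_only() {
  i=0
  while [ "$i" -lt "$1" ]; do
    env true              # forces a fork and exec, like mkdir, touch or cat
    i=$((i + 1))
  done
}

# Create N empty files in DIR using only shell built-ins,
# so no new process is started for any of them.
# Usage: files_only DIR N
files_only() {
  i=0
  while [ "$i" -lt "$2" ]; do
    : > "$1/f$i"          # redirection on the built-in ":" creates the file
    i=$((i + 1))
  done
}
```

In a shell where time is a keyword (bash, for example), source the functions and run `time spawn_only 2000`, then `d=$(mktemp -d); time files_only "$d" 2000; rm -rf "$d"`, once with the directory on disk and once in /dev/shm. If process creation is the bottleneck, the first figure should dominate and the second should look similar on both file systems.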
