Linux – Optimizing read I/O with read ahead while avoiding storing data in page cache

disk io linux

I need to read data sequentially from a file without keeping that data in the page cache. The file contents are not expected to be read again, and the box is under memory pressure, so I want to keep the precious memory for disk I/O caching that is actually useful.

My question is how to optimize these reads. Since I know the data is laid out sequentially on disk (fragmentation aside), I want to read ahead by increasing /sys/block/sda/queue/read_ahead_kb. I am not sure this will help, though, because I have to keep the data out of the page cache by calling posix_fadvise with the POSIX_FADV_DONTNEED flag.

Will the read-ahead data simply be discarded because of the hint to drop the data from the page cache?

Best Answer

Use direct IO:

Direct I/O is a feature of the file system whereby file reads and writes go directly from the applications to the storage device, bypassing the operating system read and write caches. Direct I/O is used only by applications (such as databases) that manage their own caches.

An application invokes direct I/O by opening a file with the O_DIRECT flag.

For example:

int fd = open( filename, O_RDONLY | O_DIRECT );

Direct IO on Linux is quirky and has some restrictions. The application IO buffer must be aligned (page alignment is the safe choice), and the file offset and size of each IO request generally have to be a multiple of the device's logical block size; some file systems require an exact multiple of the page size. That last restriction can make reading/writing the last portion of a file difficult.
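As a minimal sketch of what such a read loop can look like without stdio, assuming a regular file and a 1 MB request size. The helper name direct_read_all and the CHUNK constant are made up for illustration, and falling back to a buffered open when O_DIRECT is rejected with EINVAL is my own assumption about what you would want, not something the kernel requires. Always asking for a full aligned chunk and accepting a short count on the last read is one way around the awkward tail of the file:

```c
#define _GNU_SOURCE               /* exposes O_DIRECT with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1024UL * 1024UL)   /* assumed request size */

/* Hypothetical helper: reads the whole file, returns bytes read or -1. */
long long direct_read_all(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    /* every request must stay a multiple of the page size */
    assert(page > 0 && CHUNK % (unsigned long)page == 0);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);    /* assumption: fall back to buffered IO */
    if (fd < 0)
        return -1;

    void *buf = NULL;
    if (posix_memalign(&buf, (size_t)page, CHUNK) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, CHUNK);   /* always ask for a full chunk */
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            total = -1;
            break;
        }
        /* process the n bytes in buf here */
        total += n;
        /* A short count is treated as end of file, so no further read is
         * issued from an offset that is no longer aligned. */
        if ((size_t)n < CHUNK)
            break;
    }

    free(buf);
    close(fd);
    return total;
}
```

Calling direct_read_all( filename ) returns the number of bytes read, or -1 on error. Treating a short count as end of file is only reasonable for regular files.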

An easy-to-code way to handle read-ahead in your application is to use fdopen and set a large page-aligned buffer with posix_memalign and setvbuf:

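#define _GNU_SOURCE  // needed for O_DIRECT with glibc
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

// should really get page size using sysconf()
// but beware of systems with multiple page sizes
#define ALIGNMENT ( 4UL * 1024UL )
#define BUFSIZE ( 1024UL * 1024UL )
char *buffer;
...

int fd = open( filename, O_RDONLY | O_DIRECT );
FILE *file = fdopen( fd, "rb" );

// posix_memalign() takes a void **, so the char ** needs a cast;
// it returns 0 on success and an error number otherwise
int rc = posix_memalign( ( void ** ) &buffer, ALIGNMENT, BUFSIZE );
// must be called before any other operation on the stream
rc = setvbuf( file, buffer, _IOFBF, BUFSIZE );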

You can also use mmap() to get anonymous memory to use for the buffer. That has the advantage of being naturally page-aligned:

...
// needs <sys/mman.h>; mmap() returns MAP_FAILED on error, not NULL
char *buffer = mmap( NULL, BUFSIZE, PROT_READ | PROT_WRITE,
    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0 );
rc = setvbuf( file, buffer, _IOFBF, BUFSIZE );

Then just use fread()/fgets() or any FILE *-type read function you want to read from the file stream.
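To show the pieces together, here is a hedged, self-contained sketch of the stream approach. The helper name stream_read_all, the 4 KB request size, and the fallback to a buffered open when O_DIRECT is rejected with EINVAL are my own assumptions for illustration:

```c
#define _GNU_SOURCE               /* exposes O_DIRECT with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define STREAM_BUF (1024UL * 1024UL)  /* stream buffer size */
#define REQUEST    4096UL             /* assumed size of each fread() */

/* Hypothetical helper: returns bytes read through the stream, or -1. */
long long stream_read_all(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && STREAM_BUF % (unsigned long)page == 0);

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);    /* assumption: buffered fallback */
    if (fd < 0)
        return -1;

    FILE *file = fdopen(fd, "rb");
    if (file == NULL) {
        close(fd);
        return -1;
    }

    /* Install the page-aligned buffer before any IO on the stream. */
    void *buffer = NULL;
    if (posix_memalign(&buffer, (size_t)page, STREAM_BUF) != 0 ||
        setvbuf(file, buffer, _IOFBF, STREAM_BUF) != 0) {
        fclose(file);
        free(buffer);
        return -1;
    }

    char chunk[REQUEST];
    long long total = 0;
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, file)) > 0) {
        /* process the n bytes in chunk here */
        total += (long long)n;
    }
    if (ferror(file))
        total = -1;

    fclose(file);                     /* also closes fd */
    free(buffer);                     /* only after the stream is closed */
    return total;
}
```

Each fread() request here is deliberately much smaller than the stream buffer. As far as I can tell, glibc may read straight into the caller's buffer for requests at least as large as the stream buffer, and that buffer is not necessarily aligned, so this is worth confirming with strace on your system.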

You do need to check, using a tool such as strace, that the actual read system calls are made with a page-aligned and page-sized buffer. Some C library implementations of FILE *-based stream processing use the buffer specified by setvbuf for more than just IO buffering, so the alignment and size can be off. I don't think Linux/glibc does that, but if you don't check and the size and/or alignment is off, your IO calls will fail.

And again - Linux direct IO can be quirky. Only some file systems support direct IO, and some of them are more particular than others. TEST this thoroughly if you decide to use it.

The posted code will do a 1 MB read-ahead whenever the stream's buffer needs to be filled. You can also implement more sophisticated read-ahead using threads - one thread fills one buffer, other thread(s) read from a full buffer. That would avoid processing "stutters" as the read-ahead is done, but at the cost of a good amount of relatively complex multi-threaded code.
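A rough sketch of the two-thread variant, under stated assumptions: two 1 MB buffers, one reader thread, one consumer, and a mutex plus condition variable for hand-off. The names ring, slot, reader_thread and readahead_consume are invented for this example, and the buffered fallback when O_DIRECT is rejected with EINVAL is again my own assumption. Build with -pthread:

```c
#define _GNU_SOURCE               /* exposes O_DIRECT with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define NBUF  2                   /* double buffering */
#define CHUNK (1024UL * 1024UL)   /* assumed read-ahead unit */

struct slot { void *data; ssize_t len; int full; };
struct ring {
    int fd;
    struct slot slots[NBUF];
    pthread_mutex_t mu;
    pthread_cond_t cv;
};

/* Marks a slot full or empty and wakes the other thread. */
static void set_full(struct ring *r, struct slot *s, int full)
{
    pthread_mutex_lock(&r->mu);
    s->full = full;
    pthread_cond_broadcast(&r->cv);
    pthread_mutex_unlock(&r->mu);
}

/* Blocks until the slot reaches the wanted state. */
static void wait_full(struct ring *r, struct slot *s, int full)
{
    pthread_mutex_lock(&r->mu);
    while (s->full != full)
        pthread_cond_wait(&r->cv, &r->mu);
    pthread_mutex_unlock(&r->mu);
}

/* Producer: fills the slots in turn; a short count ends the stream. */
static void *reader_thread(void *arg)
{
    struct ring *r = arg;
    for (int i = 0;; i = (i + 1) % NBUF) {
        struct slot *s = &r->slots[i];
        wait_full(r, s, 0);
        ssize_t n = read(r->fd, s->data, CHUNK);   /* no lock held during IO */
        s->len = n;
        set_full(r, s, 1);
        if (n < (ssize_t)CHUNK)       /* error, end of file, or short read */
            return NULL;
    }
}

/* Consumer (hypothetical helper): returns bytes processed, or -1. */
long long readahead_consume(const char *path)
{
    struct ring r = { .fd = -1 };
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && CHUNK % (unsigned long)page == 0);
    r.fd = open(path, O_RDONLY | O_DIRECT);
    if (r.fd < 0 && errno == EINVAL)
        r.fd = open(path, O_RDONLY);  /* assumption: buffered fallback */
    if (r.fd < 0)
        return -1;
    int ok = 1;
    for (int i = 0; i < NBUF; i++)
        if (posix_memalign(&r.slots[i].data, (size_t)page, CHUNK) != 0)
            ok = 0;
    pthread_mutex_init(&r.mu, NULL);
    pthread_cond_init(&r.cv, NULL);
    pthread_t tid;
    long long total = -1;
    if (ok && pthread_create(&tid, NULL, reader_thread, &r) == 0) {
        total = 0;
        for (int i = 0;; i = (i + 1) % NBUF) {
            struct slot *s = &r.slots[i];
            wait_full(&r, s, 1);
            ssize_t n = s->len;
            if (n < 0) { total = -1; break; }
            /* process s->data[0..n) here; the reader fills the other slot */
            total += n;
            set_full(&r, s, 0);
            if (n < (ssize_t)CHUNK)
                break;
        }
        pthread_join(tid, NULL);
    }

    for (int i = 0; i < NBUF; i++)
        free(r.slots[i].data);
    pthread_cond_destroy(&r.cv);
    pthread_mutex_destroy(&r.mu);
    close(r.fd);
    return total;
}
```

Both threads walk the slots in the same order and both stop at the first chunk shorter than CHUNK, so neither side is left waiting on the other. The read itself happens with the mutex released, which is what lets the consumer work on one buffer while the other is being filled.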
