Limit the size of the linux file cache

files kernel memory

I am running 64-bit Xubuntu 14.04, linux version 3.8.0-25 with 8GB of RAM.

I have a script (in MATLAB, for what it's worth) that loads a large number of data files (~23k) one at a time, for a total of around 45 GB of data. The trouble I'm having is that after each file is loaded, it remains in the file cache. Linux seems to prefer keeping these files cached in memory over any other memory contents, so almost everything else is forced into swap and my system slows to a crawl. I read several files a second, so this happens fairly quickly. I only read each file once, so I don't need the files to remain cached after I've finished with them.

I've tried turning swap off, which works to an extent, but it seems like a poor solution (and it has already failed once when another program started using excessive memory). Is there a way to limit the amount of RAM Linux uses for file caching?

Best Answer

Not really an answer, but too long for a comment.

Linux's memory management has been carefully tuned over its long lifetime by some very smart people and it normally does a pretty good job of making the right decision when choosing what to keep in memory and what to drop.

Unfortunately, it looks like your workload isn't very compatible with its decisions :-( Still, I am quite surprised by your report that it frequently prefers to force dirty memory out to swap rather than drop something from cache. A decision between dropping thing A from cache versus dropping thing B from cache might be a toss-up, but a decision between dropping thing C from cache versus writing out dirty memory D to swap should be heavily weighted toward dropping thing C, because that's much less expensive!

There is a way to advise Linux that some piece of cached memory won't be needed in the future: the madvise() system call with MADV_DONTNEED. However, I think it would be difficult for you to invoke that system call from a MATLAB script...
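To illustrate the "advise the kernel" idea outside MATLAB, here is a minimal Python sketch. Note that it swaps in a different call from the one named above: posix_fadvise() with POSIX_FADV_DONTNEED, the closely related hint for data read through a file descriptor rather than a memory mapping. The helper name read_once and the 1 MiB chunk size are made up for this example, and the hint is only advisory, so the kernel is free to ignore it.

```python
import os

def read_once(path, chunk_size=1 << 20):
    """Read a whole file, then hint that its cached pages are no longer needed.

    read_once and chunk_size are illustrative names, not part of any library.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            chunks.append(chunk)
        # offset=0 with length=0 covers the whole file.
        # os.posix_fadvise is only available on some Unix platforms,
        # so skip the hint where it is missing.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

Whether something like this helps with the workload in the question is untested here; it would also require doing the file reads in a process where this call is reachable, which is exactly the difficulty with a MATLAB script.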

In any case, I don't think that reducing the size of the file cache is really what you want to do here. Remember that the executables, libraries, runtime files, plugins, etc. for MATLAB itself, your script, your GUI environment, and other system software all live in the file cache too, and you'd be forcing those things out of it (in favour of either other contents of the file cache or heap memory and other non-file-backed mappings). That will cause your system to become slow just as surely as swapping will.
