Linux – How to Prevent Hard Disks from Spinning Up by Processes Listing Disks

debianlinux

I have a 24/7 always-on Debian Jessie based headless home server that has a large 1TB SSD for the OS and all of my frequently accessed files. This same system has 4 larger hard disk drives in a SnapRAID array. These are mainly for archiving infrequently accessed Blu-rays and want those drives to remain spun down in standby unless I actually read or write to them. They are all formatted as ext4 and mounted with noatime and nodiratime enabled.

So even though no process or program should be regularly accessing those drives in any direct way, the hard drives constantly get spun up from standby. It seems to be related to graphical programs that provide a gui file browser, even something like Chromium. If I don't even browse into those drives, I'm thinking that these processes by simply getting a list of available drives spins up the hard disks. Much like blkid does. The problem is, it's hard to determine the root cause of this since none of these processes are actually reading or writing the filesystem on those drives, so no files are actually changing or being touched. Is there some sort of cache that I can populate or a buffer to prevent these programs from spinning up the hard drive simply by getting a list of available disks? This is honestly driving me insane, since I can't find a reliably way to keep these disks spun-down even though there is no direct access of the filesystem.

UPDATE: Thanks to Stephen's answer, I was able to trace the disk activity to gvfs and udisks. It's a real shame that these processes insist on waking up disks in standby when they aren't actually being accessed to do any real I/O with the filesystem. So far I just uninstalled them, knowing that it will remove some functionality from PCManFM and the like.

Best Answer

You can use blktrace (available in Debian) to trace all the activity to a given device; for example

sudo blktrace -d /dev/sda -o - | blkparse -i -

or just

sudo btrace /dev/sda

will show all the activity on /dev/sda. The output looks like

  8,0    3       51   135.424002054 16857  D  WM 167775248 + 8 [kworker/u16:0]
  8,0    3       52   135.424011323 16857  I  WM 209718336 + 8 [kworker/u16:0]
  8,0    3        0   135.424011659     0  m   N cfq496A  / insert_request

The fifth column is the process identifier, and the last one gives the process name when there is one.

You can also store traces for later analysis; blktrace includes a number of analysis tools such as the afore-mentioned blkparse and btt. blktrace is a very low-level tool so it may not be all that easy to figure out what caused activity in the first place, but with the help of the included documentation (see /usr/share/doc/blktrace if you installed the Debian package) and the blktrace paper it should be possible to figure out what's causing the spin-ups.

Related Solutions

Linux – How to configure Linux to cache file metadata in preference to contents

To control how Linux caches things refer to this https://www.kernel.org/doc/Documentation/sysctl/vm.txt

In particular look at vfs_cache_pressure, you probably want a really low value or maybe even zero (1 sounds a bit safer to me though):

vfs_cache_pressure
------------------

Controls the tendency of the kernel to reclaim the memory which is used for
caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Also you may want to modify swappiness so that you never swap data or make it so that it only happens in extreme cases.

The drop_caches option might be handy for explicitly dropping the data you don't want cached anymore.

I'm sure there are probably other options that may help, so review the kernel documentation.

To apply them I'd put the settings you want to change in /etc/sysctl.conf or whatever your OS has to restore them at boot.

Linux Memory Management – Will Linux Start Killing Processes If Memory Gets Short?

It can.

There are two different out of memory conditions you can encounter in Linux. Which you encounter depends on the value of sysctl vm.overcommit_memory (/proc/sys/vm/overcommit_memory)

Introduction:
The kernel can perform what is called 'memory overcommit'. This is when the kernel allocates programs more memory than is really present in the system. This is done in the hopes that the programs won't actually use all the memory they allocated, as this is a quite common occurrence.

overcommit_memory = 2

When overcommit_memory is set to 2, the kernel does not perform any overcommit at all. Instead when a program is allocated memory, it is guaranteed access to have that memory. If the system does not have enough free memory to satisfy an allocation request, the kernel will just return a failure for the request. It is up to the program to gracefully handle the situation. If it does not check that the allocation succeeded when it really failed, the application will often encounter a segfault.

In the case of the segfault, you should find a line such as this in the output of dmesg:

[1962.987529] myapp[3303]: segfault at 0 ip 00400559 sp 5bc7b1b0 error 6 in myapp[400000+1000]

The at 0 means that the application tried to access an uninitialized pointer, which can be the result of a failed memory allocation call (but it is not the only way).

overcommit_memory = 0 and 1

When overcommit_memory is set to 0 or 1, overcommit is enabled, and programs are allowed to allocate more memory than is really available.

However, when a program wants to use the memory it was allocated, but the kernel finds that it doesn't actually have enough memory to satisfy it, it needs to get some memory back. It first tries to perform various memory cleanup tasks, such as flushing caches, but if this is not enough it will then terminate a process. This termination is performed by the OOM-Killer. The OOM-Killer looks at the system to see what programs are using what memory, how long they've been running, who's running them, and a number of other factors to determine which one gets killed.

After the process has been killed, the memory it was using is freed up, and the program which just caused the out-of-memory condition now has the memory it needs.

However, even in this mode, programs can still be denied allocation requests. When overcommit_memory is 0, the kernel tries to take a best guess at when it should start denying allocation requests. When it is set to 1, I'm not sure what determination it uses to determine when it should deny a request but it can deny very large requests.

You can see if the OOM-Killer is involved by looking at the output of dmesg, and finding a messages such as:

[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB