Ubuntu – Very high cache usage causing slowdown

Tags: cache, memory-leak, memory-usage, ram

I'm trying to identify the culprit behind my personal computer's extreme sluggishness. The biggest suspect is memory. When the computer is running fast, my cache memory looks normal. However, when it's running slow it looks like this:

luke@Luke-XPS-13:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7830        1111        1090         277        5628        1257
Swap:         16077         665       15412

and this:

luke@Luke-XPS-13:~$ vmstat -S M
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0    665   1065     67   5562    0    0    34    88   43   23 13  4 82  0  0

Caches are taking up 5.5 GB of my 8 GB of memory, even when all programs are closed, and even after running

sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"

which should force-clear them. As soon as the computer starts dipping into swap, it's game over for usable speed. A shutdown temporarily fixes the problem, but it eventually comes back and I can't figure out what's causing it. slabtop reveals slightly more about the culprit, but I'm not sure what it implies. Why kmalloc-4096?

 Active / Total Objects (% used)    : 1554043 / 1607539 (96.7%)
 Active / Total Slabs (% used)      : 167569 / 167569 (100.0%)
 Active / Total Caches (% used)     : 76 / 109 (69.7%)
 Active / Total Size (% used)       : 5091450.96K / 5105920.97K (99.7%)
 Minimum / Average / Maximum Object : 0.01K / 3.18K / 18.50K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
1254755 1254755 100%  4.00K 156847        8   5019104K kmalloc-4096
  5430   5430 100%    2.05K    362       15     11584K idr_layer_cache
 20216   9010  44%    0.57K    722       28     11552K radix_tree_node
  8820   7358  83%    1.05K    294       30      9408K ext4_inode_cache
 38577  25253  65%    0.19K   1837       21      7348K dentry
 12404  11432  92%    0.55K    443       28      7088K inode_cache
 30120  29283  97%    0.20K   1506       20      6024K vm_area_struct
 31722  31722 100%    0.12K    933       34      3732K kernfs_node_cache
 13696  12514  91%    0.25K    856       16      3424K kmalloc-256
 27144  27134  99%    0.10K    696       39      2784K buffer_head
 41088  29789  72%    0.06K    642       64      2568K kmalloc-64
   632    567  89%    3.75K     79        8      2528K task_struct
  2432   2274  93%    1.00K    152       16      2432K kmalloc-1024
  3048   2677  87%    0.64K    127       24      2032K shmem_inode_cache
   912    845  92%    2.00K     57       16      1824K kmalloc-2048
   172    162  94%    8.00K     43        4      1376K kmalloc-8192
  1736   1561  89%    0.56K     62       28       992K ecryptfs_key_record_cache
  5103   4073  79%    0.19K    243       21       972K kmalloc-192
  1792   1626  90%    0.50K    112       16       896K kmalloc-512
  1456   1456 100%    0.61K     56       26       896K proc_inode_cache
 10149   8879  87%    0.08K    199       51       796K anon_vma
 24960  19410  77%    0.03K    195      128       780K kmalloc-32
   360    352  97%    2.06K     24       15       768K sighand_cache

Best Answer

Based on your comments, you say cache usage doesn't noticeably drop when you run echo 3 > /proc/sys/vm/drop_caches.

This can only happen if it is a write cache. When you write 5 GB to some files, the data immediately lands in the cache and your program continues. The cache is then written to storage in the background as fast as possible. In your case the storage seems dramatically slow, so you accumulate unwritten (dirty) cache until it drains all of your RAM and starts pushing everything out to swap.
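If that is what's happening here, it will be visible directly in /proc/meminfo: the Dirty and Writeback counters track exactly this not-yet-written data. For example:

# Dirty = data sitting in cache waiting to be written,
# Writeback = data currently being flushed to the device
grep -E '^(Dirty|Writeback):' /proc/meminfo

If Dirty is in the gigabytes, this is your answer.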

The kernel will never write this cache to the swap partition. It keeps it in RAM until it has been safely written to its destination.

The kernel will never drop unwritten cache either, because that would mean data loss (you've saved a file, so you expect the data to land on permanent storage).

You can only solve this by speeding up the storage. The issue is often seen with storage mounted over the network (check your mounts for types such as cifs, nfs, sshfs, etc.) or with slow USB 1.x devices.
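A quick way to look for such a device (findmnt and lsblk ship with util-linux and are present on Ubuntu by default):

# List network/FUSE mounts that could be the slow write target
findmnt -t cifs,nfs,nfs4,fuse.sshfs

# Show how block devices are attached; TRAN reads "usb" for USB disks
lsblk -o NAME,TRAN,MOUNTPOINT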

You could also make the issue much less dramatic for the system by capping the dirty cache with sysctl vm.dirty_ratio=10 before it grows too large. From the kernel documentation:

dirty_ratio

Contains, as a percentage of total available memory that contains free pages and reclaimable pages, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.

The total available memory is not equal to total system memory.
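For example, to try the cap immediately and make it stick across reboots (10 is just a conservative starting value, not a magic number):

# Apply now (lost on reboot)
sudo sysctl vm.dirty_ratio=10

# Persist the setting
echo 'vm.dirty_ratio = 10' | sudo tee -a /etc/sysctl.conf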

If that's the correct diagnosis, you will see that the cache can now be dropped easily (at least 90% of it) and that the process writing those gigabytes becomes very slow. The rest of the system will become more responsive.
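You can watch this happen with the same vmstat you used above: with the cap in place, the bo (blocks written out) column shows the steady flush to the slow device while free and cache stay sane:

# Refresh every second; watch "bo" and "free" while the writer runs
vmstat -S M 1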