Ubuntu – How to troubleshoot a disk IO performance issue possibly related to dm-crypt/LUKS

encryption, luks

Issue

I recently installed Ubuntu 16.04 LTS (kernel 4.8.0-52) on a Lenovo T460p with an i7-6820HQ, 32GB of RAM, and a 512GB Micron 1100 SSD. I checked the full disk encryption box during the installation and used the default partitioning layout. In general, performance is great.

However, over time my builds started taking about twice as long. Further, during parts of the build that write large files, any (non-build) task that requires disk I/O ends up waiting a lot. This includes launching new programs, loading pages in Firefox, etc. In Firefox, for example, I can navigate the UI and switch tabs and everything is fine. But if I follow a link, the whole UI locks up until things quiet down.

So in summary, after some period of time, builds suddenly take longer and at certain points during the build the computer is basically unusable.

What can I do to try and diagnose or resolve this issue?

Troubleshooting Info

  • I don't reboot often, so the system is usually up for several days before I run into this issue. Once I hit it, I flail for a bit trying to figure out the cause, then reboot so I can keep working.

  • The only thing that resolves the issue is rebooting the machine. I've tried exiting all applications, logging out and back in, and dropping the buffer cache (my flailing theory was that, as the cache used up memory, disk syncs were happening more frequently), but only rebooting works.

  • As a long shot, I tried the solution to this answer but there was no change in behavior.

  • Running iotop shows the dmcrypt_write thread using 99% I/O whenever I'm experiencing the issues. When I'm not experiencing the issue, I also see dmcrypt_write pop to the top with a relatively high IO % but it doesn't stay there very long.

  • If I run dd if=/dev/urandom of=$HOME/bigfile bs=10k count=200k; sync when things are working normally, dmcrypt_write will jump to the top for a second or two, but it's nowhere near the same duration as during one of my builds.

  • A full build generates about 1.4 GB of data. It's a Java project with several modules. So, lots of little files are created plus some larger JAR files that aggregate all those little files.

  • There is always plenty of memory available and the swap partition is not being used.

  • I have coworkers with similar computers (T460p) also running Ubuntu who are not experiencing this issue. They all seem to have different SSD brands/models, though.
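
One way to put numbers on the observations above is to sample the kernel's dirty-page counters while a build runs. This is a minimal sketch, not something tried in the original post. The function names meminfo_kb and watch_writeback are made up for illustration; Dirty and Writeback are standard fields in /proc/meminfo.

```shell
#!/bin/sh
# Print the value (in kB) of one /proc/meminfo field, e.g. Dirty or Writeback.
# The optional second argument points at a saved copy of the file instead.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Print a timestamped sample once per second until interrupted (Ctrl-C).
watch_writeback() {
    while :; do
        printf '%s Dirty=%s kB Writeback=%s kB\n' \
            "$(date +%T)" "$(meminfo_kb Dirty)" "$(meminfo_kb Writeback)"
        sleep 1
    done
}
```

Running watch_writeback in a second terminal during a build, once on a fresh boot and once after the slowdown appears, would show whether the amount of queued data differs or only the time it takes to drain.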

Update

The issue just surfaced again so I did some more testing based on the reply to this question.

  • The file system is still not mounted with the discard option, so I ran fstrim instead, assuming that would be somewhat similar to having had the discard option enabled.
  • I didn't do enough timing when the issue first happened, but after running fstrim, build speeds seemed to be back to normal… but after the build completes, the dmcrypt_write thread kicks in and makes the system unusable for a period of time. All in all, the total time to build and for the system to become usable again seems to be about the same as before.
  • I changed /proc/sys/vm/dirty_ratio to 2 and /proc/sys/vm/dirty_background_ratio to 1 and ran some builds. The builds took longer than normal, about the same as the last time I hit this issue, but the system didn't seem to lock up as much. Changing the values back to 20 and 10 reverted to the behavior mentioned above.
  • On a clean boot, I tried setting /proc/sys/vm/dirty_ratio to 2 and /proc/sys/vm/dirty_background_ratio to 1, and the build time was comparable to the time with 20 and 10.
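
To put the ratios above in perspective, here is a rough back-of-the-envelope sketch. It assumes the percentages apply to the full 32 GB of RAM. The kernel actually applies them to available (free plus reclaimable) memory, so the real thresholds are somewhat lower. The helper name dirty_limit_mb is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: approximate a vm.dirty_* threshold in MB.
#   $1 = memory the ratio applies to, in MB (assumed here to be all of RAM)
#   $2 = ratio in percent (vm.dirty_ratio or vm.dirty_background_ratio)
dirty_limit_mb() {
    echo $(( $1 * $2 / 100 ))
}

ram_mb=32768   # 32 GB, from the hardware description in the question

echo "dirty_background_ratio=10 -> $(dirty_limit_mb "$ram_mb" 10) MB"   # 3276 MB
echo "dirty_ratio=20            -> $(dirty_limit_mb "$ram_mb" 20) MB"   # 6553 MB
echo "dirty_background_ratio=1  -> $(dirty_limit_mb "$ram_mb" 1) MB"    # 327 MB
echo "dirty_ratio=2             -> $(dirty_limit_mb "$ram_mb" 2) MB"    # 655 MB
```

Under that assumption, a 1.4 GB build fits entirely below the default 10% background threshold, so writeback would be triggered mainly by the age of the dirty pages rather than by the thresholds, which may be why the flush shows up after the build finishes. With 1 and 2, the build output exceeds both thresholds, so the build itself has to wait on the disk as it goes.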

Best Answer

I don't know about LUKS specifically, but for general I/O issues on an SSD, make sure discard is on for your filesystem mount, i.e. check with grep discard /proc/mounts. You might also try (as root) "echo 1 > /proc/sys/vm/dirty_background_ratio; echo 2 > /proc/sys/vm/dirty_ratio". This gets the system to initiate I/O sooner, when there is less of a backlog of data to write out.
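
The two suggestions can be sketched as small shell functions. This is an illustration, not a tested fix. The names has_discard and set_dirty_ratios are made up here, and the sysctl change needs root and does not persist across reboots.

```shell
#!/bin/sh
# Succeeds if the mount point in $1 lists "discard" among its mount options.
# $2 is an optional mounts file and defaults to /proc/mounts.
has_discard() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "discard") found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Lower the writeback thresholds: $1 = background ratio, $2 = dirty ratio.
# Equivalent to the echo commands in the answer; must be run as root.
set_dirty_ratios() {
    sysctl -w vm.dirty_background_ratio="$1" vm.dirty_ratio="$2"
}
```

For example, has_discard / && echo "discard is on" checks the root filesystem, and set_dirty_ratios 1 2 applies the values from the answer. On an encrypted disk, discards reach the SSD only if the dm-crypt mapping also allows them, so a mount option alone may not be enough.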
