MySQL 5.7 + InnoDB: mysqld is CPU bound, with ever-increasing memory usage

mysql-5.7

[ADDED ANSWER 2019-10-08] I've posted an answer to this question. tl;dr: Force MySQL to use jemalloc or tcmalloc.


I recently upgraded from 5.5.25-rc (very old install), through 5.5.62, then 5.7.27. At the same time, I made the decision to migrate most (possibly all) tables from MyISAM to InnoDB.

When mysqld is first started, everything runs as expected. It's a busy server with almost constant inserts. But over time:

  • The mysqld process seems to hit a limit, consistently consuming around 100% CPU (there are four physical cores, so 100% suggests a single fully loaded thread).

  • Disk activity falls from near 100% to near nil, despite pending inserts continuing to pile up. I manually killed one insert after around 40 minutes; it is now 44 hours (nearly 2 days) later, and despite being killed it still appears in SHOW PROCESSLIST, with other connections waiting for it to release its locks (see the sketch after this list).

  • The memory usage periodically increases, well beyond the value that I would expect MySQL to use, until memory is exhausted and the OS kills the process.
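For reference, one way to see which transaction is blocking the others on MySQL 5.7 (a sketch; it assumes the bundled sys schema is installed and that mysql picks up credentials from a defaults file):

# Which statement is blocked, and which connection holds the lock it wants.
mysql -e "SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age FROM sys.innodb_lock_waits\G"

# State of the killed transaction itself; a killed bulk insert typically sits
# in ROLLING BACK while InnoDB undoes its changes, which is why it can linger
# in SHOW PROCESSLIST and keep its locks for a long time.
mysql -e "SELECT trx_id, trx_state, trx_started, trx_rows_modified, trx_mysql_thread_id FROM information_schema.innodb_trx\G"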

To give an example of the memory growth, here are selected fields from top at various points: time since starting mysqld, allocated memory, resident memory, and CPU:

10 mins:    2827M   1131M   100.39%
1 hour:     2903M   1421M   100.20%
12 hours:   3696M   3109M   111.18%
24 hours:   3846M   3435M   100.00%
45 hours:   4332M   3699M   100.00%   (now)

Note that the allocated memory grew by an additional 1505M between the 10-minute and 45-hour marks.

[ADDED 2019-10-04] After upgrading the mainboard/RAM/CPU, the behaviour is the same, just with bigger numbers:

10 mins: 9530 mysql        47  20    0    14G  5866M select  11  10:17 117.09% mysql
1 hour:  9530 mysql        46  20    0    15G    14G select   4  42:46  39.50% mysqld
4 hours: 9530 mysql        45  20    0    22G    21G select   9 122:32  25.78% mysqld
7 hours: 9530 mysql        45  20    0    27G    24G select   2 216:44  24.37% mysqld
<process killed by OS>

With the additional RAM available, I configured the InnoDB buffer pool size to 12000M (~12GB), so the MySQL process eventually using 27GB of RAM (more than double the configured pool size) is still an unexpected amount of overhead.
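That comparison is easy to reproduce (a sketch; the ps/pgrep flags are the FreeBSD ones, and mysql is assumed to read credentials from a defaults file):

# Configured InnoDB buffer pool size, in bytes.
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size'"

# Resident set size of the running mysqld, in kilobytes.
ps -o rss= -p $(pgrep -x mysqld)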

Things I've tried to diagnose the issue:

  • Restarting mysqld (plus also trying a clean reboot of the server). Things return to normal for a while, but the long term behaviour does not change.

  • Various MySQL memory estimate tools and commands. Some of these are quite old, some do not support FreeBSD, and others do not specify a particular MySQL version or series, so it's difficult to interpret the output. Actual memory usage far exceeds all of the successful estimates.

  • Enabling the performance/sys schema memory instrumentation to track usage (see the sketch after this list). The changes in total reported memory usage over time are minor (+/- a couple of hundred MB) and do not explain why the memory actually allocated to MySQL continues to grow.

  • Drive speed tests, SMART self-tests, ZFS scrub. No signs of any storage problems.
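For the performance/sys schema item above, the relevant views can be queried along these lines (a sketch; most memory instruments are disabled by default in 5.7, and enabling them at runtime only accounts for allocations made from that point on):

# Enable the memory instruments at runtime; to instrument from startup instead,
# set performance-schema-instrument='memory/%=ON' in my.cnf.
mysql -e "UPDATE performance_schema.setup_instruments SET ENABLED='YES' WHERE NAME LIKE 'memory/%'"

# Total memory MySQL believes it has allocated, and the largest consumers.
mysql -e "SELECT * FROM sys.memory_global_total"
mysql -e "SELECT event_name, current_alloc FROM sys.memory_global_by_current_bytes LIMIT 10"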

At this point I'm completely baffled. I cannot find anyone else who has this problem.

This system is fairly old, but it showed no signs of issues until the recent MySQL upgrade. I do have a new mainboard and RAM on order, and I suspect that the extra RAM (32GB vs 8GB) will make a big difference, but I'm concerned it will just mask whatever the underlying problem is. (Update: the new mainboard has been installed, and the excessive memory consumption has not changed.)

Thank you in advance for any advice.

ADDITIONAL INFORMATION

System config:

  • Q8400 (I think) CPU; now a Xeon E5-2620
  • 8GB RAM; now 32GB ECC RAM
  • MySQL 5.7.27, installed via the FreeBSD package system
  • FreeBSD 12.0
  • 2 x 2TB 7200RPM HDDs in a ZFS mirror: recordsize=16k, atime=off, compression=lz4. Now with a SLOG on a dedicated 16GB NVMe M.2 SSD.
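For completeness, a ZFS dataset with those properties would be created along these lines (the pool/dataset and device names here are hypothetical; recordsize=16k is chosen to match InnoDB's default 16k page size):

# Hypothetical pool/dataset name; properties match the ones listed above.
zfs create -o recordsize=16k -o atime=off -o compression=lz4 tank/mysql

# Attach the NVMe device as a separate log (SLOG) vdev; the device name is an example.
zpool add tank log /dev/nvd0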

[ADDED 2019-10-05]

# su mysql
$ ulimit -a
cpu time               (seconds, -t)  unlimited
file size           (512-blocks, -f)  unlimited
data seg size           (kbytes, -d)  33554432
stack size              (kbytes, -s)  524288
core file size      (512-blocks, -c)  unlimited
max memory size         (kbytes, -m)  unlimited
locked memory           (kbytes, -l)  unlimited
max user processes              (-u)  34232
open files                      (-n)  940113
virtual mem size        (kbytes, -v)  unlimited
swap limit              (kbytes, -w)  unlimited
socket buffer size       (bytes, -b)  unlimited
pseudo-terminals                (-p)  unlimited
kqueues                         (-k)  unlimited
umtx shared locks               (-o)  unlimited

my.cnf : https://pastebin.com/q6WQ55bL

Dump of (1) top (memory usage), (2) SHOW GLOBAL STATUS, and (3) SHOW GLOBAL VARIABLES: https://pastebin.com/DyqUpbyG (collected every 5 mins; this is the last successful dump before the server process was killed).
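A dump like that needs nothing fancier than a loop along these lines (a sketch; the log path and interval are arbitrary):

#!/bin/sh
# Append a timestamped snapshot of top plus MySQL status/variables every 5 minutes.
while true; do
    date >> /var/log/mysql-snapshot.log
    top -b -o res | head -n 20 >> /var/log/mysql-snapshot.log
    mysql -e "SHOW GLOBAL STATUS" >> /var/log/mysql-snapshot.log
    mysql -e "SHOW GLOBAL VARIABLES" >> /var/log/mysql-snapshot.log
    sleep 300
done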

Best Answer

See this bug report, where others appear to have the same issue:

https://bugs.mysql.com/bug.php?id=83047

Note that the reports there mention Linux, whereas I am on FreeBSD, so it does not appear to be an OS-specific issue.

The suggested workaround is to try a third-party malloc implementation, such as jemalloc or tcmalloc.

FreeBSD already uses jemalloc as its default system allocator (although I was unable to confirm 100% that my mysqld binary was actually linked against it), so I tried installing and configuring tcmalloc instead. (See https://github.com/gperftools/gperftools for the source; your OS may provide it as a package.)

[mysqld_safe]
malloc-lib=/usr/lib/libtcmalloc_minimal.so
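One way to confirm the override actually took effect is to check which shared objects are mapped into the running process (procstat is FreeBSD-specific; the library path is the one from the config above):

# mysqld_safe loads malloc-lib via LD_PRELOAD, so the tcmalloc .so should
# appear among mysqld's memory mappings if the override worked.
procstat -v $(pgrep -x mysqld) | grep tcmalloc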

An hour after restart, allocated memory for the mysqld process was at 5954M.

Now, 31 hours after restart, memory is at 6037M, an increase of just over 1%. In addition, that value has not changed for the past 15 hours, which suggests the mysqld process has reached a steady state where new allocations can be satisfied from memory it has already reserved.

It's alarming that somewhere between 5.6 and 5.7, something in MySQL changed that causes the default malloc implementation to go off the charts, and a third-party allocator library is required to work around it. I cannot find any official acknowledgement of this bug.

Hope this answer helps anyone else experiencing this very frustrating problem.