MySQL 8.0 Tuning Advice Needed – Slow Locking & Various Issues

MySQLmysql-8.0performance-tuning

I recently upgraded my database to MySQL 8.0 and I'm using phpmyadmin to get status information. I also upgraded my virtual server to 4GB RAM and 2 vCPUs, intended to be a MySQL-only server for my websites. MySQL runs by itself on the server, I have php and everything else on a separate server.

Problems: The memory usage on my server seems to creep up over time. It usually runs fine, but will crash after a few days or so due to OOM killer. These can be ugly, sometimes MySQL won't restart for hours (locked up / frozen / etc), even if my cronjob script checks every 5 minutes to see if MySQL is running and to restart of it's not running. My websites sometimes end up being down all night/morning until I wake up, and I'll be forced to reboot the OS a few times before things start working again.

There are also issues with slowdowns on the websites that seem to happen without warning, without any apparent reason, nothing in the slow-query log, slow website traffic, with plenty of memory available. These happen for about an hour and the problem goes away on it's own. When it happens, it can take 20-30 seconds for a web page to load, because of an issue with MySQL.

I've investigated the slow-query logs and queries that run without using indexes. After investigating, it turns out many of these involve small tables with 200 rows for countries/states, where we select the most of the table and display it on the website, as designed (which is why many of them show up in the "queries that don't use indexes") list. Otherwise, there isn't much else in that part of the log aside from selecting lots of stuff from small tables.

Here is some of the data from phpmyadmin (updated May 21):

Network traffic since startup: 165.1 GiB
This MySQL server has been running for 3 days, 15 hours, 58 minutes and 2 seconds. It started up on May 18, 2021 at 05:43 AM.

Traffic     #   ø per hour
Received    5.3 GiB 61.4 MiB
Sent    159.8 GiB   1.8 GiB
Total   165.1 GiB   1.9 GiB
Connections #   ø per hour  %
Max. concurrent connections 32  --- ---
Failed attempts 25  0.28    <0.01%
Aborted 0   0   0%
Total   2,494 k 28.35 k 100.00%

The alerted status variables (marked in red by phpmyadmin as something wrong) – Updated May 21:

Aborted connectsDocumentation   25  The number of failed attempts to connect to the MySQL server.
Binlog cache disk useDocumentation  19.5 k  The number of transactions that used the temporary binary log cache but that exceeded the value of binlog_cache_size and used a temporary file to store statements from the transaction.
Handler read rndDocumentation   70.2 M  The number of requests to read a row based on a fixed position. This is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan whole tables or you have joins that don't use keys properly.
Handler read rnd nextDocumentation  5.9 G   The number of requests to read the next row in the data file. This is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
Innodb buffer pool pages dirtyDocumentation 20  The number of pages currently dirty.
Innodb buffer pool readsDocumentation   6.8 M   The number of logical reads 
that InnoDB could not satisfy from buffer pool and had to do a single-page read.
Innodb buffer pool wait freeDocumentation   3   Normally, writes to the InnoDB buffer pool happen in the background. However, if it's necessary to read or create a page and no clean pages are available, it's necessary to wait for pages to be flushed first. This counter counts instances of these waits. If the buffer pool size was set properly, this value should be small.
Innodb row lock time avgDocumentation   911 The average time to acquire a row lock, in milliseconds.
Innodb row lock time maxDocumentation   31.9 k  The maximum time to acquire a row lock, in milliseconds.
Innodb row lock waitsDocumentation  228 The number of times a row lock had to be waited for.
Opened tablesDocumentation  7.9 k   The number of tables that have been opened. If opened tables is big, your table cache value is probably too small.
Select full joinDocumentation   203 k   The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
Slow queriesDocumentation   43  The number of queries that have taken more than long_query_time seconds.Documentation
Sort merge passesDocumentation  4.3 k   The number of merge passes the sort algorithm has had to do. If this value is large, you should consider increasing the value of the sort_buffer_size system variable.
Table locks waitedDocumentation 1.5 k   The number of times that a table lock could not be acquired immediately and a wait was needed. If this is high, and you have performance problems, you should first optimize your queries, and then either split your table or tables or use replication.

Query Statistics (Updated May 21):

Questions since startup: 35,646,301 Documentation
ø per hour: 405,017
ø per minute: 6,750
ø per second: 113
Statements  #   ø per hour  %
select  33,914 k    385.3 k 95.14
update  568 k   6,448.2 1.59
insert  349 k   3,968.7 0.98
change db   337 k   3,826.6 0.94
set option  303 k   3,447.3 0.85
replace 136 k   1,545.2 0.38
delete  14,064  159.8   0.04
update multi    4,827   54.8    0.01
show fields 2,940   33.4    0.01
truncate    2,163   24.6    0.01
show status 2,092   23.8    0.01
show replica status 2,092   23.8    0.01
show slave status   2,092   23.8    0.01
show master status  2,091   23.8    0.01
show processlist    2,047   23.3    0.01
show create table   1,059   12  <0.01
show table status   979 11.1    <0.01
rollback to savepoint   957 10.9    <0.01
show triggers   957 10.9    <0.01
show keys   335 3.8 <0.01
show variables  272 3.1 <0.01
show tables 119 1.4 <0.01
create table    64  0.7 <0.01
show warnings   61  0.7 <0.01
insert select   37  0.4 <0.01
drop table  30  0.3 <0.01
delete multi    26  0.3 <0.01
unlock tables   15  0.2 <0.01
begin   15  0.2 <0.01
show create db  15  0.2 <0.01
savepoint   15  0.2 <0.01
show create trigger 12  0.1 <0.01
release savepoint   12  0.1 <0.01
show grants 8   0.1 <0.01
show binlogs    8   0.1 <0.01
show databases  5   0.1 <0.01
kill    4   <0.1    <0.01
show storage engines    2   <0.1    <0.01
show slave hosts    1   <0.1    <0.01
show replicas   1   <0.1    <0.01
flush   1   <0.1    <0.01
create db   1   <0.1    <0.01

MySQL config file my.cnf. A few years ago, a professional DBA told me to set these variables to tune mysql for a 1GB server – to deal with Out Of Memory crashing. The only variable I recently changed was the innodb_buffer_pool_size from 512MB to 2G (Updated May 21, added "skip-name-resolve" to fix an error I found on mysqltuner):

[mysqld]
skip-name-resolve
default_authentication_plugin = mysql_native_password
character_set_server=latin1
collation_server=latin1_swedish_ci
port = 3306
sql_mode = "NO_ENGINE_SUBSTITUTION"
innodb_buffer_pool_size = 2000M
innodb_strict_mode = OFF
join_buffer_size = 1M
key_buffer_size = 64M
max_connect_errors = 10000
myisam_recover_options = "BACKUP,FORCE"
performance_schema = 0
read_buffer_size = 1M
slow_query_log = ON
sort_buffer_size = 1M
sync_binlog = 0
thread_stack = 262144
wait_timeout = 14400
table_open_cache = 10000
table_definition_cache = 2500
open_files_limit = 30000
max_connections = 100
read_rnd_buffer_size = 128K
innodb_change_buffer_max_size = 15
innodb_log_buffer_size = 12M
innodb_log_file_size = 120M
innodb_buffer_pool_instances = 8
innodb_lru_scan_depth = 128
innodb_page_cleaners = 64
thread_cache_size = 50
max_heap_table_size=24M
tmp_table_size=24M
thread_cache_size=100
innodb_io_capacity=800
read_buffer_size=128K
read_rnd_buffer_size=64K
eq_range_index_dive_limit=32
symbolic-links=0
key_cache_age_threshold=64800
key_cache_division_limit=50
key_cache_block_size=32K
innodb_buffer_pool_dump_pct=90
innodb_print_all_deadlocks=ON
innodb_read_ahead_threshold=8
innodb_read_io_threads=64
innodb_write_io_threads=64
max_allowed_packet=32M
max_seeks_for_key=32
max_write_lock_count=16
myisam_repair_threads=4
open_files_limit=30000
query_alloc_block_size=32K
query_prealloc_size=32K
sort_buffer_size=2M
updatable_views_with_limit=NO
general_log_file=/var/log/mysql/general.log
slow_query_log_file=/var/log/mysql/slow-query.log

UPDATE

I've decided to phase all MyISAM tables to Innodb in all of my databases. There are no more MyISAM tables anymore. Hopefully this can simplify the tuning job.

UPDATE 2: Pastebins

3 days ago, I phased out all MyISAM tables and made them InnoDB tables, added "skip-name-resolve" to my.cnf and restarted the server. I've updated the rest of the info above from phpmyadmin, current as of May 21 and added these pastebins today as well for the new data after having the server run for 2-3 days.

SHOW GLOBAL STATUS: https://pastebin.com/r3t84pvZ

SHOW GLOBAL VARIABLES: https://pastebin.com/kQpevtdx

SHOW FULL PROCESSLIST: https://pastebin.com/fR6b7Tdg

STATUS: https://pastebin.com/vyWyhZSf

MySQL Tuner: https://pastebin.com/ETLCa48V

TOP: https://pastebin.com/cU8RvgpT

ulimit -a: https://pastebin.com/BhNVgEXH

iostat -xm 5 3: https://pastebin.com/MxymEXyq

/proc/meminfo: https://pastebin.com/PKKeumyt

Best Answer

Rate Per Second = RPS

Suggestions to consider in your my.cnf [mysqld] section, to help avoid OOM Killer.

innodb_parallel_read_threads=0  # from 4 because you only have 2 CPU's today
open_files_limit=35000  # from 40000 to allow OS to use other 5,000 reported by ulimit -a
temptable_max_mmap=32M  # from 1G since you only have 4G total
temptable_max_ram=32M  # from 1G because you only have 4G total
key_buffer_size=8M  # from 64M since most tables are no longer MyISAM
innodb_adaptive_max_sleep_delay=2000  # from 15000 for 2 second delay when BUSY
innodb_buffer_pool_size=2560M  # from 2G to reduce innodb_buffer_pool_reads RPS of 22
innodb_change_buffer_max_size=10  # from 15 percent for change management
net_buffer_length=64K  # from 16K to support avg bytes_sent per connection of 68,932
log_error=/var/log/mysql/error.log  # from stderr which is NOT a best practice

Other details to consider for your instance,

OS SWAP SIZE of 11G rather than 0 SWAP to be prepared for 8G RAM upgrade - desperately needed for your activity level. When budget is available and 6 CPU's would be beneficial.

select_scan RPS of 10 indicates indexes are missing. Slow Query Log will have them listed. select_full_join RPS of 1 indicates indexes are missing. Use of log_queries_not_using_indexes will record them for you in your Slow Query Log.

There are more opportunities to improve performance.

Tuning quarterly has been proven to be helpful in identifying where the bottlnecks are after your best efforts this quarter.

Use of system features change over time and the target moves. We are here to help.