MySQL (Percona): Multiple simultaneous long-running connections

MySQLperconapercona-server

I'm running a distributed app in the cloud where clients keep open multiple long-running connections to a MySQL (Percona) server.

After updating to the latest Percona release, the maximum number of simultaneously connected clients has dropped significantly. In the past, with the previous Percona version, it was used successfully with up to 8K connections, now it struggles to get above 3K.

By "struggles" I mean that when the connection bottleneck has been reached, connecting to MySQL times out even in the command line. When I do manage to connect, show processlist doesn't show any hanging queries or locks. So the connections that are established work just fine. The Apache webserver serving the PHP webapp used to monitor the project also hangs indefinitely.

Here's the my.cnf I use, patched together from various guides.

[mysqld]
open_files_limit               = 16384
table_open_cache               = 16384
character_set_server           = utf8mb4
max_connections                = 16384
expire_logs_days               = 10
max_binlog_size                = 100M
innodb_open_files              = 16384
innodb_file_per_table          = 1
innodb_read_io_threads         = 4
innodb_write_io_threads        = 4
innodb_thread_concurrency      = 0
innodb_log_file_size           = 128M
innodb_open_files              = 4000
innodb_flush_method            = O_DIRECT
innodb_buffer_pool_instances   = 1
thread_pool_size               = 16
local_infile                   = 1
skip-name-resolve
thread_cache_size              = 16384
thread_handling                = pool-of-threads
innodb_buffer_pool_size        = 512M
innodb_buffer_pool_instances   = 1
innodb_log_buffer_size         = 64M

And the Percona version I'm running:

mysqld --version
Ver 8.0.15-6 for debian-linux-gnu on x86_64 (Percona Server (GPL), Release '6', Revision '63abd08')

OS: Ubuntu 18.04.2 LTS

Hardware:

CPU: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
RAM: 32G

The queries I'm running are all based on unique indexes (no joins), tables are all InnoDB. The same configuration worked fine until the update to Percona 8.

The my.cnf settings I had to remove after the upgrade as they were no longer supported are:

innodb_locks_unsafe_for_binlog = 0
query_cache_size = 0
query_cache_type = 0

Things I tried:

I ran mysqltuner.pl but didn't get any relevant advice from it.
ulimit -n and ulimit -s show values significantly higher than what I'm using (1048576 and 16384 specifically).

Any other suggestions?

Best Answer

There have been only 92 Connections in the 5 hours of Uptime, and no more than 4 at once (Max_used_connections). So I don't understand your question about struggling with 3K connections.

Observations:

Version: 8.0.15-6
32 GB of RAM
Uptime = 04:54:20; some GLOBAL STATUS values may not be meaningful yet.
Are you sure this was a SHOW GLOBAL STATUS ?
You are not running on Windows.
Running 64-bit version
You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Extremely high disk activity on the buffer_pool. Since you have 32GB, and assuming it is mostly for MySQL, change innodb_buffer_pool_size to 22G.

Why so many CHECK, SET, DELETE, SHOW COLUMNS statements?

You seem to have some really big queries. Turn on the slowlog to catch the worst of them. For example, maybe a DELETE is missing a useful index?

The table cache is not very effective, increase table_open_cache to 2000.

Max_connections = 15574 is quite high. Especially since the most used is only 4. Drop it to 100.

Details and other observations:

( Innodb_buffer_pool_reads ) = 28,825,842 / 17660 = 1632 /sec -- InnoDB buffer_pool I/O read rate -- check innodb_buffer_pool_size

( Innodb_buffer_pool_pages_flushed ) = 61,160,824 / 17660 = 3463 /sec -- Writes (flushes) -- check innodb_buffer_pool_size

( innodb_buffer_pool_size / _ram ) = 512M / 32768M = 1.6% -- % of RAM used for InnoDB buffer_pool

( Table_open_cache_misses / (Table_open_cache_hits + Table_open_cache_misses) ) = 979 / (8530 + 979) = 10.3% -- Effectiveness of table_open_cache. -- Increase table_open_cache and check table_open_cache_instances.

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests ) = 28,825,842 / 997216830 = 2.9% -- Read requests that had to hit disk -- Increase innodb_buffer_pool_size if you have enough RAM.

( Innodb_pages_read / Innodb_buffer_pool_read_requests ) = 50,177,127 / 997216830 = 5.0% -- Read requests that had to hit disk -- Increase innodb_buffer_pool_size if you have enough RAM.

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((28825842 + 61160824) ) / 17660 = 5095 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size?

( innodb_log_buffer_size / innodb_log_file_size ) = 64M / 48M = 133.3% -- Buffer is in RAM; file is on disk. -- The buffer_size should be smaller and/or the file_size should be larger.

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 248,832 / (17660 / 3600) / 2 / 48M = 0.0005 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 17,660 / 60 * 48M / 248832 = 59,535 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)

( Innodb_rows_deleted / Innodb_rows_inserted ) = 201 / 0 = INF -- Churn -- "Don't queue it, just do it." (If MySQL is being used as a queue.)

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 50177127 + 224 ) / 17660 / 200 = 1420.6% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( expand_fast_index_creation ) = expand_fast_index_creation = OFF -- ALTER and OPTIMIZE may be greatly sped up by using ON. -- Probably better to be ON.

( innodb_thread_concurrency ) = 0 -- 0 = Let InnoDB decide the best for concurrency_tickets. -- Set to 0 or 64. This may cut back on CPU.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 15,574 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections is too high and various memory settings are high, you could run out of RAM.

( (Com_show_create_table + Com_show_fields) / Questions ) = (0 + 436) / 1804 = 24.2% -- Naughty framework -- spending a lot of effort rediscovering the schema. -- Complain to the 3rd party vendor.

( local_infile ) = local_infile = ON -- local_infile = ON is a potential security issue

( Select_scan / Com_select ) = 217 / 199 = 109.0% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries.

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( back_log ) = 15,574 -- (Autosized as of 5.6.6; based on max_connections) -- Raising to min(150, max_connections) may help when doing lots of connections.

( thread_cache_size ) = 16,384 -- How many extra processes to keep around (Not relevant when using thread pooling) (Autosized as of 5.6.8; based on max_connections) -- Greater than 100 may lead to OOM.

( thread_cache_size / Max_used_connections ) = 16,384 / 4 = 409600.0% -- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.

Abnormally small:

Bytes_received = 6.2 /sec
Bytes_sent = 145 /sec
Com_insert = 0
Com_select = 41 /HR
Handler_read_key = 0.45 /sec
Handler_write = 0.33 /sec
Innodb_background_log_sync = 0
Innodb_data_writes = 77 /HR
Innodb_data_written = 222 /sec
Innodb_pages0_read = 0
Innodb_rows_inserted = 0
Innodb_secondary_index_triggered_cluster_reads = 0.32 /sec
Max_used_connections = 4
Open_files = 2
Select_range = 0
Table_locks_immediate = 2.7 /HR
Table_open_cache_hits = 0.48 /sec
innodb_default_encryption_key_id = 0

Abnormally large:

( Innodb_pages_read + Innodb_pages_written ) / Uptime = 2,841
Com_check = 22 /HR
Com_create_db = 0.2 /HR
Handler_read_next / Handler_read_key = 51,277
Innodb_buffer_pool_pages_flushed / max(Questions, Queries) = 33,884
Innodb_buffer_pool_pages_made_not_young = 12596 /sec
Innodb_buffer_pool_read_ahead = 1209 /sec
Innodb_data_pending_reads = 0.82 /HR
Innodb_data_read = 41774567 /sec
Innodb_data_reads = 2841 /sec
Innodb_pages_read = 2841 /sec
Threadpool_idle_threads = 9
Threadpool_threads = 11
gtid_executed_compression_period = 0.057 /sec
host_cache_size = 1,381
innodb_max_dirty_pages_pct_lwm = 10
innodb_undo_tablespaces = 2
max_error_count = 1,024
max_length_for_sort_data = 4,096
slave_pending_jobs_size_max = 128MB

Abnormal strings:

bind_address = 0.0.0.0
default_authentication_plugin = caching_sha2_password
event_scheduler = ON
have_ssl = YES
have_symlink = DISABLED
innodb_fast_shutdown = 1
innodb_undo_log_truncate = ON
optimizer_trace = enabled=off,one_line=off
slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN
thread_handling = pool-of-threads
transaction_write_set_extraction = XXHASH64

Related Solutions

Mysql – How to identify InnoDB table corruption

Morgan gives a hint in his comment that InnoDB is constantly checking for corrupted pages by doing checksums on the pages it reads. If InnoDB finds a checksum mismatch, it will ~~crash~~ stop the server.

If you want to speed that process up (instead of waiting for InnoDB to read the corrupted page), you can use innochecksum:

Because checksum mismatches will cause InnoDB to deliberately shut down a running server, it can be preferable to use this tool rather than waiting for a server in production usage to encounter the damaged pages.

An interesting caveat:

innochecksum cannot be used on tablespace files that the server already has open. For such files, you should use CHECK TABLE to check tables within the tablespace.

So yes, for an online table CHECK TABLE is probably the tool (or as pointed out in another answer mysqlcheck if you want to do more than a single database at a time.)

If you can shut down your database you can force it the checksums using innochecksum

Anecdotal: On a innodb tablespace of 29GB (with innodb_file_per_table=1), this script took about 2 Minutes

#!/bin/bash
for i in $(ls /var/lib/mysql/*/*.ibd)
do
  innochecksum -v $i
done

As a bonus, though, since you are running Percona, they implemented a new method for fast innodb checksum. I've never used it, but it might speed up the process.

Mysql – Capabilities of InnoDB INSERT Performance

You need to tune your InnoDB settings in the following areas:

Make InnoDB access all your cores
Increase innodb_buffer_pool_size to 12G
Increase innodb_buffer_pool_instances to 2 (First run numactl --hardware to determine the number of Physical CPUs. What every number of CPUs it reports, use that number. I learned this recently in Jeremy Cole's Blog)
Increase Log File Size (innodb_log_file_size) to 2047M
support separate tablespace files for individual InnoDB tables (enaled innodb_file_per_table)
support either high performance or high durability (ACID Compliance)
- High Performance : innodb_flush_log_at_trx_commit set to 0 or 2
- High Durability : innodb_flush_log_at_trx_commit set to 1 (Default)
- Increase Size up the innodb_log_buffer_size in conjunction with the number of transactions per second (perhaps 32M)
- Your current setting for innodb_flush_log_at_trx_commit is good
- Your current setting for innodb_flush_method is good
Increase innodb_read_io_threads to 64
Increase innodb_write_io_threads to 64
Increase innodb_io_capactity to 10000

Here are my past posts on tuning the InnoDB storage engine

Is the CPU performance relevant for a database server? (Apr 26, 2012)
Why does InnoDB store all databases in one file? (Mar 25, 2012)
Multi cores and MySQL Performance (Sep 20, 2011)
Possible to make MySQL use more than one core? (Sep 12, 2011)
About single threaded versus multithreaded databases performance (May 26, 2011)
How to safely change MySQL innodb variable 'innodb_log_file_size'? (Feb 16, 2011)
How do you tune MySQL for a heavy InnoDB workload? (Feb 12, 2011)
Howto: Clean a mysql InnoDB storage engine? (October 29, 2010)

Best Answer

Related Solutions

Mysql – How to identify InnoDB table corruption

Mysql – Capabilities of InnoDB INSERT Performance

Related Question