MySQL (Percona): Multiple simultaneous long-running connections

MySQLperconapercona-server

I'm running a distributed app in the cloud where clients keep open multiple long-running connections to a MySQL (Percona) server.

After updating to the latest Percona release, the maximum number of simultaneously connected clients has dropped significantly. In the past, with the previous Percona version, it was used successfully with up to 8K connections, now it struggles to get above 3K.

By "struggles" I mean that when the connection bottleneck has been reached, connecting to MySQL times out even in the command line. When I do manage to connect, show processlist doesn't show any hanging queries or locks. So the connections that are established work just fine. The Apache webserver serving the PHP webapp used to monitor the project also hangs indefinitely.

Here's the my.cnf I use, patched together from various guides.

[mysqld]
open_files_limit               = 16384
table_open_cache               = 16384
character_set_server           = utf8mb4
max_connections                = 16384
expire_logs_days               = 10
max_binlog_size                = 100M
innodb_open_files              = 16384
innodb_file_per_table          = 1
innodb_read_io_threads         = 4
innodb_write_io_threads        = 4
innodb_thread_concurrency      = 0
innodb_log_file_size           = 128M
innodb_open_files              = 4000
innodb_flush_method            = O_DIRECT
innodb_buffer_pool_instances   = 1
thread_pool_size               = 16
local_infile                   = 1
skip-name-resolve
thread_cache_size              = 16384
thread_handling                = pool-of-threads
innodb_buffer_pool_size        = 512M
innodb_buffer_pool_instances   = 1
innodb_log_buffer_size         = 64M

And the Percona version I'm running:

mysqld --version
Ver 8.0.15-6 for debian-linux-gnu on x86_64 (Percona Server (GPL), Release '6', Revision '63abd08')

OS: Ubuntu 18.04.2 LTS

Hardware:

  • CPU: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
  • RAM: 32G

The queries I'm running are all based on unique indexes (no joins), tables are all InnoDB. The same configuration worked fine until the update to Percona 8.

The my.cnf settings I had to remove after the upgrade as they were no longer supported are:

  • innodb_locks_unsafe_for_binlog = 0
  • query_cache_size = 0
  • query_cache_type = 0

Things I tried:

  • I ran mysqltuner.pl but didn't get any relevant advice from it.

  • ulimit -n and ulimit -s show values significantly higher than what I'm using (1048576 and 16384 specifically).

Any other suggestions?

Best Answer

There have been only 92 Connections in the 5 hours of Uptime, and no more than 4 at once (Max_used_connections). So I don't understand your question about struggling with 3K connections.

Observations:

  • Version: 8.0.15-6
  • 32 GB of RAM
  • Uptime = 04:54:20; some GLOBAL STATUS values may not be meaningful yet.
  • Are you sure this was a SHOW GLOBAL STATUS ?
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Extremely high disk activity on the buffer_pool. Since you have 32GB, and assuming it is mostly for MySQL, change innodb_buffer_pool_size to 22G.

Why so many CHECK, SET, DELETE, SHOW COLUMNS statements?

You seem to have some really big queries. Turn on the slowlog to catch the worst of them. For example, maybe a DELETE is missing a useful index?

The table cache is not very effective, increase table_open_cache to 2000.

Max_connections = 15574 is quite high. Especially since the most used is only 4. Drop it to 100.

Details and other observations:

( Innodb_buffer_pool_reads ) = 28,825,842 / 17660 = 1632 /sec -- InnoDB buffer_pool I/O read rate -- check innodb_buffer_pool_size

( Innodb_buffer_pool_pages_flushed ) = 61,160,824 / 17660 = 3463 /sec -- Writes (flushes) -- check innodb_buffer_pool_size

( innodb_buffer_pool_size / _ram ) = 512M / 32768M = 1.6% -- % of RAM used for InnoDB buffer_pool

( Table_open_cache_misses / (Table_open_cache_hits + Table_open_cache_misses) ) = 979 / (8530 + 979) = 10.3% -- Effectiveness of table_open_cache. -- Increase table_open_cache and check table_open_cache_instances.

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests ) = 28,825,842 / 997216830 = 2.9% -- Read requests that had to hit disk -- Increase innodb_buffer_pool_size if you have enough RAM.

( Innodb_pages_read / Innodb_buffer_pool_read_requests ) = 50,177,127 / 997216830 = 5.0% -- Read requests that had to hit disk -- Increase innodb_buffer_pool_size if you have enough RAM.

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((28825842 + 61160824) ) / 17660 = 5095 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size?

( innodb_log_buffer_size / innodb_log_file_size ) = 64M / 48M = 133.3% -- Buffer is in RAM; file is on disk. -- The buffer_size should be smaller and/or the file_size should be larger.

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 248,832 / (17660 / 3600) / 2 / 48M = 0.0005 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 17,660 / 60 * 48M / 248832 = 59,535 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)

( Innodb_rows_deleted / Innodb_rows_inserted ) = 201 / 0 = INF -- Churn -- "Don't queue it, just do it." (If MySQL is being used as a queue.)

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 50177127 + 224 ) / 17660 / 200 = 1420.6% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( expand_fast_index_creation ) = expand_fast_index_creation = OFF -- ALTER and OPTIMIZE may be greatly sped up by using ON. -- Probably better to be ON.

( innodb_thread_concurrency ) = 0 -- 0 = Let InnoDB decide the best for concurrency_tickets. -- Set to 0 or 64. This may cut back on CPU.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 15,574 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections is too high and various memory settings are high, you could run out of RAM.

( (Com_show_create_table + Com_show_fields) / Questions ) = (0 + 436) / 1804 = 24.2% -- Naughty framework -- spending a lot of effort rediscovering the schema. -- Complain to the 3rd party vendor.

( local_infile ) = local_infile = ON -- local_infile = ON is a potential security issue

( Select_scan / Com_select ) = 217 / 199 = 109.0% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries.

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( back_log ) = 15,574 -- (Autosized as of 5.6.6; based on max_connections) -- Raising to min(150, max_connections) may help when doing lots of connections.

( thread_cache_size ) = 16,384 -- How many extra processes to keep around (Not relevant when using thread pooling) (Autosized as of 5.6.8; based on max_connections) -- Greater than 100 may lead to OOM.

( thread_cache_size / Max_used_connections ) = 16,384 / 4 = 409600.0% -- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.

Abnormally small:

Bytes_received = 6.2 /sec
Bytes_sent = 145 /sec
Com_insert = 0
Com_select = 41 /HR
Handler_read_key = 0.45 /sec
Handler_write = 0.33 /sec
Innodb_background_log_sync = 0
Innodb_data_writes = 77 /HR
Innodb_data_written = 222 /sec
Innodb_pages0_read = 0
Innodb_rows_inserted = 0
Innodb_secondary_index_triggered_cluster_reads = 0.32 /sec
Max_used_connections = 4
Open_files = 2
Select_range = 0
Table_locks_immediate = 2.7 /HR
Table_open_cache_hits = 0.48 /sec
innodb_default_encryption_key_id = 0

Abnormally large:

( Innodb_pages_read + Innodb_pages_written ) / Uptime = 2,841
Com_check = 22 /HR
Com_create_db = 0.2 /HR
Handler_read_next / Handler_read_key = 51,277
Innodb_buffer_pool_pages_flushed / max(Questions, Queries) = 33,884
Innodb_buffer_pool_pages_made_not_young = 12596 /sec
Innodb_buffer_pool_read_ahead = 1209 /sec
Innodb_data_pending_reads = 0.82 /HR
Innodb_data_read = 41774567 /sec
Innodb_data_reads = 2841 /sec
Innodb_pages_read = 2841 /sec
Threadpool_idle_threads = 9
Threadpool_threads = 11
gtid_executed_compression_period = 0.057 /sec
host_cache_size = 1,381
innodb_max_dirty_pages_pct_lwm = 10
innodb_undo_tablespaces = 2
max_error_count = 1,024
max_length_for_sort_data = 4,096
slave_pending_jobs_size_max = 128MB

Abnormal strings:

bind_address = 0.0.0.0
default_authentication_plugin = caching_sha2_password
event_scheduler = ON
have_ssl = YES
have_symlink = DISABLED
innodb_fast_shutdown = 1
innodb_undo_log_truncate = ON
optimizer_trace = enabled=off,one_line=off
slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN
thread_handling = pool-of-threads
transaction_write_set_extraction = XXHASH64