Mysql – How to spec a machine for a particular database load

hardwaremariadbMySQLoptimizationperformance

Our primary authentication database that is used for 100k+ users, and growing is a VM on a shared machine that we want to extract into its own isolated hardware. This will let us isolate resources, and help bring clarity to optimizations. However, I'm not clear on what would be an ideal machine for what we are doing, or how to determine that. So I turn to you for some advice.

We are using MariaDB 10.1.26 (formerly mysql) on a Debian Stretch 64-bit VM with 10GB of memory, and 6 cores allocated to it. The CPU is mostly idle.
There are approximately 320 tables across three databases, no table is very large, each under 100mb.
Size of data on disk is 24GB, with ibdata1 being 2.2GB. Data in InnoDB tables: 1019.8M (Tables: 37).
Reads / Writes: 98% / 2%
Total buffers: 1.9G global + 2.9M per thread (2500 max threads)
Maximum reached memory usage: 4.6G (47.52% of installed RAM)

This machine is replicated to a secondary (and then from the secondary to a tertiary), so relay logs take up a bunch of that space. This server is used for account authentication, it typically is doing SELECTs, and Reads on InnoDB rows so it has very little data, but it needs to be functioning and fast. On
average, there are 230 connected threads, but there are periods where spikes of connections cause us to reach max_connections, and we have been increasing that as we go along, and have been tuning various parameters, but we are maxed out on the available memory we can allocate. There are also situations where some queries are taking longer than they should, this may be a network issue, or when we are reaching max connections, there is nothing that shows up in the slow query log, as they are typical queries that are usually quite fast, but sometimes take longer than they should.

We have zero slow queries (out of 28M queries)
extremely few aborted connections
no sorts requiring temporary tables
and no joins without indexes.

We do have temporary tables created on disk: 42% (1K on disk / 3K
total), which is a bit high, and I'm unsure why that is
happening.

Our thread cache hit rate: 99% (735 created / 913K connections)
table cache hit rate: 97% (253 open / 259 opened).
open file limit used: 0% (50/16K)
99% of our table locks are acquired immediately (out of 29M).

We are using InnoDB for all tables. Some InnoDB stats:

InnoDB Thread Concurrency: 0
InnoDB File per table is activated
InnoDB buffer pool / data size: 1.2G/1019.8M
InnoDB buffer pool instances: 1
InnoDB Read buffer efficiency: 99.98% (240354196 hits/ 240398682 total)
InnoDB Write Log efficiency: 75.01% (302451 hits/ 403237 total)
InnoDB log waits: 0.00% (0 waits / 100786 writes)

I've got munin graphs for the last year of the following, which I'm happy to provide on request. There are too many of them to individually provide: binary/relay log usage; different commands/second; threads and connection counts; table cache, open files, open tables; handler activity (write, update, delete, etc); InnoDB Buffer Pool size, pages and modified pages; InnoDB Buffer Pool activity (pages read, created, and written); InnoDB Checkpoint Age; InnoDB History List length; Number and size InnoDB Insert Buffers; InnoDB IO (file reads/writes, log writes and file syncs); InnoDB IO Pending; InnoDB Log buffer size, KB flushed and Written; InnoDB Row Operations; InnoDB Semaphores; InnoDB Transactions. I also have types of sorts; number of table locks; temporary disk tables; and number of threads.

tps = 3.342
qps = 270.123

based on the following:

use information_schema;
select VARIABLE_VALUE into @num_queries from GLOBAL_STATUS where VARIABLE_NAME = 'QUESTIONS';
select VARIABLE_VALUE into @uptime from GLOBAL_STATUS where VARIABLE_NAME = 'UPTIME';
select VARIABLE_VALUE into @num_com from GLOBAL_STATUS where VARIABLE_NAME = 'COM_COMMIT';
select VARIABLE_VALUE into @num_roll from GLOBAL_STATUS where VARIABLE_NAME = 'COM_ROLLBACK';
select (@num_com + @num_roll) / @uptime as tps, @num_queries / @uptime as qps;

SHOW VARIABLES
SHOW GLOBAL VARIABLES
SHOW GLOBAL STATUS
iostat -x output
ulimit -a:
core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 39926 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 39926 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited
df -h:
Filesystem Size Used Avail Use% Mounted on udev 4.9G 0 4.9G 0% /dev tmpfs 1001M 452K 1000M 1% /run /dev/vda1 45G 28G 15G 65% / tmpfs 4.9G 0 4.9G 0% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 4.9G 0 4.9G 0% /sys/fs/cgroup tmpfs 1001M 0 1001M 0% /run/user/0
free -h:
total used free shared buff/cache available Mem: 9.8G 2.8G 4.1G 452K 2.8G 6.7G Swap: 0B 0B 0B

Best Answer

Most tables are MyISAM? That may be holding you back when it comes to performance. Conflicting info: "320 tables"; "Data in InnoDB tables: 1019.8M (Tables: 37)."; "We are using InnoDB for all tables"

"Maximum reached memory usage: 4.6G", "10G of RAM" -- Do you have other applications running in the same VM? If not you may be under-utilizing RAM. Or maybe not: "InnoDB buffer pool / data size: 1.2G/1019.8M"

"secondary to a tertiary" -- Why? There are valid reasons for such, but if the secondary dies, it leaves the tertiary useless.

"spikes of connections cause us to reach max_connections" -- This could be a serious problem. Is it caused by table locks in MyISAM? Slow queries? DDL operations (ALTER, etc)? There are cases where throttling the connections at the client (eg webserver) is better than increasing max_connections. I need to get a better feel for what is going on.

During a spike, does Threads_running go up? When that STATUS is more than, say, 20, MySQL is likely stumbling over itself. Latency suffers, and throughput stagnates, or even declines.

"We have zero slow queries". I'll bet long_query_time is still at the useless default of 10 seconds. Change it to 1.

"temporary tables created on disk: 42%" -- bad. But let's approach it by finding the 'worst' queries via the slowlog.

Please follow the slowlog suggestions in http://mysql.rjweb.org/doc.php/mysql_analysis

Graphs: Most will be boring. But spikes might be interesting. Especially lining up the graphs that show abnormal activity at the same time as your "spike of connections". (It might suffice to list the metrics an not actually provide the graphs.)

270 qps & 3 tps -- modest. (One survey show 100 qps to be about the median; 1900 to be the 90th percentile.)

You apparently have few tables since you have not slammed into open_file_limit = 1024.

"You can't tune your way out of a performance problem" (and most of the tunables are OK). So, I think the Slowlog (see link above) is the best next step.

Analysis of GLOBAL STATUS and VARIABLES

Observations:

Version: 10.1.26-MariaDB-0+deb9u1
10 GB of RAM
Uptime = 1d 05:53:01
You are not running on Windows.
Running 64-bit version
You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

The values here say you are running mostly InnoDB.

key_buffer_size = 50M
innodb_buffer_pool_size = 4G

Innodb rows deleted is amazingly close to rows inserted -- what kind of processing are you doing?

Spurious COMMITs? (Com_insert + Com_update + Com_delete + Com_replace) < Com_commit

Change slowlog parameters.

20 admin commands per second -- What is going on?

Details and other observations:

( (key_buffer_size - 1.2 * Key_blocks_used * 1024) / _ram ) = (512M - 1.2 * 1449 * 1024) / 10240M = 5.0% -- Percent of RAM wasted in key_buffer. -- Decrease key_buffer_size.

( Key_blocks_used * 1024 / key_buffer_size ) = 1,449 * 1024 / 512M = 0.28% -- Percent of key_buffer used. High-water-mark. -- Lower key_buffer_size to avoid unnecessary memory usage.

( innodb_buffer_pool_size / _ram ) = 1250M / 10240M = 12.2% -- % of RAM used for InnoDB buffer_pool

( (key_buffer_size / 0.20 + innodb_buffer_pool_size / 0.70) / _ram ) = (512M / 0.20 + 1250M / 0.70) / 10240M = 42.4% -- Most of available ram should be made available for caching. -- http://mysql.rjweb.org/doc.php/memory

( innodb_buffer_pool_size ) = 1250M -- InnoDB Data + Index cache -- 128M (an old default) is woefully small.

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 21,062 / 79999 = 26.3% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size is bigger than necessary?

( Innodb_pages_written / Innodb_buffer_pool_write_requests ) = 855,522 / 3597243 = 23.8% -- Write requests that had to hit disk -- Check innodb_buffer_pool_size

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 252,449,792 / (107581 / 3600) / 2 / 156M = 0.0258 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 107,581 / 60 * 156M / 252449792 = 1,161 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)

( default_tmp_storage_engine ) = default_tmp_storage_engine =

( Innodb_rows_deleted / Innodb_rows_inserted ) = 50,443 / 50359 = 1 -- Churn -- "Don't queue it, just do it." (If MySQL is being used as a queue.)

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 2,500 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections is too high and various memory settings are high, you could run out of RAM.

( innodb_buffer_pool_populate ) = OFF = 0 -- NUMA control

( query_alloc_block_size / _ram ) = 16,384 / 10240M = 0.00% -- For parsing. Pct of RAM

( local_infile ) = local_infile = ON -- local_infile = ON is a potential security issue

( Created_tmp_disk_tables / Created_tmp_tables ) = 1,400 / 3347 = 41.8% -- Percent of temp tables that spilled to disk -- Maybe increase tmp_table_size and max_heap_table_size; improve indexes; avoid blobs, etc.

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (54024 + 289659 + 6707 + 600) / 360402 = 0.974 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( binlog_format ) = binlog_format = STATEMENT -- STATEMENT/ROW/MIXED. ROW is preferred; it may become the default.

( expire_logs_days ) = 0 -- How soon to automatically purge binlog (after this many days) -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin = OFF)

( slow_query_log ) = slow_query_log = OFF -- Whether to log slow queries. (5.1.12)

( long_query_time ) = 5 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Connections ) = 922,332 / 107581 = 8.6 /sec -- Connections -- Increase wait_timeout; use pooling?

( thread_cache_size ) = 256 -- How many extra processes to keep around (Not relevant when using thread pooling) (Autosized as of 5.6.8; based on max_connections) 0 is inefficient for non-Windows; 10 is probably fine. More than 100 may lead to OOM.

Abnormally small:

Handler_read_next / Handler_read_key = 0.557
Handler_read_rnd_deleted = 0
Handler_tmp_write = 6 /sec
Innodb_buffer_pool_pages_made_young = 0.57 /HR
Innodb_secondary_index_triggered_cluster_reads = 10.2MB
Rows_tmp_read = 0.8M
interactive_timeout = 300
max_heap_table_size = 1MB
max_tmp_tables * tmp_table_size / _ram = 0.31%
min(max_heap_table_size, tmp_table_size) = 1MB

Abnormally large:

Acl_column_grants = 27
Acl_function_grants = 2
Acl_procedure_grants = 5
Acl_table_grants = 37
Com_admin_commands = 20 /sec
Com_drop_db = 0.1 /HR
Com_show_binlogs = 12 /HR
Com_show_slave_hosts = 0.033 /HR
Handler_discover = 1.7 /HR
Innodb_read_views_memory = 23,864
Max_used_connections = 691
Slave_connections = 2
Slaves_connected = 0.033 /HR
Threads_cached = 150
back_log = 550
host_cache_size = 728
innodb_defragment_fill_factor = 0.9
innodb_lru_scan_depth / innodb_io_capacity = 5.12
max_relay_log_size = 1024MB

Abnormal strings:

Slave_heartbeat_period = 1800
innodb_fast_shutdown = 1
myisam_stats_method = NULLS_UNEQUAL
opt_s__engine_condition_pushdown = off

Related Solutions

Mysql – How much memory do I need for innodb buffer pool

If you go strictly by that rule of accommodating an addition 10%, here is my suggestion:

SELECT CONCAT(CEILING(RIBPS/POWER(1024,pw)),SUBSTR(' KMGT',pw+1,1))
Recommended_InnoDB_Buffer_Pool_Size FROM
(
    SELECT RIBPS,FLOOR(LOG(RIBPS)/LOG(1024)) pw
    FROM
    (
        SELECT SUM(data_length+index_length)*1.1*growth RIBPS
        FROM information_schema.tables AAA,
        (SELECT 1 growth) BBB
        WHERE ENGINE='InnoDB'
    ) AA
) A;

This will produce exactly what you need to set innodb_buffer_pool_size in /etc/my.cnf. If you want to account for 25% increase in data and indexes over time, please change (SELECT 1 growth) BBB to (SELECT 1.25 growth) BBB

Recently, I answered another question like this in the DBA StackExchange.

Mysql – How to debug a db memory-leak causing thesql to go before it’s own limits

...even surpassing it's theorically maximum possible allocation.

[OK] Maximum possible memory usage: 7.3G (46% of installed RAM)

There is not actually a way to calculate maximum possible memory usage for MySQL, because there is no cap on the memory it can request from the system.

The calculation done by mysqltuner.pl is only an estimate, based on a formula that doesn't take into account all possible variables, because if all possible variables were taken into account, the answer would always be "infinite." It's unfortunate that it's labeled this way.

Here is my theory on what's contributing to your excessive memory usage:

thread_cache_size       = 128

Given that your max_connections is set to 200, the value of 128 for thread_cache_size seems far too high. Here's what makes me think this might be contributing to your problem:

When a thread is no longer needed, the memory allocated to it is released and returned to the system unless the thread goes back into the thread cache. In that case, the memory remains allocated.

^{http://dev.mysql.com/doc/refman/5.6/en/memory-use.html}

If your workload causes even an occasional client thread to require a large amount of memory, those threads may be holding onto that memory, then going back to the pool and sitting around, continuing to hold on to memory they don't technically "need" any more, on the premise that holding on to the memory is less costly than releasing it if you're likely to need it again.

I think it's worth a try to do the following, after first making a note of how much memory MySQL is using at the moment.

Note how many threads are currently cached:

mysql> show status like 'Threads_cached';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Threads_cached | 9     |
+----------------+-------+
1 row in set (0.00 sec)

Next, disable the thread cache.

mysql> SET GLOBAL thread_cache_size = 0;

This disables the thread cache, but the cached threads will stay in the pool until they're used one more time. Disconnect from the server, then reconnect and repeat.

mysql> show status like 'Threads_cached';

Continue disconnecting, reconnecting, and checking until the counter reaches 0.

Then, see how much memory MySQL is holding.

You may see a decrease, possibly significant, and then again you may not. I tested this on one of my systems, which had 9 threads in the cache. Once those threads had all been cleared out of the cache, the total memory held by MySQL did decrease... not by much, but it does illustrate that threads in the cache do release at least some memory when they are destroyed.

If you see a significant decrease, you may have found your problem. If you don't, then there's one more thing that needs to happen, and how quickly it can happen depends on your environment.

If the theory holds that the other threads -- the ones currently servicing active client connections -- have significant memory allocated to them, either because of recent work in their current client session or because of work requiring a lot of memory that was done by another connection prior to them languishing in the pool, then you won't see all of the potential reduction in memory consumption until those threads are allowed to die and be destroyed. Presumably your application doesn't hold them forever, but how long it will take to know for sure whether there's a difference will depend on whether you have the option of cycling your application (dropping and reconnecting the client threads) or if you'll have to just wait for them to be dropped and reconnected over time on their own.

But... it seems like a worthwhile test. You should not see a substantial performance penalty by setting thread_cache_size to 0. Fortunately, thread_cache_size is a dynamic variable, so you can freely change it with the server running.

Best Answer

Related Solutions

Mysql – How much memory do I need for innodb buffer pool

Mysql – How to debug a db memory-leak causing thesql to go before it’s own limits

Related Question