MySQL performance degraded after database migration

Tags: innodb, mysql, mysql-5.7

I migrated my MySQL database from GCP to Azure (both on 5.7), and performance seems to have degraded since the move.

Server before migration: 2 vCPUs with 7.5 GB memory
Server after migration: 2 vCPUs with 8 GB memory

Both servers run (or ran) MySQL 5.7. My database is currently around 6 GB in size and grows by 100 MB+ a day. It consists of only 32 tables, although a few of them have millions of rows.

I read up on innodb_buffer_pool_size; GCP apparently sets it to around 80% of memory, which would make it 6 GB. I have set innodb_buffer_pool_size on the new server to the same value.
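For reference, 80% of 7.5 GB works out to 6 GB, and the byte value I use below is just 6 GiB spelled out:

-- 6 GiB = 6 * 1024 * 1024 * 1024 bytes
SELECT 6 * 1024 * 1024 * 1024 AS six_gib_in_bytes; -- returns 6442450944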

Before I updated this value (when I first noticed the decreased performance), innodb_buffer_pool_size was set to 0.1 GB on the new server. I then raised it to the value the GCP server used, hoping that would help.

Following this documentation I was able to update the buffer pool size.

How did I check the innodb_buffer_pool_size initially?

-- returned 0.111...
SELECT @@innodb_buffer_pool_size/1024/1024/1024;

How did I update innodb_buffer_pool_size?

SET GLOBAL innodb_buffer_pool_size=6442450944;

I checked the resize status with this query:

-- returned 'Completed resizing buffer pool at 200920 13:46:20.'
SHOW STATUS WHERE Variable_name='InnoDB_buffer_pool_resize_status';

I execute around 2 queries a second, peaking at about 250k a day spread throughout the day. I can't be certain, but that level of usage shouldn't be enough to cripple performance, should it?

How am I checking performance?

Below is a list of the queries I ran and the time the server took to respond to each. I have tested these queries in Navicat, DataGrip, and the CLI with similar results.

I wasn't sure which queries to include here to give as much information as possible, so if I've left out anything useful I can add it on request.

-- Fetching 100k rows from a 3.1m row table
-- Time taken: 21.248s
SELECT * FROM `profile_connections` LIMIT 100000;

-- (SECOND TIME) Fetching 100k rows from a 3.1m row table
-- Time taken: 1.735s
SELECT * FROM `profile_connections` LIMIT 100000;

-- Fetching a random row from a 3.1m row table
-- Time taken: 0.857s
SELECT * FROM `profile_connections` WHERE `id` = 2355895 LIMIT 1;

-- (SECOND TIME) Fetching a random row from a 3.1m row table
-- Time taken: 0.850s
SELECT * FROM `profile_connections` WHERE `id` = 2355895 LIMIT 1;

-- Fetching all rows from a 20 row table
-- Time taken: 40.010s
SELECT * FROM `profile_types`;

-- (SECOND TIME) Fetching all rows from a 20 row table
-- Time taken: 0.850s
SELECT * FROM `profile_types`;

But at times I can run all of the above queries and get responses within 2–5 seconds. Performance seems to be hit or miss: there are huge differences in the time taken by the same query depending on when it is run, which I am currently struggling to diagnose.

I ran mysqltuner and got these performance metrics back:

[--] Up for: 47m 39s (38K q [13.354 qps], 1K conn, TX: 403M, RX: 63M)
[--] Reads / Writes: 50% / 50%
[--] Binary logging is disabled
[--] Physical Memory     : 7.8G
[--] Max MySQL memory    : 146.8G
[--] Other process memory: 0B
[--] Total buffers: 6.0G global + 954.7M per thread (151 max threads)
[--] P_S Max memory usage: 72B
[--] Galera GCache Max memory usage: 0B
[!!] Maximum reached memory usage: 21.9G (281.61% of installed RAM)
[!!] Maximum possible memory usage: 146.8G (1888.34% of installed RAM)
[!!] Overall possible memory usage with other process exceeded memory
[OK] Slow queries: 3% (1K/38K)
[OK] Highest usage of available connections: 11% (17/151)
[OK] Aborted connections: 0.67%  (9/1342)
[!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 41 sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 4% (82 on disk / 1K total)
[OK] Thread cache hit rate: 98% (17 created / 1K connections)
[OK] Table cache hit rate: 63% (667 open / 1K opened)
[OK] table_definition_cache(1400) is upper than number of tables(302)
[OK] Open file limit used: 1% (55/5K)
[OK] Table locks acquired immediately: 100% (1K immediate / 1K locks)

Slow query logs
I run a lot of the same queries, so I've truncated the log to just a few entries.

# Time: 2020-09-20T16:45:04.230173Z
# User@Host: root[root] @  [51.132.38.176]  Id:     7
# Query_time: 1.022011  Lock_time: 0.000084 Rows_sent: 1  Rows_examined: 1058161
SET timestamp=1600620304;
SELECT @id := `id`,`item`
                    FROM `queue_items`
                    WHERE `processed_at` IS NULL AND `completed_at` IS NULL AND `confirmed` = '1' ORDER BY `id` ASC
                    LIMIT 1
                    FOR UPDATE;
# Time: 2020-09-20T16:45:09.676613Z
# User@Host: root[root] @  [51.132.38.176]  Id:     5
# Query_time: 1.198063  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
SET timestamp=1600620309;
COMMIT;
# Time: 2020-09-20T16:45:22.938081Z
# User@Host: root[root] @  [51.105.34.135]  Id:     4
# Query_time: 5.426964  Lock_time: 0.000133 Rows_sent: 0  Rows_examined: 1
SET timestamp=1600620322;
UPDATE `queue_items` SET `completed_at` = '2020-09-20 16:45:17', `updated_at` = '2020-09-20 16:45:17' WHERE `id` = 1818617;

Since asking, I have added the following variables to my config, although they don't seem to have fixed any of the performance issues.

slow_query_log=1
long_query_time=1
log_output=FILE
slow_query_log_file=/var/lib/mysql/slow.log

innodb_buffer_pool_size=6442450944
innodb_buffer_pool_instances=6
innodb_log_file_size=780M

query_cache_size=0
query_cache_limit=1M

Best Answer

SET GLOBAL innodb_buffer_pool_size=6442450944;

Lasts only until the server restarts. You must add it to the config file (typically my.cnf) to make it permanent.
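For example, a minimal sketch of the persistent setting, assuming the standard [mysqld] section (the config file location varies by platform, e.g. /etc/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf):

[mysqld]
# 6 GiB; read at server startup
innodb_buffer_pool_size = 6442450944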

Yes, 6G is probably a good value. If you have any swapping, lower it, starting with 5G.

It is quite typical for the second run to be 10x faster than the first. This is because the first had to do a lot of disk reads into the buffer_pool; then the second read could take advantage of the cached data.
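A quick way to observe this is to compare InnoDB's logical read requests with the reads that actually had to go to disk; both counters are cumulative since the server started:

-- Innodb_buffer_pool_read_requests = logical reads
-- Innodb_buffer_pool_reads         = reads that missed the buffer pool and hit disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';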

Fetching a specific row (that's not a "random" row) took a long time because you don't have an index on id. Every table should have a PRIMARY KEY (which is an index).
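If id really is unindexed, adding a primary key would look something like this (assuming id is a unique, non-null integer column; adjust if not):

ALTER TABLE `profile_connections` ADD PRIMARY KEY (`id`);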

40 seconds for a 20 row table -- No way! Even 0.85s is not real. Please provide more details.

The first entry in the slow log shows that a lot of the table had to be scanned (Rows_examined: 1058161). This composite index will make it run much faster: INDEX(confirmed, completed_at, processed_at, id) -- possibly 1 ms instead of 1 s.
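As a concrete sketch (the index name is arbitrary):

ALTER TABLE `queue_items`
    ADD INDEX `idx_confirmed_completed_processed_id` (`confirmed`, `completed_at`, `processed_at`, `id`);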

Please provide the output of SHOW CREATE TABLE queue_items;

(For security and other reasons, I recommend you avoid "root" except for admin purposes. Create a login for the app that is limited to the database(s) it needs to touch.)
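Something along these lines would do (hypothetical user, host, and database names; grant only what the app actually needs):

CREATE USER 'app_user'@'%' IDENTIFIED BY 'use-a-strong-password';
GRANT SELECT, INSERT, UPDATE, DELETE ON `your_database`.* TO 'app_user'@'%';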

Did you change any other VARIABLES?

OK, I see it now. That SELECT probably scanned through more data than would fit in the buffer_pool. That bumped out the 20-row table, perhaps repeatedly. Add my suggested index. And, if you have any other slow queries, let's discuss them.
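To verify, EXPLAIN a simplified form of the query from the slow log (the @id assignment and FOR UPDATE are dropped here just for the EXPLAIN); without a suitable index it should report a scan of roughly a million rows, and with the suggested index only a handful:

EXPLAIN SELECT `id`, `item`
FROM `queue_items`
WHERE `processed_at` IS NULL AND `completed_at` IS NULL AND `confirmed` = '1'
ORDER BY `id` ASC
LIMIT 1;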

Also, consider purging old data from your "queue". Tips: http://mysql.rjweb.org/doc.php/deletebig
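The core idea there is to delete in small batches rather than with one huge statement. A rough sketch, assuming rows completed more than 30 days ago can go (pick a retention period that fits your data):

-- repeat until it deletes 0 rows
DELETE FROM `queue_items`
WHERE `completed_at` IS NOT NULL
  AND `completed_at` < NOW() - INTERVAL 30 DAY
ORDER BY `id`
LIMIT 1000;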