A production database has been exported to a separate location for various reasons, however the dumped/imported database is performing much slower on some very specific queries. Here is 2 queries on production:
mysql> SELECT count(id) FROM payments;
+-----------+
| count(id) |
+-----------+
| 964390 |
+-----------+
1 row in set (0.14 sec)
mysql> SELECT count(id) FROM payments WHERE is_held = True;
+-----------+
| count(id) |
+-----------+
| 0 |
+-----------+
1 row in set (0.32 sec)
meanwhile the imported database:
mysql> SELECT count(id) FROM payments;
+-----------+
| count(id) |
+-----------+
| 963770 |
+-----------+
1 row in set (0.20 sec)
mysql> SELECT count(id) FROM payments WHERE is_held = True;
+-----------+
| count(id) |
+-----------+
| 0 |
+-----------+
1 row in set (37.20 sec)
So when including a non-indexed boolean in the WHERE-clause, the performance of the imported database drops by over 100 times.
An explain for the queries shows the same result in both databases:
mysql> explain SELECT count(id) FROM payments WHERE is_held = True;
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------+
| 1 | SIMPLE | payments | ALL | NULL | NULL | NULL | NULL | 739335 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------+
1 row in set (0.00 sec)
I've tried an OPTIMIZE payments;
run on the imported database, it didn't help, though the output was:
mysql> OPTIMIZE TABLE payments;
+----------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+----------+----------+----------+-------------------------------------------------------------------+
| payments | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| payments | optimize | status | OK |
+----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (2 min 13.01 sec)
The database is hosted in Amazon RDS, so my access to slow query logs and looking at hardware etc is a bit limited.
Being on RDS, I've tried upsizing the instance type a couple times, which I believe will move the database to new hardware, and in each case there's been no significant improvement.
I've run this query many times over the last few days, but now when I run it twice in a row it finishes instantly, and I can start a new MySQL connection and it is instant from the first run. I can even scroll up my terminal window and observe multiple earlier runs where it was slow, though in each case delineated by an RDS instance resize.
The reason I investigated this in the first place is a scheduled task hits this every 30 seconds and the scheduled tasks were starting to overlap and consume all my workers from other tasks so it was definitely recurring regularly before, not just first run.
I've re-enabled the scheduled task again, and it seems that the first run was slow, 33 seconds, but the second run was 2 seconds.
Best Answer
Here are the principles.
id
is the primary key of the table and the table is InnoDB.)SELECT count(id) FROM payments;
will read the entire table. Or it may read some secondary index, which necessarily includes the PK.WHERE is_held = True
needs was not involved in that index, then this query will need to read the entire data BTree. One of your tables apparently had it cached already (0.32s) and the other did not (37s).COUNT(id)
unless you need to avoid counting any rows withid IS NULL
, which cannot happen. Instead, use simplyCOUNT(*)
.INDEX(hash_id, id)
would make some of your queries run faster because they could run entirely in that index's BTree.For further discussion, please provide
SHOW CREATE TABL
,SHOW TABLE STATUS
, and the value ofinnodb_buffer_pool_size
.