Mysql – Imported database very slow on a simple query

amazon-rdsMySQLmysql-5.6query-performance

A production database has been exported to a separate location for various reasons, however the dumped/imported database is performing much slower on some very specific queries. Here is 2 queries on production:

mysql> SELECT count(id) FROM payments;
+-----------+
| count(id) |
+-----------+
|    964390 |
+-----------+
1 row in set (0.14 sec)

mysql> SELECT count(id) FROM payments WHERE is_held = True;
+-----------+
| count(id) |
+-----------+
|         0 |
+-----------+
1 row in set (0.32 sec)

meanwhile the imported database:

mysql> SELECT count(id) FROM payments;
+-----------+
| count(id) |
+-----------+
|    963770 |
+-----------+
1 row in set (0.20 sec)

mysql> SELECT count(id) FROM payments WHERE is_held = True;
+-----------+
| count(id) |
+-----------+
|         0 |
+-----------+
1 row in set (37.20 sec)

So when including a non-indexed boolean in the WHERE-clause, the performance of the imported database drops by over 100 times.

An explain for the queries shows the same result in both databases:

mysql> explain SELECT count(id) FROM payments WHERE is_held = True;
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------+
|  1 | SIMPLE      | payments | ALL  | NULL          | NULL | NULL    | NULL | 739335 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+--------+---------------+
1 row in set (0.00 sec)

I've tried an OPTIMIZE payments; run on the imported database, it didn't help, though the output was:

mysql> OPTIMIZE TABLE payments;
+----------+----------+----------+-------------------------------------------------------------------+
| Table    | Op       | Msg_type | Msg_text                                                          |
+----------+----------+----------+-------------------------------------------------------------------+
| payments | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| payments | optimize | status   | OK                                                                |
+----------+----------+----------+-------------------------------------------------------------------+
2 rows in set (2 min 13.01 sec)

The database is hosted in Amazon RDS, so my access to slow query logs and looking at hardware etc is a bit limited.

Being on RDS, I've tried upsizing the instance type a couple times, which I believe will move the database to new hardware, and in each case there's been no significant improvement.

I've run this query many times over the last few days, but now when I run it twice in a row it finishes instantly, and I can start a new MySQL connection and it is instant from the first run. I can even scroll up my terminal window and observe multiple earlier runs where it was slow, though in each case delineated by an RDS instance resize.

The reason I investigated this in the first place is a scheduled task hits this every 30 seconds and the scheduled tasks were starting to overlap and consume all my workers from other tasks so it was definitely recurring regularly before, not just first run.

I've re-enabled the scheduled task again, and it seems that the first run was slow, 33 seconds, but the second run was 2 seconds.

Best Answer

Here are the principles.

  • (I assume id is the primary key of the table and the table is InnoDB.)
  • SELECT count(id) FROM payments; will read the entire table. Or it may read some secondary index, which necessarily includes the PK.
  • The first time you touch a table, the data is sitting on disk must be read into RAM for processing. Since it needs to read the entire table, that could take some time.
  • If you run another query on the same table, it can skip the I/O. This is often a factor of 10 faster.
  • If the first query used a secondary index, but the WHERE is_held = True needs was not involved in that index, then this query will need to read the entire data BTree. One of your tables apparently had it cached already (0.32s) and the other did not (37s).
  • Don't use COUNT(id) unless you need to avoid counting any rows with id IS NULL, which cannot happen. Instead, use simply COUNT(*).
  • INDEX(hash_id, id) would make some of your queries run faster because they could run entirely in that index's BTree.
  • Upgrading/downgrading the server is unlikely to change performance -- unless the table is bigger than the buffer_pool. In this case, the table cannot be cached to make the second query fast.

For further discussion, please provide SHOW CREATE TABL, SHOW TABLE STATUS, and the value of innodb_buffer_pool_size.