Mariadb – How to debug trivial MariaDB query with simple execution plan which often runs really slowly

awscentos-7mariadb

The query (with literal ? for dynamic query goodness):

SELECT UNIX_TIMESTAMP(my_timestamp_field) AS my_name
  FROM my_table
 WHERE my_column=?

This query is one of the most common in my slow query log (limit 1 second) – it has taken up to 10 seconds several times just in the last few hours. The lock time on every one of the slow queries except one was 50ms±2ms. I've done the research below, but I'm stumped for how to continue.

Is it possible that another lock somehow is taking up the time without being counted in the lock time? Are there any AWS EBS volume limits I need to be aware of?

Query plan with representative value:

MariaDB [my_schema]> EXPLAIN SELECT UNIX_TIMESTAMP(my_timestamp_field) AS my_name FROM my_table WHERE my_column=10094;
+------+-------------+----------+------+-----------------+---------------+---------+-------+------+-------+
| id   | select_type | table    | type | possible_keys   | key           | key_len | ref   | rows | Extra |
+------+-------------+----------+------+-----------------+---------------+---------+-------+------+-------+
|    1 | SIMPLE      | my_table | ref  | my_column_idx   | my_column_idx | 8       | const |    1 |       |
+------+-------------+----------+------+-----------------+---------------+---------+-------+------+-------+

The table is quite simple, with 21 fixed width columns:

MariaDB [(my_schema)]> SHOW CREATE TABLE my_table\G
*************************** 1. row ***************************
       Table: my_table
Create Table: CREATE TABLE `my_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `my_column` bigint(20) NOT NULL,
  `my_timestamp_field` datetime DEFAULT NULL,
  [other columns]
  PRIMARY KEY (`id`),
  UNIQUE KEY `ind_some_key` (`other_column_1`,`other_column_2`),
  KEY `my_column_idx` (`my_column`)
) ENGINE=InnoDB AUTO_INCREMENT=48033 DEFAULT CHARSET=utf8

The table is 45k rows, and all the columns are fixed length.

Running MariaDB 5.5 on CentOS 7 on a fast AWS EC2 machine.

Best Answer

Only answering the question "how to debug it further".

  • (Sorry about this one, but it must always be said) Make sure you're looking at the right record in the slow log. The part before a query relates to the query, the part after it does not:
# Time: 161128 19:20:53
# User@Host: root[root] @ localhost [127.0.0.1]
# Thread_id: 25  Schema: test  QC_hit: No
# Query_time: 1.925981  Lock_time: 0.924844  Rows_sent: 0  Rows_examined: 0
SET timestamp=1480353653;
select UNIX_TIMESTAMP(my_timestamp_field) AS my_name FROM my_table WHERE my_column=9;
# Time: 161128 19:21:11
# User@Host: root[root] @ localhost [127.0.0.1]
# Thread_id: 25  Schema: test  QC_hit: No
# Query_time: 10.000428  Lock_time: 0.000000  Rows_sent: 1  Rows_examined: 0
SET timestamp=1480353671;

The first 6 lines in the block above are the relevant part. Even if the query actually takes 1 second and not 10, it is worth investigating, but it would be a quite different kind of mystery.

  • Double-check the records in the slow log, specifically Rows_sent and Rows_examined. Since my_column is not unique, it's possible that with some parameter values, or with some unfortunate statistics, the query scans many rows, not just one as in your query plan example. This should rule out a bad execution plan.
# Time: 161128 19:42:19
# User@Host: elenst[elenst] @ localhost [127.0.0.1]
# Thread_id: 26  Schema: test  QC_hit: No
# Query_time: 1.622502  Lock_time: 0.000373  Rows_sent: 32768  Rows_examined: 32768
SET timestamp=1480354939;
select UNIX_TIMESTAMP(my_timestamp_field) AS my_name FROM my_table WHERE my_column=3;
  • Additional check in regard to a bad plan. Try to run a similar query, but not using index, either with a non-indexed column in WHERE, or with IGNORE INDEX hint. The goal is to get full scan and to see how long it takes. With 45K records, even full scan shouldn't take 10+ seconds; but if it does, a wrong plan is still on the table. If it's not anywhere near 10+ seconds, then more likely something else is blocking the query.

  • Start a monitor which runs SHOW PROCESSLIST frequently, e.g. every 5 seconds, so that you certainly catch a 10-second query. Better if the monitor runs in a persistent connection, but if it's too difficult, just run the MySQL client with -e 'SHOW PROCESSLIST' in a loop. Don't forget to show timestamps, too. When you catch the query, the State field will show in which stage the query spends the time, and possibly the process list will also give a hint with which other queries your might be conflicting.

  • If you have permissions, add SHOW ENGINE INNODB STATUS\G to the monitor. It might show some details if the query is spending time inside InnoDB.

  • Add SHOW GLOBAL STATUS LIKE 'Open%' to the monitor, maybe less frequently. It should show whether the tables get kicked out of the table cache too often, it can cause delays in a busy environment.