MySQL – Tools and Methods to Investigate Slow Query

explainMySQLoptimizationperformancequery-performancetools

I have a query with a subquery, both are fast but when combined they take forever (meaning whatever time above what is acceptable in my application). I tried using 'explain' to see what's going on, it didn't help me much. What other ways are there to investigate slow queries?

Below are the specific details of my problem. I have a list of products, I want to find the date of the first vendor event for each product. If I get the list of event IDs, the query goes fast (each product might have multiple events, they are created in chronological order so min event ID is the earliest; ProdID is indexed); then I use this list of event IDs to get the date, this second query goes fast too (event ID is the primary key, therefore indexed). But if I use the first query as a subquery of the second, it takes forever.

Query 1:

mysql> select min(VendorEventID) from Vendor_Events where ProdID in (598446938, 598446984, 598447024, 598447054, 598447311, 598447523, 598447764, 598447778, 
598448000, 598448048) and vendorid = 12 group by ProdID ;
+--------------------+
| min(VendorEventID) |
+--------------------+
|        11217790121 |
|        11217792453 |
|        11217793912 |
|        11217793894 |
|        11217794825 |
|        11217815018 |
|        11217813148 |
|        11217828936 |
|        11217830215 |
|        11217829202 |
+--------------------+
10 rows in set (0.00 sec)

Query 2:

mysql> select * from Vendor_Events where VendorEventID in ( 11217790121, 11217792453, 11217793912, 11217793894, 11217794825, 11217815018, 11217813148, 11217828936, 11217830215, 11217829202 ) ;
+---------------+----------+-----------+---------------------+------------+--------+
| VendorEventID | VendorID | ProdID    | LoggedDate          | VendorType | NameID |
+---------------+----------+-----------+---------------------+------------+--------+
|   11217790121 |       12 | 598446938 | 2016-12-07 16:33:58 |          2 |     32 |
|   11217792453 |       12 | 598446984 | 2016-12-07 16:34:14 |          2 |     32 |
|   11217793894 |       12 | 598447054 | 2016-12-07 16:34:29 |          2 |     32 |
|   11217793912 |       12 | 598447024 | 2016-12-07 16:34:29 |          2 |     32 |
|   11217794825 |       12 | 598447311 | 2016-12-07 16:34:33 |          2 |     32 |
|   11217813148 |       12 | 598447764 | 2016-12-07 16:37:10 |          2 |     32 |
|   11217815018 |       12 | 598447523 | 2016-12-07 16:37:24 |          2 |     32 |
|   11217828936 |       12 | 598447778 | 2016-12-07 16:39:54 |          2 |     32 |
|   11217829202 |       12 | 598448048 | 2016-12-07 16:39:55 |          2 |     32 |
|   11217830215 |       12 | 598448000 | 2016-12-07 16:40:04 |          2 |     32 |
+---------------+----------+-----------+---------------------+------------+--------+
10 rows in set (0.04 sec)

Explain them combined:

mysql> explain  select * from Vendor_Events where VendorEventID in ( select min(VendorEventID) from Vendor_Events where ProdID in (598446938, 598446984, 598447024, 598447054, 598447311, 598447523, 598447764, 598447778, 598448000, 598448048) and vendorid = 12 group by ProdID ) ;
+----+--------------------+---------------+-------+-----------------+----------+---------+------+----------+--------------------------+
| id | select_type        | table         | type  | possible_keys   | key      | key_len | ref  | rows     | Extra                    |
+----+--------------------+---------------+-------+-----------------+----------+---------+------+----------+--------------------------+
|  1 | PRIMARY            | Vendor_Events | ALL   | NULL            | NULL     | NULL    | NULL | 78536406 | Using where              |
|  2 | DEPENDENT SUBQUERY | Vendor_Events | range | ProdID,ProdID_2 | ProdID_2 | 16      | NULL |       15 | Using where; Using index |
+----+--------------------+---------------+-------+-----------------+----------+---------+------+----------+--------------------------+
2 rows in set (0.00 sec)

Of course if anyone can help me figure out what's causing the query to be super slow, I'd appreciate. But if people want to tell me what other ways are there to investigate slow queries, I'd like that too, since we all face slow queries every now and then. 😉

Best Answer

Why the query is slow:

explain's output shows the following:

The query will scan ALL rows from first table, 78536406, and for each of these rows, it is will scan an average of 15 rows from the second table. In total, (78536406*15) rows will be read. This is by itself is an indicator of slowness.

Taking into consideration the size of the tables, there might be some memory swap, which cause slowness as well (Reading from disk to RAM, repeatedly.)

How to improve this query:

One possible way would be to create a table out of the sub-query, and use INNER JOIN between the first table, and the table from the previous step.

CREATE TABLE tmp_ve as select min(VendorEventID) as VendorEventID from Vendor_Events where ProdID in (598446938, 598446984, 598447024, 598447054, 598447311, 598447523, 598447764, 598447778, 598448000, 598448048) and vendorid = 12 group by ProdID;
-- No need to add indexes to this table, as it would contain only 10 rows.
SELECT Vendor_Events.* FROM Vendor_Events INNER JOIN tmp_ve USING (VendorEventID);

I didn't try it, but if you explain this second query, it will start by the tmp_ve table, then for each value, it will use the index on Vendor_Events to fetch the corresponding row

HTH