So few weeks ago we had an alert on our production MySQL database due to a CPU usage spike. We identified the issue as a query that had been there for several months that suddenly was misbehaving. The execution plan of that query is:
+----+-------------+------------+------------+------+----------------------------------------------------------+----------------------------+---------+----------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+----------------------------------------------------------+----------------------------+---------+----------------------------+------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 2485 | 100.00 | NULL |
| 2 | DERIVED | c | NULL | ALL | PRIMARY | NULL | NULL | NULL | 7420 | 10.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | app | NULL | ref | applications_status_ix,applications_ibfk_3 | applications_ibfk_3 | 4 | prod_v3.c.id | 750 | 0.41 | Using where |
| 2 | DERIVED | inv | NULL | ref | app_id | app_id | 4 | ref.app.app_id | 1 | 100.00 | NULL |
+----+-------------+------------+------------+------+----------------------------------------------------------+----------------------------+---------+----------------------------+------+----------+----------------------------------------------+
What we did was to add an index hint to use applications_status_ix
and then the query performance went back to normal and the CPU usage went down:
+----+-------------+------------+------------+--------+-------------------------------+-------------------------------+---------+----------------------------+--------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+-------------------------------+-------------------------------+---------+----------------------------+--------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 5077 | 100.00 | NULL |
| 2 | DERIVED | app | NULL | range | applications_status_ix | applications_status_ix | 1 | NULL | 464405 | 10.00 | Using index condition; Using where; Using temporary; Using filesort |
| 2 | DERIVED | c | NULL | eq_ref | PRIMARY | PRIMARY | 4 | app.company_id | 1 | 10.00 | Using where |
| 2 | DERIVED | inv | NULL | ref | app_id | app_id | 4 | app_id | 1 | 100.00 | NULL |
+----+-------------+------------+------------+--------+-------------------------------+-------------------------------+---------+----------------------------+--------+----------+---------------------------------------------------------------------+
Basically on the query we want certain application status that have a minimum appearance on the table. That's why is better using that index even with this cardinality:
+---------------------+------------+---------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------------+------------+---------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| applications | 1 | applications_status_ix | 1 | status | A | 9 | NULL | NULL | | BTREE | | |
| applications | 1 | applications_ibfk_3 | 1 | company_id | A | 16240 | NULL | NULL | | BTREE | | |
+---------------------+------------+---------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
So I have two questions:
- Why would the query planner suddenly decided use a different execution plan on a query that has been in use for a while? I do not have hard proof, but I really doubt the cardinality changed materially from what we have now.
- Trying to find a solution we found a about Histograms (values distribution is called on other DBMS). But unfortunately they are only available on MySQL 8 and we are still using version 5.7. Do you happen to know any other workaround on our MySQL versions?
EDIT1:
So the query being executed is
SELECT sum((CASE
WHEN `alias_50656806`.`status` = 'CAPTURED' THEN `alias_50656806`.`amount`
ELSE `alias_50656806`.`due_by` END - `alias_50656806`.`recovered_amount`))
FROM (SELECT `app`.`app_id`,
`app`.`status`,
`app`.`amount`,
`app`.`purchases`,
`app`.`due_by`,
sum(CASE
WHEN (`inv`.`paid_at` IS NULL AND DATEDIFF(current_date(), `inv`.`created_at`) < 10) THEN 0
ELSE IFNULL(`inv`.`amount_due`, 0) END) AS `recovered_amount`
FROM `applications` AS `app`
JOIN `companies` AS `c` ON `app`.`company_id` = `c`.`id`
LEFT OUTER JOIN `application_invoices` AS `inv` USING (`app_id`)
WHERE (NOT (`app`.`repurchases`) AND
`app`.`status` IN ('CAPTURED', 'LOCKED', 'ERROR') AND `c`.`is_test` = FALSE AND 1 = 1)
GROUP BY `app`.`credit_app_id`, `app`.`status`, `app`.`credit_amount`, `app`.`cashless_repurchases`,
`app`.`due_by`) AS `alias_50656806`;
An the table sizes are:
- applications has 12701431 records.
- companies has 7500 records
- the status distributions is
STATUS COUNT (*)
RECOVER_FUNDS 46400
ERROR 18792
LOCKED 3
CAPTURED 151854
And applications table DDL is
CREATE TABLE `applications` (
`app_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
`updated_at` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
`status` enum('PREAPPROVED','INCOMPLETE','DENIED','PENDING_DATA','PENDING_CC','FAILED_DATA','ERROR','READY','CAPTURED','EXPIRED','SETTLED','LOCKED','OPTED_OUT','RECOVER_FUNDS','EXCLUDED','ABANDONED','BAD_OPT') COLLATE utf8_bin NOT NULL,
`account_code` enum('REF','NO_REF') COLLATE utf8_bin NOT NULL,
`amount` int(10) unsigned NOT NULL,
`customer_id` int(11) unsigned NOT NULL,
`company_id` int(11) unsigned NOT NULL,
`flow_id` int(11) unsigned NOT NULL,
`app_ext_id` varchar(20) COLLATE utf8_bin NOT NULL,
`cashless_repurchases` tinyint(1) NOT NULL DEFAULT '0',
`random_no` float unsigned NOT NULL DEFAULT '0.5',
`breakage_amount` int(11) DEFAULT NULL,
`first_repurch` timestamp(3) NULL DEFAULT NULL,
`last_repurch` timestamp(3) NULL DEFAULT NULL,
`repurch_count` tinyint(3) unsigned NOT NULL DEFAULT '0',
`total_repurch_amount` int(10) unsigned NOT NULL DEFAULT '0',
`due_by` int(10) unsigned DEFAULT NULL,
`due_by_customer` int(10) unsigned DEFAULT NULL,
`last_modified_by` varchar(64) COLLATE utf8_bin DEFAULT NULL,
`app_user_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`app_id`),
UNIQUE KEY `flow_id` (`flow_id`),
UNIQUE KEY `app_ext_id` (`app_ext_id`),
KEY `applications_customer_ix` (`customer_id`),
KEY `applications_status_ix` (`status`),
KEY `applications_ibfk_3` (`company_id`),
CONSTRAINT `applications_ibfk_1` FOREIGN KEY (`flow_id`) REFERENCES `flow_entries` (`id`),
CONSTRAINT `applications_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customers` (`id`),
CONSTRAINT `applications_ibfk_3` FOREIGN KEY (`company_id`) REFERENCES `companies` (`id`),
CONSTRAINT `applications_ibfk_4` FOREIGN KEY (`status`) REFERENCES `application_statuses` (`status`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=12709869 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
Best Answer
The values in
filtered
column in both execution plans look pretty off target. Maybe it would help to disablecondition_fanout_filter
optimizer switch in the session executing these multi-join queries (see more about condition filtering).It probably takes the growing
applications
table size into account even if the number of records to select is mostly constant. Maybe an optimizer switch to make it prefer larger tables as leading would help to reduce the probability of similar failures in other queries (somewhat like Oracle'sDB_FILE_MULTIBLOCK_READ_COUNT
). But I don't see any such feature in MySQL.Query optimizer is somehow able to estimate the number of selected rows in
applications
: 464405 . But that's at least twice as more as the actual count from "table sizes" section. Maybe justANALYZE TABLE applications
may help to reduce the estimate and get the better execution plan without hints?