As soon as I include a "longtext" type in the select clause, the query times go from 8s to 3min (Amazon RDS t2.small). The longtext isn't used in the where clause, and the result set is empty. See below:
mysql> select id from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (8.03 sec)
mysql> select id, stdoutContents from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (3 min 43.36 sec)
What boggles my mind is that asking for the longtext column by primary key is fast:
select stdoutContents from mbp_process where id = 49213;
...
1 row in set (0.00 sec)
Why is this? The effect is less pronounced on my bare-metal server: the query slows from 0.2s to 1:05m there.
This is the EXPLAIN on the "select id from…" query:
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
| 1 | SIMPLE | mbp_process | ref | idx_mbp_process_exitCode,idx_mbp_process_errorAcked | idx_mbp_process_errorAcked | 2 | const | 22551 | Using index condition; Using where |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
This is the EXPLAIN from the "select id, stdoutContents from …" query:
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
| 1 | SIMPLE | mbp_process | ref | idx_mbp_process_exitCode,idx_mbp_process_errorAcked | idx_mbp_process_errorAcked | 2 | const | 22552 | Using index condition; Using where |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
They are identical.
This is the create table statement from "SHOW CREATE TABLE mbp_process":
CREATE TABLE `mbp_process` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`command` varchar(1000) DEFAULT NULL,
`pid` varchar(45) DEFAULT NULL,
`state` varchar(45) NOT NULL,
`exitCode` int(11) DEFAULT NULL,
`stdoutContents` longtext,
`stdoutTruncated` char(1) DEFAULT NULL,
`stdoutFilename` varchar(200) DEFAULT NULL,
`stderrContents` longtext,
`stderrTruncated` char(1) DEFAULT NULL,
`stderrFilename` varchar(200) DEFAULT NULL,
`majorProgress` varchar(45) DEFAULT NULL,
`minorProgress` varchar(45) DEFAULT NULL,
`startTime` datetime DEFAULT NULL,
`endTime` datetime DEFAULT NULL,
`errorAcknowledged` char(1) DEFAULT 'N',
`errorComments` text,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_mbp_process_command` (`command`(767)),
KEY `idx_mbp_process_exitCode` (`exitCode`),
KEY `idx_mbp_process_state` (`state`),
KEY `idx_mbp_process_errorAcked` (`errorAcknowledged`)
) ENGINE=InnoDB AUTO_INCREMENT=50184 DEFAULT CHARSET=latin1
Choosing another column to include doesn't slow the query:
mysql> select id, created from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (1.69 sec)
This is very weird: if I include "stderrContents", the query is fast. This is also a longtext column, although it generally has much less data in it. However, I'm not asking MySQL to examine the contents of the column, and the result set is empty, so why is it slow for "stdoutContents"?
mysql> select id, stderrContents from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (0.57 sec)
Best Answer
PRIMARY KEY(id)
says that it is clustered with the data. That is not the issue. Nor are the attempts at using a secondary index. Here's what is happening.In InnoDB, usually all columns are next to the PRIMARY KEY in the BTree that is indexed by the PK. However "big" columns are put elsewhere. At most about 8KB of the row is kept together as you described. The big columns are in their own 16KB block(s). These columns include any TEXT/BLOB, or even long VARCHAR/VARBINARY columns. (The details vary with innodb_file_format and a couple other things. For example, the first 767 bytes of a big column might left with the short columns.)
So, when selecting columns that exclude such big columns, the query will avoid fetching those extra blocks and be relatively fast. It sounds like stdoutContents is really big (multiple 16KB blocks needed)?
Amazon vs bare metal: If I am not mistaken, Amazon stores the data on a SAN-like system, not the same box.
Another thing I see... The
EXPLAIN
says it is using this secondary keyEach secondary key implicitly includes the PK (
id
). So, the processing is something like:errorAcknowledged='N'
id
to reach into the "data" BTree. (1 block per row, hopefully less, if they are cached nicely)and (exitCode != 0 or exitCode is null)
I hope that explains it all. Let me know if you need further clarification.