Mysql – longtext in select makes query extremely slow, even if not used in where clause and empty result set (MySQL)

MySQL

As soon as I include a "longtext" type in the select clause, the query times go from 8s to 3min (Amazon RDS t2.small). The longtext isn't used in the where clause, and the result set is empty. See below:

mysql> select id from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (8.03 sec)

mysql> select id, stdoutContents from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (3 min 43.36 sec)

What boggles my mind is that asking for the longtext column by primary key is fast:

select stdoutContents from mbp_process where id = 49213;
...
1 row in set (0.00 sec)

Why is this? The effect is less pronounced on my bare-metal server: the query slows from 0.2s to 1:05m there.

This is the EXPLAIN on the "select id from…" query:

+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
| id | select_type | table       | type | possible_keys                                       | key                        | key_len | ref   | rows  | Extra                              |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
|  1 | SIMPLE      | mbp_process | ref  | idx_mbp_process_exitCode,idx_mbp_process_errorAcked | idx_mbp_process_errorAcked | 2       | const | 22551 | Using index condition; Using where |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+

This is the EXPLAIN from the "select id, stdoutContents from …" query:

+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
| id | select_type | table       | type | possible_keys                                       | key                        | key_len | ref   | rows  | Extra                              |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+
|  1 | SIMPLE      | mbp_process | ref  | idx_mbp_process_exitCode,idx_mbp_process_errorAcked | idx_mbp_process_errorAcked | 2       | const | 22552 | Using index condition; Using where |
+----+-------------+-------------+------+-----------------------------------------------------+----------------------------+---------+-------+-------+------------------------------------+

They are identical.

This is the create table statement from "SHOW CREATE TABLE mbp_process":

CREATE TABLE `mbp_process` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `command` varchar(1000) DEFAULT NULL,
  `pid` varchar(45) DEFAULT NULL,
  `state` varchar(45) NOT NULL,
  `exitCode` int(11) DEFAULT NULL,
  `stdoutContents` longtext,
  `stdoutTruncated` char(1) DEFAULT NULL,
  `stdoutFilename` varchar(200) DEFAULT NULL,
  `stderrContents` longtext,
  `stderrTruncated` char(1) DEFAULT NULL,
  `stderrFilename` varchar(200) DEFAULT NULL,
  `majorProgress` varchar(45) DEFAULT NULL,
  `minorProgress` varchar(45) DEFAULT NULL,
  `startTime` datetime DEFAULT NULL,
  `endTime` datetime DEFAULT NULL,
  `errorAcknowledged` char(1) DEFAULT 'N',
  `errorComments` text,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_mbp_process_command` (`command`(767)),
  KEY `idx_mbp_process_exitCode` (`exitCode`),
  KEY `idx_mbp_process_state` (`state`),
  KEY `idx_mbp_process_errorAcked` (`errorAcknowledged`)
) ENGINE=InnoDB AUTO_INCREMENT=50184 DEFAULT CHARSET=latin1

Choosing another column to include doesn't slow the query:

mysql> select id, created from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (1.69 sec)

This is very weird: if I include "stderrContents", the query is fast. This is also a longtext column, although it generally has much less data in it. However, I'm not asking MySQL to examine the contents of the column, and the result set is empty, so why is it slow for "stdoutContents"?

mysql> select id, stderrContents from mbp_process where errorAcknowledged='N' and (exitCode != 0 or exitCode is null);
Empty set (0.57 sec)

Best Answer

PRIMARY KEY(id) says that it is clustered with the data. That is not the issue. Nor are the attempts at using a secondary index. Here's what is happening.

In InnoDB, usually all columns are next to the PRIMARY KEY in the BTree that is indexed by the PK. However "big" columns are put elsewhere. At most about 8KB of the row is kept together as you described. The big columns are in their own 16KB block(s). These columns include any TEXT/BLOB, or even long VARCHAR/VARBINARY columns. (The details vary with innodb_file_format and a couple other things. For example, the first 767 bytes of a big column might left with the short columns.)

So, when selecting columns that exclude such big columns, the query will avoid fetching those extra blocks and be relatively fast. It sounds like stdoutContents is really big (multiple 16KB blocks needed)?

Amazon vs bare metal: If I am not mistaken, Amazon stores the data on a SAN-like system, not the same box.

Another thing I see... The EXPLAIN says it is using this secondary key

KEY `idx_mbp_process_errorAcked` (`errorAcknowledged`)

Each secondary key implicitly includes the PK (id). So, the processing is something like:

  • Drill into secondary key for the first entry with errorAcknowledged='N'
  • Scan forward in that BTree. This is the most efficient step. There are probably more than 100 'rows' per 16KB block.
  • For each of those, use the id to reach into the "data" BTree. (1 block per row, hopefully less, if they are cached nicely)
  • Check the rest of the WHERE clause: and (exitCode != 0 or exitCode is null)
  • If the row is still interesting, fetch the desired local columns (id, created, and maybe stderrContents), and
  • Reach over into the 'big' storage for the stdoutContents (if you are asking for it). This is likely not to be cached and possibly involves many disk hits.

I hope that explains it all. Let me know if you need further clarification.