MySQL not using primary key on inner join

innodbjoin;MySQLprimary-key

I have two tables:

Table 1:

CREATE TABLE `lk_transaction_types` (
  `transactionTypeID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `isActive` tinyint(2) unsigned NOT NULL,
  `code` varchar(8) NOT NULL,
  `description` varchar(150) NOT NULL,
  `isInbound` tinyint(2) unsigned NOT NULL DEFAULT '1',
  `isOutbound` tinyint(2) unsigned NOT NULL DEFAULT '1',
  PRIMARY KEY (`transactionTypeID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Table 2:

CREATE TABLE `ediLoad` (
  `loadID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `processID` int(10) unsigned NOT NULL,
  `success` tinyint(2) unsigned NOT NULL DEFAULT '0',
  `transactionTypeID` tinyint(2) unsigned DEFAULT NULL,
  `escoID` int(10) unsigned DEFAULT NULL,
  `ldcID` int(10) unsigned DEFAULT NULL,
  `commodityType` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `filename` varchar(150) NOT NULL,
  `loadDate` datetime NOT NULL,
  `processed` tinyint(2) unsigned NOT NULL DEFAULT '0',
  `processedDate` datetime DEFAULT NULL,
  `dataApplied` tinyint(2) unsigned NOT NULL DEFAULT '0',
  `dataAppliedDate` datetime DEFAULT NULL,
  `errorID` tinyint(3) unsigned DEFAULT NULL,
  `error` tinyint(2) unsigned DEFAULT '0',
  `warning` tinyint(2) unsigned DEFAULT '0',
  PRIMARY KEY (`loadID`),
  KEY `idx_processID` (`processID`,`transactionTypeID`,`escoID`),
  KEY `idx_escoID` (`escoID`),
  KEY `idx_filename` (`success`,`filename`),
  KEY `idx_bulk` (`processed`,`loadDate`),
  KEY `idx_loadDate` (`loadDate`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

When trying to run a simple query it is not using the Primary Key on the lk_transaction_types table:

SELECT COUNT(0)
FROM edi.ediLoad l
INNER JOIN edi.lk_transaction_types lk 
  ON  lk.transactionTypeID = l.transactionTypeID
WHERE l.escoID = 2
AND lk.isActive = 1
AND lk.isInbound = 1;

The Query is very slow. So I run explain and get this:

+----+-------------+-------+------------+------+-----------------------------------+------------------------+---------+--------------------------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys                     | key                    | key_len | ref                      | rows | filtered | Extra                              |
+----+-------------+-------+------------+------+-----------------------------------+------------------------+---------+--------------------------+------+----------+------------------------------------+
|  1 | SIMPLE      | lk    | NULL       | ALL  | PRIMARY                           | NULL                   | NULL    | NULL                     |   31 |     3.23 | Using where                        |
|  1 | SIMPLE      | l     | NULL       | ref  | idx_escoID,idx_ transactionTypeID | idx_ transactionTypeID | 2       | edi.lk.transactionTypeID | 7158 |    50.00 | Using index condition; Using where |
+----+-------------+-------+------------+------+-----------------------------------+------------------------+---------+--------------------------+------+----------+------------------------------------+

This seems like a simple query with one join on a primary key. Why is it not using the primary key? I've even tried adding 'FORCE INDEX FOR JOIN (PRIMARY)' to the join and it still doesn't use the primary key. Any help would be great. Thanks!

Best Answer

Notes

  • A FOREIGN KEY adds a constraint; it does not add an index if there is already one there.
  • Datatypes should match, but what you had was "close enough" for JOINing.
  • What was wrong -- The columns of an INDEX are looked at left-to-right. Since there was no mention of processID (the first column), the index was not useful.
  • When JOINing, MySQL usually starts with one table, then repeatedly reaches into the other. It picked lk. Why?...
  • When practical, MySQL will start with the table that it can do some filtering on. But neither table had any useful indexes. So?...
  • As a fallback, MySQL starts with the 'smaller' table, lk_transaction_types and did a full table scan ("ALL").

To make the query more efficient, usually the filtering (WHERE) is tackled first:

WHERE l.escoID = 2
  AND lk.isActive = 1
  AND lk.isInbound = 1;

To help the Optimizer picks l as the first table, have

INDEX(escoID, ...)

Or, to help it pick lk, have

INDEX(isActive, isInbound, ...)  -- in either order

The ... is optionally more columns.

Without understanding the distribution of the data values, let me proceed with deriving the optimal indexes for each case.

SELECT  COUNT(0)
    FROM  edi.ediLoad l
    INNER JOIN  edi.lk_transaction_types lk
        ON lk.transactionTypeID =
            l.transactionTypeID
    WHERE   l.escoID = 2
      AND  lk.isActive = 1
      AND  lk.isInbound = 1;

Case: Start with l:

l:   INDEX(escoID,             -- first, so as to satisfy `WHERE`
           transactionTypeID)  -- added to make the index "covering"

A "Covering index" contains all the columns (of one table) needed to satisfy the entire query. This lets the processing work only in the index BTree, and ignore the Data BTree.

lk:  INDEX(isActive, isInbound, transactionTypeID)  -- in any order

Again, this is 'covering'.

Case: Start with lk:

lk:  INDEX(isActive, isInbound,  -- first, in either order
           transactionTypeID)
l:   INDEX(escoID, transactionTypeID) -- in either order

In neither case was either PRIMARY KEY useful. So, I claim your original question was ill-advised. Instead, you should have asked "how can I make this query more efficient". That is the question I have been answering.

One more thing. Let's combine the two cases to minimize the number of indexes.

lk:  INDEX(isActive, isInbound,  -- first, in either order
           transactionTypeID)
l:   INDEX(escoID, transactionTypeID)  -- in this order

With InnoDB, the PK is "clustered" with the data. PRIMARY KEY(transactionTypeID) is sufficient for the second case, but not the first.

More on creating indexes.