MySQL on RDS – Avoiding Index Merge

indexMySQLoptimization

I'm somewhat experienced with mysql optimizations and routines but something caught me recently.

I was using an on-premise mysql installation over ec2 instances and just migrated to RDS so I can sleep tight without worrying.

Problem is that my application has some legacy issues and was previously running on mysql 5.1 and it's now running on mysql 5.6.

Our database was built upon the premise that index_merge will be used by mysql, so our indexes are not composite; they all affect single columns; I know how bad this might be and how bad it is, but it's been working fine so far; I can't really change it right now because we have over 300 tables.

What got me into trouble is that one of my tables controls inventory stock. The table has the following structure:

id (PK)
companyId (btree index)
productId (btree index)
transactionType (input, output or stock balance; btree index)
transactionDate (btree index)
transactionPrice

and a few other columns that won't really matter.

When I run a select query to obtain the last price for a given product, like this:

SELECT 
    transactionPrice
FROM
    stock
WHERE
    productId = x
        AND transactionType = 'input'
        AND companyId = y
        AND transactionDate < '2017-07-07'
ORDER BY transactionDate DESC
LIMIT 1;

What I expected was that MySQL would merge (most likely) the productId and companyId indexes, that are very specific, and read about 4 lines; but what actually happened is that MySQL decided to iterate over 5 million lines by sorting the transactionDate and not merging it to any other index.

I decided to dig a bit further:

  • SHOW @@optimizer_switch shows me that index_merge flags are all ON
  • Ran the query hinting the optimizer to IGNORE the transactionDate index; id did decide to index_merge productId and companyId as expected!
  • Created a new compound index for this specific table that aggregated all fields and it is now being used, so my problem is partially solved, as this is one table but we have a lot others
  • My first thought was related to MySQL version, but my notebook runs MySQL 5.7 (as opposed to 5.6 in RDS and 5.1 in our previous ec2 instance) and my notebook uses index_merge as the first option
  • I ran ANALYZE TABLE and it still uses the same behavior

So I'm now kinda lost; what variables might affect this? How can I help the optimizer to encourage the usage of index merge opposed to a full table scan?

Edit1: Adding more information after Rick's answer:

  • Yes, table is very big – currently about 10 million entries
  • InnoDB buffer pool size is 3/4 of machine's total memory; in this case, 25gb out of 36gb (rds default).
  • Buffer pool usage:

BUFFER POOL AND MEMORY
Total memory allocated 25738477568; in additional pool allocated 0
Dictionary memory allocated 12453955
Buffer pool size 1534976
Free buffers 8194
Database pages 1383533
Old database pages 510554
Modified db pages 4364
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 32361311, not young 319332363
1.87 youngs/s, 0.62 non-youngs/s
Pages read 15508690, created 745849, written 11868579
0.12 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 1383533, unzip_LRU len: 0
I/O sum[24]:cur[0], unzip sum[0]:cur[0]

As you may see, I have Buffer pool hit rate 1000 / 1000 which should mean it's good right?

Some table stats:
– 10 million entries
– About 750k products
– 3 transaction types
– about 150 companies

Lastly, the create table is as follows:

CREATE TABLE `estoque` (
`id` int(11) NOT NULL DEFAULT '0',
`companyId` int(11) NOT NULL DEFAULT '0',
`productId` int(11) NOT NULL DEFAULT '0',
`transactionDate` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`transactionType` char(1) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `transactionType` (`transactionType`),
KEY `transactionDate` (`transactionDate`),
KEY `productId` (`productId`),
KEY `companyId` (`companyId`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8

There are other columns but I ommited them for clarity sake.

Best Answer

I have found over the years that MySQL very rarely uses Index Merge. I think that part of the reason is that it is so inefficient. In order to perform it, it must

  1. Scan a potentially large part of one index, collecting the results somewhere.
  2. Scan a potentially large part of another index, collecting the results somewhere.
  3. "Merge" the results into a smaller result set (in the case of "intersect", which seems to be your case).
  4. Then go look up the rows.

Even when Index Merge could be used, it is essentially guaranteed to be slower than a suitable composite index.

For the query in question, this index is optimal, even handling the ORDER BY (thereby avoiding a tmp and sort), which I don't think Index Merge can do:

INDEX(productId, transactionType, companyId,  -- in any order
      transactionDate)   -- last

On a related topic... I assume this is a very large table? How big is the buffer_pool? Is it I/O-bound, or fully cached?

I ask these because, if it is I/O-bound, the choice of the PRIMARY becomes important in performance. Would you care to show usSHOW CREATE TABLE`; I will explain further. Based on the info you provided, I would guess that the rows needed for your query are scattered across the table, and not 'clustered' in a few blocks.

More useful info: How many different products? transactionTypes - apparently 3? companies? Are they evenly distributed?