Mysql – InnoDB SELECT from disk slow for large table but MyISAM is fast

MySQLmysql-5.7

I'm having problems determining why a simple MySQL select query on an indexed, fairly large InnoDB table is taking so long for the first run, reading from the HDD (3-4 seconds)

However, switching the table to MyISAM, the same query is very quick for the first run (200-400 ms)

I tried importing the data into SQL Server on a slower Windows machine and the query was also quick.

Here's a somewhat stripped down version of the table:

CREATE TABLE memberclaim (
  ClaimNumber bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  PayerCode bigint(20) NOT NULL,
  MemberNumber bigint(20) NOT NULL,
  PayerProviderNumber bigint(20) NOT NULL,
  PayerProviderIndex int(11) NOT NULL,
  ClaimDate date NOT NULL,
  PaidDate datetime DEFAULT NULL,
  ServiceFrom date NOT NULL,
  ServiceTo date NOT NULL,
  TotalPaid decimal(9,2) NOT NULL,
  ProviderClaimID char(20) NOT NULL,
  PRIMARY KEY (ClaimNumber),
  UNIQUE KEY PayerCode (PayerCode,ProviderClaimID),
  KEY idx_MemberNumber (MemberNumber, ServiceFrom, ServiceTo),
  KEY idx_bypayerprovider (PayerProviderNumber, PayerProviderIndex)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Here's an example of a query being run:

SELECT MC.ClaimNumber,
       MC.PayerProviderNumber,
       MC.PayerProviderIndex,
       MC.ClaimDate,
       MC.PaidDate,
       MC.ServiceFrom,
       MC.ServiceTo,
       MC.TotalPaid,
       MC.ProviderClaimID
FROM memberclaim MC
WHERE MC.MemberNumber = 1234 AND
      MC.ServiceTo >= '2016-01-01' AND
      MC.ServiceFrom <= '2017-01-01';

Some background:

We're running MySQL 5.7.17 on Linux. I'm running this with mostly default MySQL settings, with the innodb_buffer_pool set to 7GB (not that it matters much in this case).

The table has about 35 million rows.

The server has 16 GB of RAM, 4 2.4 GHz processors, and an HDD (an SSD would obviously be much quicker).

This table is only written to at night, not when the system is in use.

The query is perfectly fine once cached, but we don't have nearly enough RAM on the server to cache everything and the vast majority of runs of this query are with unique MemberNumbers.

I've tried dumping and importing the entire table and that made no difference.

Are there any real options to speed up this query besides getting an SSD or switching it to MyISAM?

Any settings I should try modified that can speed up the disk access?

I've already read:
Why are simple SELECTs on InnoDB 100x slower than on MyISAM? and nothing in that post helped unfortunately.

Any help is appreciated, thanks.

Edit

This query returns a few hundred rows generally, around 100-300 and takes a few seconds to finish.

By unique MemberNumber, I just meant that the MemberNumber field in the WHERE clause is the field that changes and the query has to read from the disk constantly because of that.

The nightly runs are just inserts (tens of thousands of records) and a much smaller number of updates.

key_buffer_size = 1GB

SHOW TABLE STATUS results:

Name        Engine  Version Row_format  Rows       Avg_row_length   Data_length Max_data_length Index_length    Data_free   Auto_increment  Create_time           Update_time   Check_time  Collation   Checksum    Create_options  Comment
memberclaim InnoDB   10     Compact     30300000   160              4857004032  0               4823449600      5295308800  35137097        8/28/2014 12:38:44 PM                           latin1_swedish_ci

Here's the explain on the above SELECT statement, which returns 123 rows and took 2.265 seconds:

id    select_type    table    type    possible_keys    key              key_len       ref    rows     Extra
1     SIMPLE         MC       range   idx_MemberNumber idx_MemberNumber 11                   254      Using where

Here's the results of SHOW SESSION STATUS LIKE 'Handler%'

Variable_name               Value
Handler_commit              1
Handler_delete              0
Handler_discover            0
Handler_prepare             0
Handler_read_first          0
Handler_read_key            2
Handler_read_next           254
Handler_read_prev           0
Handler_read_rnd            0
Handler_read_rnd_next       0
Handler_rollback            0
Handler_savepoint           0
Handler_savepoint_rollback  0
Handler_update              0
Handler_write               0

Edit
After re-arranging the index, the above query took ~1 second (the same 123 records) and these are the Handler results:

Variable_name               Value
Handler_commit              1
Handler_delete              0
Handler_discover            0
Handler_prepare             0
Handler_read_first          0
Handler_read_key            2
Handler_read_next           123
Handler_read_prev           0
Handler_read_rnd            0
Handler_read_rnd_next       0
Handler_rollback            0
Handler_savepoint           0
Handler_savepoint_rollback  0
Handler_update              0
Handler_write               0

Best Answer

(First some comments; then a probable Answer.)

(4s is only 10x 400ms.)

How many rows have MemberNumber = 1234? How many rows are returned from the query?

What value of key_buffer_size were you using?

Please provide SHOW TABLE STATUS LIKE 'memberclaim' for each engine.

The table, plus indexes, take about 50% more space than necessary if you could shrink all the BIGINTs to INTs.

Please run (and provide output from) this for each engine:

EXPLAIN SELECT ...;

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

That will give some more clues of how many reads it is really doing.

Timings

If 254 records (in the data) are touched in 2.265 seconds, that says about 9ms/record, which is roughly the speed of spinning disks, when none are cached. Please run the query twice in a row -- and see what the two timings are; probably the second one would be much faster, thereby confirming that the first run was non-cached.

Further 9ms/row implies that the rows are scattered around the table. This is likely to happen in either MyISAM or InnoDB when the index being used does not 'track' the order of the data insertion. Is the query in question the most important one? If so, the performance can be fixed in the following way. (Caution; the following will hurt performance on any range scan of ClaimNumber, but I suspect you don't do such.)

Answer

Change to InnoDB, so that the PRIMARY KEY is 'clustered' with the data and change from

PRIMARY KEY (ClaimNumber),
KEY idx_MemberNumber (MemberNumber, ServiceFrom, ServiceTo),

to

PRIMARY KEY (MemberNumber, ClaimNumber),
INDEX(ClaimNumber),

I think having all rows for MemberNumber = 1234 "clustered" together will give you enough performance boost to make this worthwhile. I left off the From and To; one of them could be added in the middle of the PK, but I don't think it will help a lot.

You will have about 100 rows per block. If, say, there are 530 rows for 1234, then that is 6 blocks to read -- about 60ms, which is a lot better than the 254 random blocks read (2265ms), as surmised above. Shrinking all the BIGINTs to INT UNSIGNED would get you down to about 40ms.

If you have other SELECTs that look at multiple rows for a single "member", they will also run faster.

Added bonus: Since fewer blocks are read for this query, other queries will find the cache (innodb_buffer_pool) more useful.

(This technique will not work for MyISAM because the data is not kept in any particular order. Also, UPDATEs tend to further mangle the data.) (Another note about MyISAM: It has been removed from the next major release, 8.0.)