I'm having problems determining why a simple MySQL select query on an indexed, fairly large InnoDB table is taking so long for the first run, reading from the HDD (3-4 seconds)
However, switching the table to MyISAM, the same query is very quick for the first run (200-400 ms)
I tried importing the data into SQL Server on a slower Windows machine and the query was also quick.
Here's a somewhat stripped down version of the table:
CREATE TABLE memberclaim (
ClaimNumber bigint(20) unsigned NOT NULL AUTO_INCREMENT,
PayerCode bigint(20) NOT NULL,
MemberNumber bigint(20) NOT NULL,
PayerProviderNumber bigint(20) NOT NULL,
PayerProviderIndex int(11) NOT NULL,
ClaimDate date NOT NULL,
PaidDate datetime DEFAULT NULL,
ServiceFrom date NOT NULL,
ServiceTo date NOT NULL,
TotalPaid decimal(9,2) NOT NULL,
ProviderClaimID char(20) NOT NULL,
PRIMARY KEY (ClaimNumber),
UNIQUE KEY PayerCode (PayerCode,ProviderClaimID),
KEY idx_MemberNumber (MemberNumber, ServiceFrom, ServiceTo),
KEY idx_bypayerprovider (PayerProviderNumber, PayerProviderIndex)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Here's an example of a query being run:
SELECT MC.ClaimNumber,
MC.PayerProviderNumber,
MC.PayerProviderIndex,
MC.ClaimDate,
MC.PaidDate,
MC.ServiceFrom,
MC.ServiceTo,
MC.TotalPaid,
MC.ProviderClaimID
FROM memberclaim MC
WHERE MC.MemberNumber = 1234 AND
MC.ServiceTo >= '2016-01-01' AND
MC.ServiceFrom <= '2017-01-01';
Some background:
We're running MySQL 5.7.17 on Linux. I'm running this with mostly default MySQL settings, with the innodb_buffer_pool set to 7GB (not that it matters much in this case).
The table has about 35 million rows.
The server has 16 GB of RAM, 4 2.4 GHz processors, and an HDD (an SSD would obviously be much quicker).
This table is only written to at night, not when the system is in use.
The query is perfectly fine once cached, but we don't have nearly enough RAM on the server to cache everything and the vast majority of runs of this query are with unique MemberNumbers.
I've tried dumping and importing the entire table and that made no difference.
Are there any real options to speed up this query besides getting an SSD or switching it to MyISAM?
Any settings I should try modified that can speed up the disk access?
I've already read:
Why are simple SELECTs on InnoDB 100x slower than on MyISAM? and nothing in that post helped unfortunately.
Any help is appreciated, thanks.
Edit
This query returns a few hundred rows generally, around 100-300 and takes a few seconds to finish.
By unique MemberNumber, I just meant that the MemberNumber field in the WHERE clause is the field that changes and the query has to read from the disk constantly because of that.
The nightly runs are just inserts (tens of thousands of records) and a much smaller number of updates.
key_buffer_size = 1GB
SHOW TABLE STATUS results:
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
memberclaim InnoDB 10 Compact 30300000 160 4857004032 0 4823449600 5295308800 35137097 8/28/2014 12:38:44 PM latin1_swedish_ci
Here's the explain on the above SELECT statement, which returns 123 rows and took 2.265 seconds:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE MC range idx_MemberNumber idx_MemberNumber 11 254 Using where
Here's the results of SHOW SESSION STATUS LIKE 'Handler%'
Variable_name Value
Handler_commit 1
Handler_delete 0
Handler_discover 0
Handler_prepare 0
Handler_read_first 0
Handler_read_key 2
Handler_read_next 254
Handler_read_prev 0
Handler_read_rnd 0
Handler_read_rnd_next 0
Handler_rollback 0
Handler_savepoint 0
Handler_savepoint_rollback 0
Handler_update 0
Handler_write 0
Edit
After re-arranging the index, the above query took ~1 second (the same 123 records) and these are the Handler results:
Variable_name Value
Handler_commit 1
Handler_delete 0
Handler_discover 0
Handler_prepare 0
Handler_read_first 0
Handler_read_key 2
Handler_read_next 123
Handler_read_prev 0
Handler_read_rnd 0
Handler_read_rnd_next 0
Handler_rollback 0
Handler_savepoint 0
Handler_savepoint_rollback 0
Handler_update 0
Handler_write 0
Best Answer
(First some comments; then a probable Answer.)
(4s is only 10x 400ms.)
How many rows have MemberNumber = 1234? How many rows are returned from the query?
What value of
key_buffer_size
were you using?Please provide
SHOW TABLE STATUS LIKE 'memberclaim'
for each engine.The table, plus indexes, take about 50% more space than necessary if you could shrink all the
BIGINTs
toINTs
.Please run (and provide output from) this for each engine:
That will give some more clues of how many reads it is really doing.
Timings
If 254 records (in the data) are touched in 2.265 seconds, that says about 9ms/record, which is roughly the speed of spinning disks, when none are cached. Please run the query twice in a row -- and see what the two timings are; probably the second one would be much faster, thereby confirming that the first run was non-cached.
Further 9ms/row implies that the rows are scattered around the table. This is likely to happen in either MyISAM or InnoDB when the index being used does not 'track' the order of the data insertion. Is the query in question the most important one? If so, the performance can be fixed in the following way. (Caution; the following will hurt performance on any range scan of
ClaimNumber
, but I suspect you don't do such.)Answer
Change to InnoDB, so that the
PRIMARY KEY
is 'clustered' with the data and change fromto
I think having all rows for
MemberNumber = 1234
"clustered" together will give you enough performance boost to make this worthwhile. I left off the From and To; one of them could be added in the middle of the PK, but I don't think it will help a lot.You will have about 100 rows per block. If, say, there are 530 rows for
1234
, then that is 6 blocks to read -- about 60ms, which is a lot better than the 254 random blocks read (2265ms), as surmised above. Shrinking all theBIGINTs
toINT UNSIGNED
would get you down to about 40ms.If you have other
SELECTs
that look at multiple rows for a single "member", they will also run faster.Added bonus: Since fewer blocks are read for this query, other queries will find the cache (innodb_buffer_pool) more useful.
(This technique will not work for MyISAM because the data is not kept in any particular order. Also,
UPDATEs
tend to further mangle the data.) (Another note about MyISAM: It has been removed from the next major release, 8.0.)