Version: MySQL 5.7.2
I'm working with dummy data with int columns.
time: 1-10 million
products: 70,000 random ints with another 30,000 that are dupes from the 70,000
volume: ranges from 500 – 1000
price: ranges from 10 – 50, though the price stays within a 1-5 difference between rows for the same product
From the above, 10 million rows are created by randomly selecting a product and generating the required row data.
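Roughly, the data generation is equivalent to something like this (a simplified sketch: it ignores the 70k/30k duplicate split and the per-product price constraint, and the procedure name is made up):

delimiter //
create procedure fill_productdata(in n int)
begin
  declare i int default 0;
  while i < n do
    insert into productdata (`time`, product, quantity, price)
    values (i + 1,                        -- time: unique, 1..n
            floor(1 + rand() * 100000),   -- product: random id
            floor(500 + rand() * 501),    -- quantity: 500-1000
            floor(10 + rand() * 41));     -- price: 10-50
    set i = i + 1;
  end while;
end//
delimiter ;
call fill_productdata(10000000);

In practice the inserts would be batched rather than done row by row; the distributions are the relevant part.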
Running a range query like…
select * from productdata where product >= 1500 and product <= 2000
takes about 4 seconds.
When I add an index on product using…
create index productindex on productdata(product)
The query now takes about 30 seconds. Time is the only unique column in the table, but setting it as the primary key does not help either.
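For reference, the primary-key attempt was along these lines (the exact statement may have differed):

alter table productdata add primary key (`time`);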
On SQL Server and PostgreSQL I don't see the same issue with the same data and query, using a non-clustered index in each. I only really have experience writing queries for SQL Server, so I'm a bit perplexed by this; I tried PostgreSQL as well to have another database to compare against.
All databases are the latest stable versions available.
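For the other databases the index was created with the obvious equivalents, along the lines of:

-- SQL Server: explicitly non-clustered
create nonclustered index productindex on productdata(product);
-- PostgreSQL: secondary indexes are always separate from the table heap
create index productindex on productdata(product);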
Action Output (I had to decrease the range as the original query was taking too long)…
Without Index..
With Index..
Table status…
Buffer…
Explain Select…
Show create table…
CREATE TABLE `products` (
`time` int(11) DEFAULT NULL,
`product` int(11) DEFAULT NULL,
`quantity` int(11) DEFAULT NULL,
`price` int(11) DEFAULT NULL,
KEY `productindex` (`product`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Trying with time and product as key…
Best Answer
MyISAM? InnoDB? What is the cache size? (SHOW VARIABLES LIKE '%buffer%';) Did you run the query a second time (to cancel out the effect of caching)? Do you have TEXT or BLOB columns? (Please provide SHOW CREATE TABLE.) How much RAM? How big (GB) is the table (SHOW TABLE STATUS)?
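Concretely, those can be gathered with something like this (table name taken from the query in the question):

SHOW VARIABLES LIKE '%buffer%';
SHOW CREATE TABLE productdata;
SHOW TABLE STATUS LIKE 'productdata';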
Probably... Definitely... The cache for the data was not large enough to hold the products in question. And, since the products are mostly "random", there is a lot of I/O.

Here's what happens in any(?) database vendor for that query:

1. Drill down the BTree index on product to find the first entry with product >= 1500.
2. Scan forward through the index until product goes past 2000.
3. For each index entry, reach into the data to fetch the whole row (because of SELECT *). (This is where vendors vary. And "Engines" in MySQL will vary.) Since the data rows are in one order, but the index rows are in a different order, MySQL will be effectively doing 'random I/O' to get the rows. (MySQL fetches the rows as needed. Other vendors may sort the row-addresses first -- perhaps a benefit, perhaps a cost.)

(A side note: it is usually better not to use *, but to spell out only the necessary columns.)
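For example, if only a couple of columns are actually needed, something like this avoids hauling the full rows around (the column choice here is just for illustration):

SELECT product, price
    FROM productdata
    WHERE product >= 1500 AND product <= 2000;

And if every selected column happened to be in the index, the "reach into the data" step would disappear entirely.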
If "too much" of the table needs to be fetched -- in particular more than about 20% of the table has
product
in the desired range -- MySQL will do a table scan instead of using the index. (I don't know about non-MySQL vendors.) This optimization is usually beneficial, but sometimes is a mistake. TheEXPLAIN SELECT ..
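For example, with the query from the question:

EXPLAIN SELECT * FROM productdata WHERE product >= 1500 AND product <= 2000;

The type, key, and rows columns of the output show whether productindex was chosen and roughly how many rows the optimizer expected to touch.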
InnoDB "reaches into the data" via the PRIMARY KEY, which is "clustered" with the data. So it is a BTree probe to find each row. MyISAM has a byte address into the data file, so it is an fseek. Other vendors work differently.

So... Is the comparison "fair"? Do you have the PK "clustered" with the data on all vendor tests? (Etc.)
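As a sketch, a "clustered" version of the MySQL table (mirroring the SHOW CREATE TABLE output above, and assuming time really is unique as the question states) would look like:

CREATE TABLE `products` (
  `time` int(11) NOT NULL,
  `product` int(11) DEFAULT NULL,
  `quantity` int(11) DEFAULT NULL,
  `price` int(11) DEFAULT NULL,
  PRIMARY KEY (`time`),          -- InnoDB clusters the data on this
  KEY `productindex` (`product`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Without an explicit PRIMARY KEY, InnoDB clusters on a hidden internal row id instead.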
And I did not get into the caching details. This is probably a major component of the slowdown.
I'll fill in more details after you provide more details.
After details
You are using InnoDB, not MyISAM (good).
innodb_buffer_pool_size = 8M is much too small. For 16GB of RAM, recommend 11G. But even 1G would show a significant speedup, since the table is smaller than that.

You say you are running MySQL 5.7, yet the 8M default contradicts that. Did you override the default? Please provide the version (SELECT @@version;).

It seems that there are no big TEXT or BLOB columns, so my comments on such do not apply.

Bottom line: To get a 'fair' comparison, increase the buffer_pool setting for MySQL. This is the most important tunable for performance. It is not automatically set because it depends on the amount of available RAM.
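A sketch of the check and the change (the 11G figure assumes the 16GB of RAM mentioned above; on versions before 5.7.5 the setting only takes effect after a server restart):

SELECT @@version;                                -- confirm the actual server version
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- current value, in bytes

-- in the [mysqld] section of my.cnf / my.ini, then restart mysqld:
-- innodb_buffer_pool_size = 11G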