Version: MySQL 5.7.2
I'm working with dummy data with int columns.
time: 1-10 million
products: 70,000 random ints with another 30,000 that are dupes from the 70,000
volume: ranges from 500 – 1000
price: ranges from 10 – 50, though the price stays within a 1-5 difference between rows for the same product
From the above, 10 million rows are created by randomly selecting a product and generating the required row data.
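Roughly, the data generation is equivalent to something like this (a simplified sketch: it ignores the 70k/30k duplicate split and the per-product price constraint, and the procedure name is made up):

delimiter //
create procedure fill_productdata(in n int)
begin
  declare i int default 0;
  while i < n do
    insert into productdata (`time`, product, quantity, price)
    values (i + 1,                        -- time: unique, 1..n
            floor(1 + rand() * 100000),   -- product: random id
            floor(500 + rand() * 501),    -- quantity: 500-1000
            floor(10 + rand() * 41));     -- price: 10-50
    set i = i + 1;
  end while;
end//
delimiter ;
call fill_productdata(10000000);

In practice the inserts would be batched rather than done row by row; the distributions are the relevant part.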
Running a range query like…
select * from productdata where product >= 1500 and product <= 2000
takes about 4 seconds.
When I add an index on product using…
create index productindex on productdata(product)
The query now takes about 30 seconds. Time is the only unique column in the table, but setting it as the primary key does not help either.
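For reference, the primary-key attempt was along these lines (the exact statement may have differed):

alter table productdata add primary key (`time`);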
On SQL Server and PostgreSQL I don't see the same issue with the same data and query, using a non-clustered index in each. I only really have experience writing queries for SQL Server, so I'm a bit perplexed by this; I tried PostgreSQL as well to have another database to compare against.
All databases are the latest stable versions available.
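For the other databases the index was created with the obvious equivalents, along the lines of:

-- SQL Server: explicitly non-clustered
create nonclustered index productindex on productdata(product);
-- PostgreSQL: secondary indexes are always separate from the table heap
create index productindex on productdata(product);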
Action Output (I had to decrease the range as the original query was taking too long)…
Without Index..
With Index..
Table status…
Buffer…
Explain Select…
Show create table…
CREATE TABLE `products` (
`time` int(11) DEFAULT NULL,
`product` int(11) DEFAULT NULL,
`quantity` int(11) DEFAULT NULL,
`price` int(11) DEFAULT NULL,
KEY `productindex` (`product`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Trying with time and product as key…
Best Answer
MyISAM? InnoDB? What is the cache size? (SHOW VARIABLES LIKE '%buffer%';) Did you run the query a second time (to cancel out the effect of caching)? Do you have TEXT or BLOB columns? (Please provide SHOW CREATE TABLE.) How much RAM? How big (GB) is the table (SHOW TABLE STATUS)?
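Concretely, those can be gathered with something like this (table name taken from the query in the question):

SHOW VARIABLES LIKE '%buffer%';
SHOW CREATE TABLE productdata;
SHOW TABLE STATUS LIKE 'productdata';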
Probably... Definitely... The cache for the data was not large enough to hold the products in question. And, since the products are mostly "random", there is a lot of I/O.

Here's what happens in any(?) database vendor for that query:

1. Drill down the BTree index on product to find the first entry with product >= 1500.
2. Scan forward through the index until product goes past 2000.
3. For each index entry, reach into the data to fetch the whole row (because of SELECT *). (This is where vendors vary. And "Engines" in MySQL will vary.) Since the data rows are in one order, but the index rows are in a different order, MySQL will be effectively doing 'random I/O' to get the rows. (MySQL fetches the rows as needed. Other vendors may sort the row-addresses first -- perhaps a benefit, perhaps a cost.)

(A side note: it is usually better not to use *, but to spell out only the necessary columns.)
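For example, if only a couple of columns are actually needed, something like this avoids hauling the full rows around (the column choice here is just for illustration):

SELECT product, price
    FROM productdata
    WHERE product >= 1500 AND product <= 2000;

And if every selected column happened to be in the index, the "reach into the data" step would disappear entirely.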
If "too much" of the table needs to be fetched -- in particular more than about 20% of the table has
product
in the desired range -- MySQL will do a table scan instead of using the index. (I don't know about non-MySQL vendors.) This optimization is usually beneficial, but sometimes is a mistake. TheEXPLAIN SELECT ..
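For example, with the query from the question:

EXPLAIN SELECT * FROM productdata WHERE product >= 1500 AND product <= 2000;

The type, key, and rows columns of the output show whether productindex was chosen and roughly how many rows the optimizer expected to touch.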
InnoDB "reaches into the data" via the PRIMARY KEY, which is "clustered" with the data. So it is a BTree probe to find each row. MyISAM has a byte address into the data file, so it is an fseek. Other vendors work differently.

So... Is the comparison "fair"? Do you have the PK "clustered" with the data on all vendor tests? (Etc.)
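As a sketch, a "clustered" version of the MySQL table (mirroring the SHOW CREATE TABLE output above, and assuming time really is unique as the question states) would look like:

CREATE TABLE `products` (
  `time` int(11) NOT NULL,
  `product` int(11) DEFAULT NULL,
  `quantity` int(11) DEFAULT NULL,
  `price` int(11) DEFAULT NULL,
  PRIMARY KEY (`time`),          -- InnoDB clusters the data on this
  KEY `productindex` (`product`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Without an explicit PRIMARY KEY, InnoDB clusters on a hidden internal row id instead.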
And I did not get into the caching details. This is probably a major component of the slowdown.
I'll fill in more details after you provide more details.
After details
You are using InnoDB, not MyISAM (good).
innodb_buffer_pool_size = 8M is much too small. For 16GB of RAM, recommend 11G. But even 1G would show a significant speedup, since the table is smaller than that.

You say you are running MySQL 5.7, yet the 8M default contradicts that. Did you override the default? Please provide the version (SELECT @@version;).

It seems that there are no big TEXT or BLOB columns, so my comments on such do not apply.

Bottom line: To get a 'fair' comparison, increase the buffer_pool setting for MySQL. This is the most important tunable for performance. It is not automatically set because it depends on the amount of available RAM.
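A sketch of the check and the change (the 11G figure assumes the 16GB of RAM mentioned above; on versions before 5.7.5 the setting only takes effect after a server restart):

SELECT @@version;                                -- confirm the actual server version
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- current value, in bytes

-- in the [mysqld] section of my.cnf / my.ini, then restart mysqld:
-- innodb_buffer_pool_size = 11G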