MySQL InnoDB index: is one big index better than many compact ones?

index-tuning, innodb, mysql-8.0

I have a question for someone who knows how InnoDB indexes are structured (MySQL 8.0.18).

Say I have 4 VARCHAR columns in a table with one billion rows:

country, state, city, attraction

I have queries that look for all or certain attractions, either by country, state, city or attraction name.

query1: "select * from table where attraction like 'asd%' and country = 'X'"
query2: "select * from table where attraction like 'asd%' and country = 'X' and state = 'Y' and city = 'Z'"
query3: "select distinct attraction from table where country='X'"
query4: "select distinct attraction from table where attraction like 'Ux%'"

Combined index: (attraction, country, state, city)

This index would cover all 4 queries.

Can I expect similar performance for queries 1, 3 and 4 compared with a specialized index?

Specialized index1:  (attraction, country)  
Specialized index2:  (attraction)

I don't have the time to dive into the details of InnoDB storage; I hope someone has done that already 😉

My main thoughts on this:

More indexes need more memory and storage (quite a lot, given a billion rows), so that's a concern.
If a four-column index is used by a query that only needs the first column, or the first two columns, is the data access sequential and as efficient as with small dedicated indexes (which basically contain duplicate data)?

So should I have one index covering the WHERE requirements of all four queries, or three indexes, each dedicated to the query it serves?

Best Answer

danblack's post answers the main question about the best index strategy for your queries.
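On the "is the access sequential?" part of the question, a toy Python model may help. It treats a composite B-tree index as a list of key tuples kept in sorted order (a simplification, not InnoDB's real page format), with each entry ending in the primary key, as InnoDB secondary index entries do. Because entries are ordered by (attraction, country, state, city), any leftmost prefix, such as attraction LIKE 'asd%' alone or together with country, maps to one contiguous run of entries, just as it would in a dedicated (attraction) or (attraction, country) index. A filter on country alone, as in query3, has no such run. The data and the helper names `prefix_range`, `query1` and `query3` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# Toy model only: a composite index as a sorted list of key tuples.
# Like an InnoDB secondary index entry, each tuple ends with the primary key.
Entry = Tuple[str, str, str, str, int]  # (attraction, country, state, city, pk)


def prefix_range(index: List[Entry], prefix: str) -> Tuple[int, int]:
    """Positions [lo, hi) of entries whose attraction starts with `prefix`
    (attraction LIKE 'prefix%'), found with two binary searches.
    Assumes a non-empty prefix."""
    # Every string starting with `prefix` sorts before `upper`,
    # and no other string sorts between `prefix` and `upper`.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(index, (prefix,))
    hi = bisect_left(index, (upper,))
    return lo, hi


def query1(index: List[Entry], prefix: str, country: str) -> Tuple[List[int], int]:
    """attraction LIKE 'prefix%' AND country = ?: one contiguous range scan,
    with country checked on the index entry itself.
    Returns (matching pks, entries examined)."""
    lo, hi = prefix_range(index, prefix)
    pks = [e[4] for e in index[lo:hi] if e[1] == country]
    return pks, hi - lo


def query3(index: List[Entry], country: str) -> Tuple[List[str], int]:
    """DISTINCT attraction WHERE country = ?: country is not the leftmost
    column, so matches are scattered and every entry must be examined."""
    found = sorted({e[0] for e in index if e[1] == country})
    return found, len(index)


index = sorted([
    ("asdfield", "X", "S1", "C1", 1),
    ("asdgrove", "Y", "S2", "C2", 2),
    ("asdhill", "X", "S1", "C3", 3),
    ("beach", "X", "S1", "C1", 4),
    ("castle", "Z", "S3", "C4", 5),
])

print(query1(index, "asd", "X"))  # ([1, 3], 3): only the 3 'asd...' entries are read
print(query3(index, "X"))         # (['asdfield', 'asdhill', 'beach'], 5): full scan
```

In a real B-tree the start of the range is found by descending from the root rather than by binary search over a flat list. The main practical cost of the wider combined index is that fewer entries fit on each 16KB page, so the same range touches somewhat more pages than it would in a narrower index.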

However, I would add an often-forgotten optimization to the indexing strategy, supported by most recent versions of RDBMSs (MySQL, MariaDB, PostgreSQL...): covering indexes.

Definition of a covering index (from the MySQL documentation):

An index that includes all the columns retrieved by a query. Instead of using the index values as pointers to find the full table rows, the query returns values from the index structure, saving disk I/O

That means that your third query:

select distinct attraction from table where country='X'

would benefit more from a (country, attraction) index than a simple (country) index.
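A toy model of why the covering index helps here: an InnoDB secondary index entry holds the indexed columns plus the primary key, so when a query needs a column that the index lacks, every match costs an extra lookup into the clustered index (the table rows). The data, function names and the `lookups` counter below are hypothetical, and real costs also depend on caching and page layout.

```python
from bisect import bisect_left
from typing import Dict, Iterator, List, Tuple

# Toy model (not InnoDB internals): the clustered index maps pk -> full row,
# and each secondary index is a sorted list of tuples ending with the pk.
clustered: Dict[int, Tuple[str, str]] = {  # pk -> (attraction, country)
    1: ("asdfield", "X"),
    2: ("beach", "Y"),
    3: ("asdfield", "X"),
    4: ("castle", "X"),
    5: ("beach", "X"),
}
country_idx = sorted((c, pk) for pk, (_a, c) in clustered.items())      # (country)
covering_idx = sorted((c, a, pk) for pk, (a, c) in clustered.items())  # (country, attraction)


def seek(index: List[tuple], value: str) -> Iterator[tuple]:
    """Yield entries whose leftmost column equals `value`: binary-search to the
    first candidate, then read forward until the key changes."""
    i = bisect_left(index, (value,))
    while i < len(index) and index[i][0] == value:
        yield index[i]
        i += 1


def distinct_via_country_idx(country: str) -> Tuple[List[str], int]:
    """(country) index: attraction is not in the entry, so each match needs an
    extra clustered-index lookup. Returns (attractions, row lookups)."""
    lookups, found = 0, set()
    for _country, pk in seek(country_idx, country):
        lookups += 1
        found.add(clustered[pk][0])
    return sorted(found), lookups


def distinct_via_covering_idx(country: str) -> Tuple[List[str], int]:
    """(country, attraction) index: answered from the index alone; equal
    attractions are adjacent, so DISTINCT just skips repeats."""
    found: List[str] = []
    for _country, attraction, _pk in seek(covering_idx, country):
        if not found or found[-1] != attraction:
            found.append(attraction)
    return found, 0


print(distinct_via_country_idx("X"))   # (['asdfield', 'beach', 'castle'], 4)
print(distinct_via_covering_idx("X"))  # (['asdfield', 'beach', 'castle'], 0)
```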

Related Solutions

MySQL – why/how does the number of matched columns influence the way a query is executed?

I can provide a general explanation, but it may not apply to your particular case:

The optimizer works by estimating the cost of each candidate execution plan, then picking what is hopefully the cheapest one. This you already know.

When it comes to indexing, though, things get interesting. The usefulness of an index is evaluated by estimating its selectivity for a given value, that is, how many rows the condition will match.

For the moment, forget about your FULLTEXT index, and let's assume a simple index on some column col1, and another index on some column col2. Given the following two queries:

SELECT * FROM t WHERE col1 < 10 and col2 = 4;
SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 and col2 = 4;

The two queries may be executed differently. Why? Because col2 = 4 may match more rows than col1 < 10, in which case we prefer the index on col1; but it may match fewer rows than col1 BETWEEN 100 AND 110, in which case we prefer the index on col2.
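A minimal sketch of that selectivity argument, using a made-up table in which col2 = 4 matches more rows than col1 < 10 but fewer than col1 BETWEEN 100 AND 110. The "pick the index with the fewest estimated rows" rule is a simplification of a real cost model, and the index names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, int]  # (col1, col2)

# Hypothetical toy table: col1 = i for the first 500 rows, then the remaining
# 500 rows all have col1 = 105; col2 cycles through 0..49.
rows: List[Row] = [(i if i < 500 else 105, i % 50) for i in range(1000)]


def estimate_rows(predicate: Callable[[Row], bool]) -> int:
    """Count matching rows exactly. A real optimizer derives this from index
    statistics and index dives instead of reading the whole table."""
    return sum(1 for r in rows if predicate(r))


def choose_index(candidates: Dict[str, Callable[[Row], bool]]) -> str:
    """Pick the index whose condition is expected to return the fewest rows."""
    return min(candidates, key=lambda name: estimate_rows(candidates[name]))


# SELECT * FROM t WHERE col1 < 10 AND col2 = 4;
q1 = {"idx_col1": lambda r: r[0] < 10, "idx_col2": lambda r: r[1] == 4}
# SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 AND col2 = 4;
q2 = {"idx_col1": lambda r: 100 <= r[0] <= 110, "idx_col2": lambda r: r[1] == 4}

print(choose_index(q1))  # idx_col1 (10 rows vs 20)
print(choose_index(q2))  # idx_col2 (20 rows vs 511)
```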

Your case is not much different. MySQL estimates the number of rows an index lookup will return. When you match more columns, MySQL gets the impression that the lookup will return few rows, so it chooses to start with TableA, then join what should be very few rows with TableB.

But if MySQL believes the index will return many rows, it may prefer to start with TableB. Why? Because you are sorting on indexed columns of TableB, and sorting is a lot of work, too. So MySQL may choose to sort the rows first, then join to TableA and filter with the fulltext index. That may not be a bad idea if the fulltext search matches many rows anyway.
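To illustrate the trade-off, here is a deliberately crude cost model with invented constants; MySQL's real cost model is far more detailed. It assumes that starting from TableA means sorting the joined rows afterwards (about n log n work), while starting from TableB means reading its rows already in index order and paying one lookup into TableA per row. The function names and costs are hypothetical.

```python
import math


def cost_tablea_first(rows_a: float, lookup_cost: float = 1.0) -> float:
    """Start from TableA's index: join each matched row to TableB, then sort
    the joined result (a comparison sort costs roughly n*log2(n))."""
    join = rows_a * lookup_cost
    sort = rows_a * math.log2(rows_a) if rows_a > 1 else 0.0
    return join + sort


def cost_tableb_first(rows_b: float, lookup_cost: float = 1.0) -> float:
    """Assumed: read TableB in index order (no separate sort) and check
    TableA, including the fulltext filter, once per TableB row."""
    return rows_b * lookup_cost


def choose_join_order(rows_a_estimate: float, rows_b: float) -> str:
    """Pick whichever plan the toy model says is cheaper."""
    if cost_tablea_first(rows_a_estimate) <= cost_tableb_first(rows_b):
        return "TableA first"
    return "TableB first"


print(choose_join_order(10, 100_000))      # TableA first: few rows, cheap sort
print(choose_join_order(50_000, 100_000))  # TableB first: sorting 50k rows costs more
```

The point is only the shape of the decision: the same query flips join order once the estimated row count from TableA's index crosses some threshold.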

MySQL requires FORCE INDEX on a huge table and simple SELECTs

You might check your value for the innodb_stats_sample_pages parameter. It controls how many index pages InnoDB samples when updating index statistics, which are in turn used to estimate the cost of each candidate join plan. The default value was 8 in the version we were using; we changed it to 128 and observed fewer unexpected join plans. (From MySQL 5.6 on, this setting is deprecated in favor of innodb_stats_persistent_sample_pages and innodb_stats_transient_sample_pages.)
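Why a larger sample helps: the statistics are extrapolated from a handful of sampled pages, so with few pages the estimate can swing between updates, and the chosen plan can swing with it. The toy below compares 8 and 128 sampled pages; the page data and the estimator (average distinct keys per sampled page, scaled to all pages) are hypothetical and only loosely modelled on page sampling, not InnoDB's exact algorithm.

```python
import random
import statistics
from typing import List


def make_pages(num_pages: int) -> List[List[int]]:
    """Toy index leaf pages of 100 keys each. Even pages repeat one value,
    odd pages hold 100 distinct values, so pages differ a lot."""
    pages, next_key = [], 0
    for p in range(num_pages):
        if p % 2 == 0:
            pages.append([next_key] * 100)
            next_key += 1
        else:
            pages.append(list(range(next_key, next_key + 100)))
            next_key += 100
    return pages


def estimate_distinct(pages: List[List[int]], sample_pages: int, rng: random.Random) -> float:
    """Average the distinct keys on a random sample of pages and scale up to
    the whole index (a simplification of a real statistics estimator)."""
    sampled = rng.sample(pages, sample_pages)
    return statistics.mean([len(set(p)) for p in sampled]) * len(pages)


def estimate_spread(pages: List[List[int]], sample_pages: int, trials: int = 200) -> float:
    """Standard deviation of the estimate over repeated statistics updates."""
    rng = random.Random(42)
    estimates = [estimate_distinct(pages, sample_pages, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


pages = make_pages(256)
true_distinct = len({k for p in pages for k in p})
print(true_distinct)                # 12928
print(estimate_spread(pages, 8))    # wide spread: estimates jump between updates
print(estimate_spread(pages, 128))  # much narrower spread
```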
