MySQL InnoDB index: better one big or many compact?

index-tuning innodb mysql-8.0

I have a question for someone who knows how InnoDB indexes are structured (MySQL 8.0.18).

Say I have 4 varchar columns on a table with one billion rows:

country, state, city, attraction

I have queries that look for all or certain attractions, either by country, state, city or attraction name.

query1: "select * from table where attraction like 'asd%' and country = 'X'"  
query2: "select * from table where attraction like 'asd%' and country = 'X' and state = 'Y' and city = 'Z'"    
query3: "select distinct attraction from table where country='X'"     
query4: "select distinct attraction from table where attraction like 'Ux%'"



Combined index: (attraction, country, state, city)  

This index would cover all 4 queries.

Can I expect similar performance on queries 1, 3 and 4 compared with a specialized index?

Specialized index1:  (attraction, country)  
Specialized index2:  (attraction)  
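For concreteness, the two alternatives could be declared roughly as follows. This is only a sketch: the table name `attractions` and the index names are hypothetical (the queries above use `table` as a placeholder, which is a reserved word in MySQL), and it assumes the four varchar columns are short enough to be indexed in full.

```sql
-- Option A: one combined index over all four columns.
ALTER TABLE attractions
  ADD INDEX idx_combined (attraction, country, state, city);

-- Option B: smaller specialized indexes instead of the combined one.
-- Both share the leading column `attraction` with idx_combined.
ALTER TABLE attractions
  ADD INDEX idx_attraction_country (attraction, country),
  ADD INDEX idx_attraction (attraction);
```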

I don't have the time to dive into the details of innodb storage, I hope someone did that already 😉

My main thoughts on this:

  1. More indexes will need more memory and storage (given a billion rows quite a bit), so that's a concern.

  2. If an index built on 4 columns is used by a query that only needs the first column, or the first two columns, is the data access sequential and as efficient as with small dedicated indexes (which basically contain duplicate data)?
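One way to put numbers on the first concern is to read the per-index page counts that InnoDB keeps in its persistent statistics. The sketch below assumes persistent statistics are enabled (the default in 8.0) and again uses the hypothetical table name `attractions`; the figures are estimates that are refreshed by `ANALYZE TABLE`, not exact sizes.

```sql
-- Approximate on-disk size of each index of one table, in MiB.
-- For stat_name = 'size', stat_value is the number of pages in the index.
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 1) AS size_mib
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'attractions'   -- hypothetical table name
  AND stat_name = 'size';
```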

So should I have one index covering the WHERE requirements of all 4 queries, or 3 indexes, each dedicated to the query it serves?

Best Answer

danblack's post answers the main question about the best index strategy for your queries.

However, I would add a sometimes forgotten optimization of the index strategy, which is implemented in recent versions of most RDBMSs (MySQL, MariaDB, PostgreSQL...): covering indexes.

Definition of covering index (from the MySQL documentation):

An index that includes all the columns retrieved by a query. Instead of using the index values as pointers to find the full table rows, the query returns values from the index structure, saving disk I/O.

That means that your third query:

select distinct attraction from table where country='X'

would benefit more from a (country, attraction) index than a simple (country) index.
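A minimal sketch of how this could be tried out, assuming a hypothetical table name `attractions` and index name `idx_country_attraction` (neither appears in the question):

```sql
-- Index that starts with the filtered column and also contains the selected column.
ALTER TABLE attractions
  ADD INDEX idx_country_attraction (country, attraction);

-- Check the plan for the third query.
EXPLAIN
SELECT DISTINCT attraction
FROM attractions
WHERE country = 'X';
-- When the optimizer answers the query from the index alone,
-- the Extra column of the EXPLAIN output contains "Using index".
```

If "Using index" does not show up in Extra, the query is still reading the full table rows, and the index is not acting as a covering index for it.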