MySQL – Using Composite Indexes with Date Ranges for Performance

index-tuningMySQLperformancequery-performance

I have a table of rules that are activated depending on the date.

id - name - value - start_date - end_date
1  - ABC  - 10    - 2015-12-01 - 2015-12-31
2  - DEF  - 15    - 2016-02-01 - 2016-02-29

My SQL queries will mostly be

SELECT * FROM rules WHERE start_date <= '2015-12-05' and end_date >= '2015-12-05';

Does having a composite index consisting of start_date and end_date help or will worsen the query?

Edit: I'm using MySQL

Best Answer

It's hard to tell if this will hurt or improve performance but adding a composite index on start_date and end_date shouldn't worsen your query, the index will either be used or not used. You didn't mention your RDBMS but I don't think there's a big chance your engine's optimizer will pick a worse plan (but there is always a possibility I guess).

That being said, there might a slight impact on the plan generation time as your optimizer has an extra index to consider, and unused indexes could hurt overall performance since they need to be maintained when updating/inserting.

Whether or not the index will help your query is hard to tell, it depends on the selectivity of your query and cardinality statistics. The optimizer of your RDBMS should pick the fastest way to get your data, and even if an index exists on the columns your query selects on a full scan might still be the better option, especially since you are selecting all fields.

Why is selectivity important?

When an index is used, the index points to the actual record and the record might need to be fetched using the pointer. Most RDBMS work that way.

See for example the Oracle documentation

In general, index access paths should be used for statements that retrieve a small subset of table rows, while full scans are more efficient when accessing a large portion of the table

or the SQL Server documentation

In many cases, the optimizer forces a table scan for queries with result sets of approximately 5 percent, although the table scan becomes more efficient than index access at selectivities of 8 to 10 percent.

Why is select * important

If you fetch all fields from a table your database engine will have to fetch the actual record using the pointer stored in the index and you lose the option of using included columns (if your RDBMS supports them) or covering indexes.

See for example this article for SQL server (but the principle goes for a lot of vendors): Using Covering Indexes to Improve Query Performance

However, under some circumstances, the overhead associated with nonclustered indexes may be deemed too great by the query optimizer and SQL Server will resort to a table scan to resolve the query.

and

The observed improvement is due to the fact that the nonclustered index contained all of the required information to resolve the query. No Key Lookups were required.