Mysql – Indexes involved in intervals collision detection

index-tuningMySQL

Let's assume an hotel booking application wants to display all customers being present between two dates.

Starting with a venue table :

CREATE TABLE `venue` (
  `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `startdate` DATE NOT NULL,
  `enddate`   DATE NOT NULL
)

I wish to perform a query finding venues for a one month interval, e.g. last december 2013:

SELECT `venue`.id 
WHERE `startdate` <= '2013-12-31'
AND   `enddate`   >= '2013-12-01'

In order to have great performance among millions of rows (big hotel), I immediately add two indexes, one for each date:

ALTER TABLE `venue` 
ADD INDEX (`startdate`),
ADD INDEX (`enddate`  )

I leave to the SQL engine the responsibility to choose the right index. If dates intervals are distributed homogeneously, from say 2000-01-01 to 2014-01-01, the previous query will immediately use the enddate index, only scanning the last month. If the interval is closer to the beginning, e.g. last week of 2000, the startdate index will be used, scanning only the whole 2000 year.

Now, what if I want to get the venues for a single day, right in the middle of the venues distribution, e.g. first january of 2007. Both indexes will more or less offer the same filtering, that is half the table. And scanning half the table is very, very long.

Adding paired index like (startdate, enddate) or (enddate, startdate) won't be of any used, as far as I can tell.

Is there any other index-based strategy to make things faster ?

Best Answer

This is exactly the kind of query that can benefit from a composite, or multiple-column index: when a single statement filters by multiple columns in the same table. The order of columns in the index will depend on your use cases, but in general the order in which the columns appear in the DML statement is fine, in your case (assuming MySQL but the concept is the same for most RDBMSs):

ALTER TABLE 'venue'
ADD INDEX ('startdate','enddate')

See the documentation for other alternatives, such as adding a column containing the hash of multiple columns.