Mysql – Index for date column along with multiple other columns in where clause

indexMySQL

I have a 'joinings' table which lists the joining date of all the employees along with their office location, department no and the job advertisement through which they were selected. I want to be able to query this table by their joining dates and group them using one of the other columns. I also want to be able to apply a where clause on any of the columns.

It could be like this –

SELECT month(joining_date) as month, count(*) as entries
     , branch_name  
FROM joinings  
WHERE joining_date >= '2017-01-01' 
  AND joining_date <= '2017-05-31'
  AND department_no in (5,4,7) 
  AND source_of_appliction IN ('glassdoor', 'linkedin')  
GROUP BY month(joining_date), branch_name

OR could be like this –

SELECT year(joining_date) as month, count(*) as entries
     , source_of_application  
FROM joinings  
WHERE joining_date >= '2017-01-01' 
  AND joining_date <= '2019-12-31' 
  AND department_no IN (5,4,7) 
  AND branch_name IN ('ZEIT','DUS')  
GROUP BY year(joining_date), source_of_application

This table can contain thousands of records for several years. The range of the joining date that I would query could be between a single month, range of months or range of years.

I would like to know what indexes need to be created to get me optimized performance for my select queries. If not the exact indexes, I am at least looking out for pointers to get me started to create the right indices.

What I have-
I currently have a multi-column index created on joinings(joining_date, department_no), joinings(joining_date, branch_name), joinings(joining_date, source_of_application) and also have individual indexes for all the single columns. But my select performs a full table scan for the queries that I have listed above.

Best Answer

This is a "range: joining_date >= '2017-01-01' AND joining_date <= '2017-05-31'

This is sort of a range: department_no IN (5,4,7) . There are cases where it acts efficiently like =, there are cases where it should be considered a "range".

This is an =: department_no IN (5) . That is, the Optimizer turns it into department_no = 5, which has much more optimization potential.

http://mysql.rjweb.org/doc.php/index_cookbook_mysql says to put range columns last, not first in an index.

In your queries, it is hard to predict what is optimal.

If you are likely to sometimes have single-item INs, then you need

INDEX(department_no, joining_date) -- in THIS order

Also, newer versions of MySQL may work well with that index for the multi-item IN by leapfrogging across the table. So add that index.

As for

INDEX(joining_date, ...)

When using a 'range', that index will not get past joining_date to whatever other column(s) follow it. So you may as say just

INDEX(joining_date)

As for the GROUP BY... Neither of your queries has any hope of using an index for the GROUP BY. So there will be some form of extra effort after the WHERE -- a tmp table, a sort, something.

  • Hiding a column in a function month(joining_date) -- the Optimizer can't use the joining_date column an any index for this.
  • The WHERE left the results in some messed up order (assuming a multi-item IN)

Unrelated... I like this pattern:

AND joining_date >= '2017-01-01' 
AND joining_date  < '2017-01-01' + INTERVAL 1 MONTH

It avoids computing the end, leap year, DATE vs DATETIME vs DATETIME(6) issues, etc.

Bottom line: Have this one index:

INDEX(branch_name, department_no, joining_date)

Then get the EXPLAIN SELECT ... to see if it did the leapfrogging. If that fails, let's rethink.