I have a 'joinings' table which lists the joining date of all the employees along with their office location, department no and the job advertisement through which they were selected. I want to be able to query this table by their joining dates and group them using one of the other columns. I also want to be able to apply a where clause on any of the columns.
It could be like this –
SELECT month(joining_date) as month, count(*) as entries
, branch_name
FROM joinings
WHERE joining_date >= '2017-01-01'
AND joining_date <= '2017-05-31'
AND department_no in (5,4,7)
AND source_of_appliction IN ('glassdoor', 'linkedin')
GROUP BY month(joining_date), branch_name
OR could be like this –
SELECT year(joining_date) as month, count(*) as entries
, source_of_application
FROM joinings
WHERE joining_date >= '2017-01-01'
AND joining_date <= '2019-12-31'
AND department_no IN (5,4,7)
AND branch_name IN ('ZEIT','DUS')
GROUP BY year(joining_date), source_of_application
This table can contain thousands of records for several years. The range of the joining date that I would query could be between a single month, range of months or range of years.
I would like to know what indexes need to be created to get me optimized performance for my select queries. If not the exact indexes, I am at least looking out for pointers to get me started to create the right indices.
What I have-
I currently have a multi-column index created on joinings(joining_date, department_no)
, joinings(joining_date, branch_name)
, joinings(joining_date, source_of_application)
and also have individual indexes for all the single columns. But my select performs a full table scan for the queries that I have listed above.
Best Answer
This is a "range:
joining_date >= '2017-01-01' AND joining_date <= '2017-05-31'
This is sort of a range:
department_no IN (5,4,7)
. There are cases where it acts efficiently like=
, there are cases where it should be considered a "range".This is an
=
:department_no IN (5)
. That is, the Optimizer turns it intodepartment_no = 5
, which has much more optimization potential.http://mysql.rjweb.org/doc.php/index_cookbook_mysql says to put range columns last, not first in an index.
In your queries, it is hard to predict what is optimal.
If you are likely to sometimes have single-item
INs
, then you needAlso, newer versions of MySQL may work well with that index for the multi-item
IN
by leapfrogging across the table. So add that index.As for
When using a 'range', that index will not get past
joining_date
to whatever other column(s) follow it. So you may as say justAs for the
GROUP BY
... Neither of your queries has any hope of using an index for theGROUP BY
. So there will be some form of extra effort after theWHERE
-- a tmp table, a sort, something.month(joining_date)
-- the Optimizer can't use thejoining_date
column an any index for this.WHERE
left the results in some messed up order (assuming a multi-itemIN
)Unrelated... I like this pattern:
It avoids computing the end, leap year, DATE vs DATETIME vs DATETIME(6) issues, etc.
Bottom line: Have this one index:
Then get the
EXPLAIN SELECT ...
to see if it did the leapfrogging. If that fails, let's rethink.