I'm beginning to learn some about looking at execution plans and making queries more efficient
Consider these two basic queries
select distinct pat_id, drug_class, drug_name from rx
select pat_id, drug_class, drug_name from rx
and their execution plans
index being used:
CREATE CLUSTERED INDEX [ix_overlap] ON [dbo].[rx]
(
[pat_id] ASC,
[fill_date] ASC,
[script_end_date] ASC,
[drug_name] ASC
)
Even though the first query supposedly has the higher cost by a 4:1 margin it runs faster than the second one. Why is it that a simple distinct added to the query will add the (what I assume to always be bad, corrections are welcome) hash match operator? And why does it have the higher query cost relative to the second query if it runs faster.
Best Answer
The first query is using a parallel plan, meaning the "work" was split into multiple tasks carried out by multiple threads. The cumulative CPU time was therefore higher than for the serial plan used for your second query.
As to why the distinct causes the hash match operator to appear in the plan; an aggregate or sort operation is required to determine the
DISTINCT
result. @SQL_Kiwi might pop up with a more indepth explanation shortly but the hash match operator is apparently favoured for larger result sets.