I need to find a single value from a table A
containing three foreign keys to two other tables B
and C
.
For the sake of experiment, I tested two ways to query the value:
Multiple queries:
declare @start int = (select top 1 [Id] from [B] where [Day] = '2015-01-01')
declare @end int = (select top 1 [Id] from [B] where [Day] = '2017-06-14')
declare @category int = (select top 1 [Id] from [C] where [Title] = 'Hello, World!')
select top 1 [Name]
from [A]
where [StartId] = @start
and [EndId] = @end
and [CategoryId] = @category
and [Day] = '2016-05-27'
Single query:
select top 1 [Name]
from [A]
left join [B] as [BStart] on [BStart].[Id] = [A].[StartId]
left join [B] as [BEnd] on [BEnd].[Id] = [A].[EndId]
left join [C] on [C].[Id] = [A].[CategoryId]
where [BStart].[Day] = '2015-01-01'
and [BEnd].[Day] = '2017-06-14'
and [C].[Title] = 'Hello, World!'
and [A].[Day] = '2016-05-27'
I was surprised that the execution plan indicates that the single query is more expensive than multiple queries. When doing all five select
s together, the one with left join
s indicates 53%. The other four queries indicate 12% each.
Those are the execution plans:
declare @start int = (select top 1 [Id] from [B] where [Day] = '2015-01-01')`
declare @end int = (select top 1 [Id] from [B] where [Day] = '2017-06-14')
(Same as below)
declare @category int = (select top 1 [Id] from [C] where [Title] = 'Hello, World!')
select top 1 [Name]
from [A]
where [StartId] = @start
and [EndId] = @end
and [CategoryId] = @category
and [Day] = '2016-05-27'
select top 1 [Name]
from [A]
left join [B] as [BStart] on [BStart].[Id] = [A].[StartId]
left join [B] as [BEnd] on [BEnd].[Id] = [A].[EndId]
left join [C] on [C].[Id] = [A].[CategoryId]
where [BStart].[Day] = '2015-01-01'
and [BEnd].[Day] = '2017-06-14'
and [C].[Title] = 'Hello, World!'
and [A].[Day] = '2016-05-27'
Why is the single query with left join
s slower than the first approach?
Best Answer
Your execution plan for the "individual queries" shows that pre-calculating the StartID, Category etc allows an index to be used efficiently on A, "seeking" straight to the record(s) you want (you have a Non-Clustered Index Seek in your query plan), having identified the given Category etc to search for.
The "single-query" with JOINs, on the other hand, needs to carry out the join between table A-B-C "on the fly" before returning the "top 1" result matching your criteria. This join will involve all the records of those tables where they match to the other related table (full clustered index scan).
If you are frequently querying by Day and Title like this, it looks like an index is missing (or statistics out of date) that allow table B and C to be searched by those criteria.
By the way, Top 1 will only give a deterministic result if you "order by" some criteria (or if your search statement is unique) - otherwise you will get the 1st row the query comes to, which may or may not be consistent between occasions when you run it.