I think what you need here is a non-equi join:
select *
from
    table_a,
    table_b
where
    table_a.id1 = table_b.id1
    and table_a.id2 = table_b.id2
    and table_a.evnt_sec BETWEEN table_b.evnt_sec - 2 and table_b.evnt_sec + 2
I am going to assume that there isn't an index on the date columns; otherwise I think the query would have been structured differently. If there is one, you can probably find a better-performing query than this.

The advantage of this query is that it can get all the data in one scan. The disadvantage is that it has to sort the data and join EventEmployee on the entire table, so as always, test with your own situation. This query also assumes that the MAX date is either unique or that equivalent rows would be acceptable.
USE AdventureWorks2012
GO

;WITH Base AS (
    SELECT
        TransactionHistory.*
        ,ProductVendor.BusinessEntityID
        ,MAX(CASE WHEN TransactionDate < '2008-08-01' THEN TransactionDate END)
            OVER (PARTITION BY ProductVendor.BusinessEntityID) AS PreviousVendorTransaction
        ,COUNT(CASE WHEN TransactionDate >= '2008-08-01' THEN 1 END)
            OVER (PARTITION BY ProductVendor.BusinessEntityID) AS VendorAfterCutoff
    FROM
        Production.TransactionHistory
        -- Doesn't make the most sense, but I need a repeating relation
        INNER JOIN Purchasing.ProductVendor
            ON TransactionHistory.ProductID = ProductVendor.ProductID
),
Filtered AS (
    SELECT
        *
    FROM
        Base
    WHERE
        Base.TransactionDate >= '2008-08-01'
        OR (TransactionDate = PreviousVendorTransaction AND VendorAfterCutoff > 0)
)
SELECT DISTINCT
    TransactionID
    ,ProductID
    ,ReferenceOrderID
    ,ReferenceOrderLineID
    ,TransactionDate
    ,TransactionType
    ,Quantity
    ,ActualCost
    ,ModifiedDate
FROM
    Filtered
Edit:
Hmm, I think I may have to take back my comment about structuring it differently if there are indexes. The other suggestions I have are probably fairly minor:
- Make sure the query is using the indexes you're expecting it to: start and end dates to build the temp table, the end date to drive the previous-event loop.
- If the query to build the temp table is doing a lookup on the clustered index, it may be better to hold off and do that as part of the main query.
- Try using a CTE instead of a temp table. I think a CTE might be more competitive with the way the query below is structured.
- If you are returning a lot of events, it might be better to pull out the event table lookup to the main query to give the optimizer the option of doing a merge join.
- I don't see a way of optimizing the previous-event lookup short of an indexed view; a rough sketch of what that could look like follows this list.
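Purely as an illustration of that last point, here is a minimal sketch of such an indexed view. The object and column names ([Events], [EventEmployee], EmployeeID, EventEnd) are taken from the queries in this answer, and it assumes (EmployeeID, EventID) is unique in [EventEmployee]; treat it as a starting point rather than a drop-in solution.

-- Hypothetical indexed view to support the "latest EventEnd before @StartDate
-- per employee" lookup; names are assumptions based on the queries below.
CREATE VIEW dbo.EmployeeEventEnd
WITH SCHEMABINDING
AS
SELECT
    ee.[EmployeeID],
    e.[EventID],
    e.[EventEnd]
FROM dbo.[EventEmployee] AS ee
INNER JOIN dbo.[Events] AS e
    ON e.[EventID] = ee.[EventID];
GO

-- Materialize the view so the previous-event CROSS APPLY can seek on
-- (EmployeeID, EventEnd) instead of joining the base tables each time.
CREATE UNIQUE CLUSTERED INDEX CIX_EmployeeEventEnd
    ON dbo.EmployeeEventEnd ([EmployeeID], [EventEnd], [EventID]);

On Standard Edition the lookup query would also need a WITH (NOEXPAND) hint to read from the view directly.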
Here's a query that combines a few of those ideas.
SELECT
    e.[EventID]
INTO #EventTemp
FROM
    [Events] AS e
WHERE
    ( e.[EventStart] >= @StartDate AND e.[EventStart] <= @EndDate )
    OR ( e.[EventEnd] >= @StartDate AND e.[EventEnd] <= @EndDate )
;
WITH PrevEvent AS (
    SELECT
        EmpPrevEvent.[EventID]
    FROM
        (
            SELECT DISTINCT
                ee.[EmployeeID]
            FROM
                #EventTemp
                INNER JOIN [EventEmployee] AS ee ON
                    #EventTemp.[EventID] = ee.[EventID]
        ) AS Emp
        CROSS APPLY (
            SELECT TOP 1
                e.[EventID]
            FROM
                [Events] AS e
                INNER JOIN [EventEmployee] AS ee ON
                    e.[EventID] = ee.[EventID]
            WHERE
                ee.[EmployeeID] = Emp.[EmployeeID]
                AND e.[EventEnd] < @StartDate
            ORDER BY
                e.[EventEnd] DESC
        ) AS EmpPrevEvent
)
SELECT
    e.[EventID],
    e.[EventStart],
    e.[EventEnd],
    e.[EventTypeID]
FROM
    [Events] AS e
WHERE
    e.EventID IN (
        SELECT EventID FROM #EventTemp
        UNION
        SELECT EventID FROM PrevEvent
    )
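Note that the script above assumes @StartDate and @EndDate are already in scope (for example as stored procedure parameters) and that #EventTemp does not already exist. Run on its own, it might be wrapped roughly like this; the declarations and date values are placeholders, not part of the original answer.

-- Hypothetical harness; the dates are placeholders.
DECLARE @StartDate datetime = '2018-06-01';
DECLARE @EndDate   datetime = '2018-06-30';

IF OBJECT_ID('tempdb..#EventTemp') IS NOT NULL
    DROP TABLE #EventTemp;

-- ...the SELECT ... INTO #EventTemp statement and the PrevEvent query from above go here...

DROP TABLE #EventTemp;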
Best Answer
This is a hard problem to solve in general, but there are a couple of things we can do to help the optimizer choose a plan. This script creates a table with 10,000 rows with a known pseudo-random distribution of rows to illustrate:
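The original demo script isn't reproduced here. As a rough stand-in, a test table of the shape the rest of the discussion assumes might look something like the following; the table name, the population method, and the 30-day interval length are my assumptions, so it will not reproduce the exact row counts (2076, 468, 2110) quoted below.

-- Hypothetical stand-in for the original demo script; names and the data
-- distribution are assumptions, not the author's actual script.
CREATE TABLE dbo.Intervals
(
    IntervalID integer IDENTITY NOT NULL PRIMARY KEY,
    StartDate  datetime NOT NULL,
    EndDate    datetime NOT NULL
);

-- 10,000 pseudo-random intervals during 2010, each up to 30 days long
INSERT dbo.Intervals (StartDate, EndDate)
SELECT
    s.StartDate,
    DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 30, s.StartDate)
FROM
(
    SELECT TOP (10000)
        DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 365, '20100101') AS StartDate
    FROM sys.all_columns AS c1
    CROSS JOIN sys.all_columns AS c2
) AS s;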
The first question is how to index this table. One option is to provide two indexes on the DATETIME columns, so the optimizer can at least choose whether to seek on StartDate or EndDate. Naturally, the inequalities on both StartDate and EndDate mean that only one column in each index can support a seek in the example query, but this is about the best we can do. We might consider making the second column in each index an INCLUDE rather than a key, but we might have other queries that can perform an equality seek on the leading column and an inequality seek on the second column. Also, we may get better statistics this way. Anyway...

This query uses variables, so in general the optimizer will guess at selectivity and distribution, resulting in a guessed cardinality estimate of 81 rows. In fact, the query produces 2076 rows, a discrepancy that might be important in a more complex example.
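To make the rest of the discussion concrete, here is a sketch of the two-index option and the kind of variable-driven range query being described, using the stand-in table above. The index names and the query itself are my assumptions (the original query isn't shown), so the 81-row estimate and 2076-row actual figures come from the author's data, not from this sketch.

-- One index leading on each datetime column, carrying the other column,
-- so the optimizer can choose which range to seek on.
CREATE INDEX IX_Intervals_StartDate ON dbo.Intervals (StartDate, EndDate);
CREATE INDEX IX_Intervals_EndDate   ON dbo.Intervals (EndDate, StartDate);

-- Inequalities on both columns, driven by variables, so only one column
-- per index can support a seek.
DECLARE @RangeStart datetime = '20100601',
        @RangeEnd   datetime = '20100630';

SELECT I.IntervalID, I.StartDate, I.EndDate
FROM dbo.Intervals AS I
WHERE I.StartDate <= @RangeEnd
  AND I.EndDate >= @RangeStart;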
On SQL Server 2008 SP1 CU5 or later (or R2 RTM CU1) we can take advantage of the Parameter Embedding Optimization to get better estimates, simply by adding OPTION (RECOMPILE) to the SELECT query above. This causes a compilation just before the batch executes, allowing SQL Server to 'see' the real parameter values and optimize for those. With this change, the estimate improves to 468 rows (though you do need to check the runtime plan to see this). This estimate is better than 81 rows, but still not all that close. The modelling extensions enabled by trace flag 2301 may help in some cases, but not with this query.

The problem is where the rows qualified by the two range searches overlap. One of the simplifying assumptions made in the optimizer's costing and cardinality estimation component is that predicates are independent (so if both have a selectivity of 50%, the result of applying both is assumed to qualify 50% of 50% = 25% of the rows). Where this sort of correlation is a problem, we can often work around it with multi-column and/or filtered statistics. With two ranges with unknown start and end points, this becomes impractical. This is where we sometimes have to resort to rewriting the query to a form that happens to produce a better estimate:
This form happens to produce a runtime estimate of 2110 rows (versus 2076 actual). Unless you have TF 2301 on, in which case the more advanced modelling techniques see through the trick and produce exactly the same estimate as before: 468 rows.
One day SQL Server might gain native support for intervals. If that comes with good statistical support, developers might dread tuning query plans like this a little less.