DECLARE @t1 TABLE (T1_key int, T1_Data char(1), T1_ValidUntil datetime)
DECLARE @t2 TABLE (T2_key int, T2_Data char(1), T2_ValidUntil datetime)
INSERT @t1 VALUES (1, 'A', '2000-01-01')
INSERT @t1 VALUES (1, 'B', '2000-06-30')
INSERT @t1 VALUES (2, 'C', '2005-05-31')
INSERT @t1 VALUES (3, 'D', '2004-12-31')
INSERT @t1 VALUES (3, 'E', '2007-04-30')
INSERT @t1 VALUES (3, 'F', '2008-01-31')
INSERT @t2 VALUES (1, 'R', '2002-03-31')
INSERT @t2 VALUES (2, 'S', '2001-06-30')
INSERT @t2 VALUES (2, 'T', '2003-02-28')
INSERT @t2 VALUES (2, 'U', '2005-05-31')
INSERT @t2 VALUES (3, 'V', '2006-09-30')
INSERT @t2 VALUES (3, 'W', '2007-06-30')
SELECT
T1.*, T2x.*
FROM
@t1 T1
CROSS APPLY
(SELECT TOP 1*
FROM @t2
WHERE T1_key = T2_key AND T2_ValidUntil >= T1_ValidUntil
ORDER BY T2_ValidUntil
) T2x
UNION
SELECT
T1x.*, T2.*
FROM
@t2 T2
CROSS APPLY
(SELECT TOP 1*
FROM @t1
WHERE T1_key = T2_key AND T1_ValidUntil >= T2_ValidUntil
ORDER BY T1_ValidUntil
) T1x
From reading different articles and books, I assumed that the cardinality estimations are performed before the plan is built.
Not exactly. An initial cardinality estimate is derived (after simplifications and other work), which influences the initial join order chosen by the optimizer.
However, subsequent explorations (during cost-based optimization) can, and often do, result in new cardinality estimations being calculated. These later CEs may be more or less 'accurate'. If an under-estimate results, the optimizer may choose a plan that looks cheaper, but in fact runs for much longer.
In general, there is no guarantee that cardinality estimations for semantically identical subtrees will produce the same results. It is a statistical process, after all, and some operations have deeper CE support than others.
In your case, there appears to be another factor - the optimizer introduces (or moves around) a Top, which sets a row goal on the subtree beneath it:
If you were to enable trace flag 4138 (on 2008 R2 or later), you may find the estimates more in-line with expectations, or perhaps even that the optimizer would no longer choose nested loops.
What I see, however, is that Merge hint causes all estimations to become pretty much perfect.
There is an element of luck involved here. People tend to write queries, or at least the joins, in the order they expect them to be performed physically. Using a join hint comes with an implied FORCE ORDER
, thereby fixing the join order to match the textual form, and switching off many optimizer exploration rules that can lead to cardinality re-estimation.
Besides that, I tried running the query with OPTION (FORCE ORDER)
- it runs even faster than using merge join, choosing HASH MATCH for every join.
This is the same as hinting a join, but does not restrict the choice of physical join operator. Again, if you happened to write the query join order in a logical way, it is quite likely you will get a reasonable plan. Of course, you miss out on much of the optimizer's abilities this way, which may not produce optimal results in more general situations.
You will probably not want to use FORCE ORDER
very often because it is an extremely powerful hint (directive) that has wider effects than simple forcing the order of joins; for example, it prevents the optimizer moving aggregates around and introducing partial aggregations. I very much advise against using this hint except in very exceptional circumstances, and by truly expert tuners.
A detailed analysis would require more time than I have right now, and access to a statistics-only copy of the database.
Best Answer
Your "Table 2" mapping won't help you for this task, because it holds non-useful association between A values and B values (A1 is matched with B1 and B2. A2 is matched with B3 and B4).
What you can do here is generate an ad-hoc remapping with ranking values using three window functions:
ROW_NUMBER
on table 1, to identify the order between A1 and A2DENSE_RANK
on table 3, ordering by Designation, to identify the order between designation valuesThese two will help you generate the association you need: they'll be useful for your join condition betweeen table 1 and 3.
ROW_NUMBER
on table 3, partitioning by Designation and ordering by ID2, to identify the ordering of rows for each D1 and D2 designationsThis one will make you pivot work correctly, selecting the right values you need for each column.
Check the demo here.