SQL Server – How to Convert Left Join Resulting in Multiple Rows to Separate Columns

casejoin;sql serversql-server-2012

I need some logic help with a complex join situation. Here is an overall idea of the 3 tables I am working with

Table 1:

ID Number Type 1
A1
A2

Table 2:

ID Number Type 1	ID Number Type 2
A1	B1
A1	B2
A2	B3
A2	B4

Table 3:

ID Number Type 2	Designation
B1	D 1
B2	D 2
B3	D 1
B4	D 2

I am trying to get this result:

ID Number Type 1	ID Number Type 2 for D 1	ID Number Type 2 for D 2
A1	B1	B3
A2	B2	B4

Basically table 2 results in multiple rows, and I want these multiple resulting ID Type 2’s to show in two separate columns in the resulting table, based on the designation that comes from a third table.

Currently, I have table 1 and table 2, as well as table 2 and table 3, joined with left joins. I am using Case When logic, but it is only resulting in one of the two designation columns getting filled for obvious reasons. Here is my current code:

Case when (Designation = D1) THEN (ID Type 2) END AS [D1]

Case when (Designation = D2) THEN (ID Type 2) END AS [D2]

Can anyone help improve my logic? thank you in advance!

Best Answer

Your "Table 2" mapping won't help you for this task, because it holds non-useful association between A values and B values (A1 is matched with B1 and B2. A2 is matched with B3 and B4).

What you can do here is generate an ad-hoc remapping with ranking values using three window functions:

ROW_NUMBER on table 1, to identify the order between A1 and A2
DENSE_RANK on table 3, ordering by Designation, to identify the order between designation values

These two will help you generate the association you need: they'll be useful for your join condition betweeen table 1 and 3.

ROW_NUMBER on table 3, partitioning by Designation and ordering by ID2, to identify the ordering of rows for each D1 and D2 designations

This one will make you pivot work correctly, selecting the right values you need for each column.

WITH t1 AS (
    SELECT *, 
           ROW_NUMBER() OVER(ORDER BY [ID Number Type 1]) AS rn1
    FROM tab1
), t3 AS (
    SELECT *, 
           ROW_NUMBER() OVER(PARTITION BY Designation 
                             ORDER     BY [ID Number Type 2]) AS rn,
           DENSE_RANK() OVER(ORDER BY Designation) AS rn3
    FROM tab3
) 
SELECT [ID Number Type 1],
       MAX(CASE WHEN rn = 1 THEN [ID Number Type 2] END) AS [ID Number Type 2 for D 1], 
       MAX(CASE WHEN rn = 2 THEN [ID Number Type 2] END) AS [ID Number Type 2 for D 2]
FROM       t1
INNER JOIN t3
        ON t1.rn1 = t3.rn3
GROUP BY [ID Number Type 1]

Check the demo here.

Related Solutions

Sql-server – SQL Server: join two tables with “valid-to” dates

DECLARE @t1 TABLE (T1_key int, T1_Data char(1), T1_ValidUntil datetime)
DECLARE @t2 TABLE (T2_key int, T2_Data char(1), T2_ValidUntil datetime)

INSERT @t1 VALUES (1, 'A', '2000-01-01')
INSERT @t1 VALUES (1, 'B', '2000-06-30')
INSERT @t1 VALUES (2, 'C', '2005-05-31')
INSERT @t1 VALUES (3, 'D', '2004-12-31')
INSERT @t1 VALUES (3, 'E', '2007-04-30')
INSERT @t1 VALUES (3, 'F', '2008-01-31')

INSERT @t2 VALUES (1, 'R', '2002-03-31')
INSERT @t2 VALUES (2, 'S', '2001-06-30')
INSERT @t2 VALUES (2, 'T', '2003-02-28')
INSERT @t2 VALUES (2, 'U', '2005-05-31')
INSERT @t2 VALUES (3, 'V', '2006-09-30')
INSERT @t2 VALUES (3, 'W', '2007-06-30')

SELECT
   T1.*, T2x.*
FROM
   @t1 T1
   CROSS APPLY
   (SELECT TOP 1*
   FROM @t2 
   WHERE T1_key = T2_key AND T2_ValidUntil >= T1_ValidUntil
   ORDER BY T2_ValidUntil
   ) T2x
UNION
SELECT
   T1x.*, T2.*
FROM
   @t2 T2
   CROSS APPLY
   (SELECT TOP 1*
   FROM @t1
   WHERE T1_key = T2_key AND T1_ValidUntil >= T2_ValidUntil
   ORDER BY T1_ValidUntil
   ) T1x

SQL Server Performance – Row Estimates Change with Join Hint

From reading different articles and books, I assumed that the cardinality estimations are performed before the plan is built.

Not exactly. An initial cardinality estimate is derived (after simplifications and other work), which influences the initial join order chosen by the optimizer.

However, subsequent explorations (during cost-based optimization) can, and often do, result in new cardinality estimations being calculated. These later CEs may be more or less 'accurate'. If an under-estimate results, the optimizer may choose a plan that looks cheaper, but in fact runs for much longer.

In general, there is no guarantee that cardinality estimations for semantically identical subtrees will produce the same results. It is a statistical process, after all, and some operations have deeper CE support than others.

In your case, there appears to be another factor - the optimizer introduces (or moves around) a Top, which sets a row goal on the subtree beneath it:

If you were to enable trace flag 4138 (on 2008 R2 or later), you may find the estimates more in-line with expectations, or perhaps even that the optimizer would no longer choose nested loops.

What I see, however, is that Merge hint causes all estimations to become pretty much perfect.

There is an element of luck involved here. People tend to write queries, or at least the joins, in the order they expect them to be performed physically. Using a join hint comes with an implied FORCE ORDER, thereby fixing the join order to match the textual form, and switching off many optimizer exploration rules that can lead to cardinality re-estimation.

Besides that, I tried running the query with OPTION (FORCE ORDER) - it runs even faster than using merge join, choosing HASH MATCH for every join.

This is the same as hinting a join, but does not restrict the choice of physical join operator. Again, if you happened to write the query join order in a logical way, it is quite likely you will get a reasonable plan. Of course, you miss out on much of the optimizer's abilities this way, which may not produce optimal results in more general situations.

You will probably not want to use FORCE ORDER very often because it is an extremely powerful hint (directive) that has wider effects than simple forcing the order of joins; for example, it prevents the optimizer moving aggregates around and introducing partial aggregations. I very much advise against using this hint except in very exceptional circumstances, and by truly expert tuners.

A detailed analysis would require more time than I have right now, and access to a statistics-only copy of the database.

Best Answer

Related Solutions

Sql-server – SQL Server: join two tables with “valid-to” dates

SQL Server Performance – Row Estimates Change with Join Hint

Related Question