Sql-server – Aggregation in Outer Apply vs Left Join vs Derived table

Tags: execution-plan, performance, sql-server, sql-server-2012

Consider the following setup. There are three tables involved: #CCP_DETAILS_TEMP, PERIOD, and ACTUALS_DETAILS.

#CCP_DETAILS_TEMP will have 50,000 records, ACTUALS_DETAILS can have 5,000,000 records, and the PERIOD table will have 2,000 records.

Index details:

CREATE UNIQUE CLUSTERED INDEX IX_CCP_DETAILS_TEMP
  ON #CCP_DETAILS_TEMP (CCP_DETAILS_SID)

CREATE NONCLUSTERED INDEX IXN_ACTUALS_DETAILS_PERIOD_SID_RS_MODEL_SID_CCP_DETAILS_SID_QUANTITY_INCLUSION
  ON ACTUALS_DETAILS (PERIOD_SID, CCP_DETAILS_SID, RS_MODEL_SID, QUANTITY_INCLUSION)
  INCLUDE( SALES, QUANTITY, DISCOUNT) 

CREATE UNIQUE CLUSTERED INDEX IX_PERIOD
  ON PERIOD (PERIOD_SID)

I have a requirement that I have implemented in three different ways, and I want to know which one is better.

All three queries run in more or less the same time. I need expert advice on which one will perform better. Are there any disadvantages to any of these approaches?

Approach 1: Outer Apply

Time taken: 4615 milliseconds

SELECT c.CCP_DETAILS_SID,
       A.PERIOD_SID,
       SALES,
       QUANTITY
FROM   #CCP_DETAILS_TEMP c
       CROSS JOIN (SELECT PERIOD_SID
                   FROM   BPIGTN_GAL_APP_DEV_ARM..PERIOD
                   WHERE  PERIOD_SID BETWEEN 577 AND 624)A
       OUTER APPLY (SELECT Sum(SALES),
                           Sum(QUANTITY)
                    FROM   [DBO].[ACTUALS_DETAILS] ad
                    WHERE  a.PERIOD_SID = ad.PERIOD_SID
                           AND ad.CCP_DETAILS_SID = c.CCP_DETAILS_SID
                           AND QUANTITY_INCLUSION = 'Y') oa (sales, quantity)

Query statistics:

Table 'PERIOD'. Scan count 1, logical reads 2, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table '#CCP_DETAILS_TEMP'. Scan count 16, logical reads 688, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 16, logical reads 807232, physical reads
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table 'ACTUALS_DETAILS'. Scan count 1200000, logical reads 3859053,
physical reads 0, read-ahead reads 0, lob logical reads 0, lob
physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

SQL Server Execution Times: CPU time = 36796 ms, elapsed time =
4615 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

Approach 2: Left Join

Time taken: 4293 milliseconds

SELECT c.CCP_DETAILS_SID,
       A.PERIOD_SID,
       Sum(SALES),
       Sum(QUANTITY)
FROM   #CCP_DETAILS_TEMP c
       CROSS JOIN (SELECT PERIOD_SID
                   FROM   BPIGTN_GAL_APP_DEV_ARM..PERIOD
                   WHERE  PERIOD_SID BETWEEN 577 AND 624) a
       LEFT JOIN [ACTUALS_DETAILS] ad
              ON a.PERIOD_SID = ad.PERIOD_SID
                 AND ad.CCP_DETAILS_SID = c.CCP_DETAILS_SID
                 AND QUANTITY_INCLUSION = 'Y'
GROUP  BY c.CCP_DETAILS_SID,
          A.PERIOD_SID

Query statistics:

Table 'ACTUALS_DETAILS'. Scan count 17, logical reads 37134, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.

Table 'PERIOD'. Scan count 1, logical reads 2, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table '#CCP_DETAILS_TEMP'. Scan count 16, logical reads 688, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 16, logical reads 807232, physical reads
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

SQL Server Execution Times: CPU time = 7983 ms, elapsed time =
4293 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

Approach 3: Aggregating first and Left join:

Time taken: 4200 milliseconds

SELECT c.CCP_DETAILS_SID,
       A.PERIOD_SID,
       SALES,
       QUANTITY
FROM   #CCP_DETAILS_TEMP c
       CROSS JOIN (SELECT PERIOD_SID
                   FROM   BPIGTN_GAL_APP_DEV_ARM..PERIOD
                   WHERE  PERIOD_SID BETWEEN 577 AND 624) a
       LEFT JOIN (SELECT CCP_DETAILS_SID,
                         PERIOD_SID,
                         Sum(SALES)    SALES,
                         Sum(QUANTITY) QUANTITY
                  FROM   [ACTUALS_DETAILS] ad
                  WHERE  QUANTITY_INCLUSION = 'Y'
                  GROUP  BY CCP_DETAILS_SID,
                            PERIOD_SID) ad
              ON a.PERIOD_SID = ad.PERIOD_SID
                 AND ad.CCP_DETAILS_SID = c.CCP_DETAILS_SID

Query statistics:

Table 'ACTUALS_DETAILS'. Scan count 17, logical reads 37134, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 16, logical reads 807232, physical reads
0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table 'PERIOD'. Scan count 1, logical reads 2, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

Table '#CCP_DETAILS_TEMP'. Scan count 16, logical reads 688, physical
reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads
0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.

SQL Server Execution Times: CPU time = 7731 ms, elapsed time =
4200 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

Best Answer

For future questions please post the actual execution plans using Paste The Plan. I think I was able to reverse engineer all of the relevant details using the screenshots and your STATISTICS output but I may have gotten a few things wrong. It looks like your plans are running with a DOP of 16, about 50000 rows are returned from #CCP_DETAILS_TEMP, and 24 rows are returned from PERIOD.

In all three query plans the join between #CCP_DETAILS_TEMP and PERIOD is performed in the same way, has the same STATISTICS output, and serves as the outer table in the join to ACTUALS_DETAILS. It looks like SQL Server is doing the right thing for that join and it's not that interesting so I'll skip that part. It's irrelevant for your comparison.

What is relevant is the table access pattern on ACTUALS_DETAILS. All three queries use index seeks on your covering index, but the seeks are performed differently. In the first query, 1200000 seeks are performed using the PERIOD_SID and CCP_DETAILS_SID columns. In the second and third queries, 17 seeks are performed using just PERIOD_SID. I believe that all of the rows with PERIOD_SID BETWEEN 577 AND 624 are fetched, so that index seek can effectively be thought of as a parallel index scan that starts at PERIOD_SID = 577 and ends at PERIOD_SID = 624. That results in a big difference in IO between the queries:

Table 'ACTUALS_DETAILS'. Scan count 1200000, logical reads 3859053, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'ACTUALS_DETAILS'. Scan count 17, logical reads 37134, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

There's a big benefit in not reading the same pages over and over again. While it's true that the pseudo-scan approach may technically read pages that aren't needed, you do a lot less IO overall. I also believe that the IO difference is directly responsible for the large difference in CPU time between the first query and the other two: 36796 ms vs 7731 ms. Dividing CPU time by elapsed time, the first query kept about 8 CPUs fully busy on average while it ran, compared to fewer than 2 for the second and third queries. That's a big disadvantage for the first query, and you'd notice it on a busy system or if your queries were forced to run with a lower DOP. In my limited experience with APPLY, I've noticed that the SQL Server query optimizer tends to implement it as a nested loop join with index seeks. This should be considered anecdotal evidence, and I'm sure there are exceptions, but it explains what you're seeing here.
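The average number of busy CPUs can be estimated by dividing CPU time by elapsed time. A small sketch of that arithmetic, using only the figures from the STATISTICS TIME output in the question (the column aliases are made up for illustration):

```sql
-- CPU time / elapsed time approximates the average number of busy CPUs.
-- The CAST forces decimal division instead of integer division.
SELECT
    q1_avg_busy_cpus = CAST(36796 AS decimal(10, 2)) / 4615,  -- OUTER APPLY
    q2_avg_busy_cpus = CAST(7983  AS decimal(10, 2)) / 4293,  -- LEFT JOIN
    q3_avg_busy_cpus = CAST(7731  AS decimal(10, 2)) / 4200;  -- aggregate first
```

The first ratio comes out at roughly 8 and the other two at just under 2.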

Queries 2 and 3 implement the join to ACTUALS_DETAILS as a hash join. I assume the idea behind pushing the GROUP BY into the ad derived table was so that SQL Server would perform the aggregation early and you would join to fewer rows and aggregate fewer rows. However, SQL Server rewrote your second query to perform the aggregation early anyway. You can tell because the stream aggregate and hash match operators are to the right of the hash match (right outer join) operator in the second plan. As far as I can tell the second and third query plans are effectively the same, although the third plan does have a few extra 0% cost operators.
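To check that the two query shapes are logically interchangeable before comparing their plans, one option is to run both against a small data set and compare the results. The sketch below uses table variables with invented rows and assumed column types (the real schema isn't shown in the question); only the column names come from the question.

```sql
-- Hypothetical miniature versions of the three tables; data types are assumptions.
DECLARE @C TABLE (CCP_DETAILS_SID integer PRIMARY KEY);
DECLARE @P TABLE (PERIOD_SID integer PRIMARY KEY);
DECLARE @A TABLE
(
    PERIOD_SID         integer,
    CCP_DETAILS_SID    integer,
    SALES              numeric(22, 6),
    QUANTITY           numeric(22, 6),
    QUANTITY_INCLUSION char(1)
);

INSERT @C VALUES (1), (2);
INSERT @P VALUES (577), (578);
INSERT @A VALUES
    (577, 1, 10, 1, 'Y'),
    (577, 1,  5, 2, 'Y'),
    (578, 1,  7, 3, 'N'),
    (577, 2,  4, 1, 'Y');

-- Shape of query 2: join first, then aggregate.
SELECT c.CCP_DETAILS_SID, p.PERIOD_SID,
       SALES = SUM(a.SALES), QUANTITY = SUM(a.QUANTITY)
FROM @C AS c
CROSS JOIN @P AS p
LEFT JOIN @A AS a
    ON  a.PERIOD_SID = p.PERIOD_SID
    AND a.CCP_DETAILS_SID = c.CCP_DETAILS_SID
    AND a.QUANTITY_INCLUSION = 'Y'
GROUP BY c.CCP_DETAILS_SID, p.PERIOD_SID

EXCEPT

-- Shape of query 3: aggregate in a derived table, then join.
SELECT c.CCP_DETAILS_SID, p.PERIOD_SID, ad.SALES, ad.QUANTITY
FROM @C AS c
CROSS JOIN @P AS p
LEFT JOIN
(
    SELECT CCP_DETAILS_SID, PERIOD_SID,
           SALES = SUM(SALES), QUANTITY = SUM(QUANTITY)
    FROM @A
    WHERE QUANTITY_INCLUSION = 'Y'
    GROUP BY CCP_DETAILS_SID, PERIOD_SID
) AS ad
    ON  ad.PERIOD_SID = p.PERIOD_SID
    AND ad.CCP_DETAILS_SID = c.CCP_DETAILS_SID;
```

An empty result means every row produced by the first shape also appears in the second; swapping the two halves checks the other direction. The equivalence relies on CCP_DETAILS_SID and PERIOD_SID being unique on the outer side, which matches the unique clustered indexes in the question.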

Personally I would not consider the difference between 4293 and 4200 ms of elapsed time, or between 7983 and 7731 ms of CPU time, to be statistically significant. It's possible that if you ran the queries a few more times the second query might be faster than the third. I would use whichever style of query feels more natural to you. Personally, I would use the third query because it better represents what I want the optimizer to do, which is to perform the aggregation as early as possible.

Subqueries in CASE expressions

Consider the following (perfectly legal) query:

DECLARE @Base AS TABLE (a integer NULL);
DECLARE @When AS TABLE (b integer NULL);
DECLARE @Then AS TABLE (c integer NULL);
DECLARE @Else AS TABLE (d integer NULL);

SELECT
    CASE
        WHEN (SELECT W.b FROM @When AS W) = 1
            THEN (SELECT T.c FROM @Then AS T)
        ELSE (SELECT E.d FROM @Else AS E)
    END
FROM @Base AS B;

The semantics of CASE are that WHEN/ELSE clauses are generally evaluated in textual order. In the query above, it would be incorrect for SQL Server to return an error if the ELSE subquery returned more than one row, if the WHEN clause was satisfied. To respect these semantics, the optimizer produces a plan that uses pass-through predicates:

[Image: Pass-through predicates]

The inner side of each nested loops join is only evaluated when the pass-through predicate returns false. The overall effect is that CASE expressions are tested in order, and subqueries are only evaluated if no previous expression was satisfied.
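One way to observe this behaviour is to populate the table variables so that the ELSE subquery would fail if it were evaluated. The rows below are invented for illustration; the table definitions and the query are the ones from the example above.

```sql
DECLARE @Base AS TABLE (a integer NULL);
DECLARE @When AS TABLE (b integer NULL);
DECLARE @Then AS TABLE (c integer NULL);
DECLARE @Else AS TABLE (d integer NULL);

INSERT @Base (a) VALUES (1);
INSERT @When (b) VALUES (1);          -- the WHEN test is satisfied
INSERT @Then (c) VALUES (10);
INSERT @Else (d) VALUES (20), (30);   -- two rows: invalid as a scalar subquery result

SELECT
    CASE
        WHEN (SELECT W.b FROM @When AS W) = 1
            THEN (SELECT T.c FROM @Then AS T)
        ELSE (SELECT E.d FROM @Else AS E)
    END AS Result
FROM @Base AS B;
```

Given the semantics described above, the query should return the THEN value rather than raising a "subquery returned more than 1 value" error, because the ELSE subquery is skipped once the WHEN clause is satisfied. Changing the @When row to a value other than 1 should make the ELSE branch run and surface the error.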

CASE expressions with an EXISTS subquery

Where a CASE subquery uses EXISTS, the logical existence test is implemented as a semi-join, but rows that would normally be rejected by the semi-join have to be retained in case a later clause needs them. Rows flowing through this special kind of semi-join acquire a flag to indicate if the semi-join found a match or not. This flag is known as the probe column.

In implementation terms, the logical subquery is replaced by a correlated join ('apply') with a probe column. The work is performed by a simplification rule in the query optimizer called RemoveSubqInPrj (remove subquery in projection). We can see the details using trace flag 8606:

SELECT
    T1.ID,
    CASE
        WHEN EXISTS 
        (
            SELECT 1
            FROM #T2 AS T2
            WHERE T2.ID = T1.ID
        ) THEN 1 
    ELSE 0
    END AS DoesExist
FROM #T1 AS T1
WHERE T1.ID BETWEEN 5000 AND 7000
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8606);
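The definitions of #T1 and #T2 are not shown here. To try the query, a minimal setup could look like the following; the column types, the clustered primary keys, and the row counts are all assumptions chosen so that the range 5000 to 7000 contains data.

```sql
-- Hypothetical test tables; the real definitions are not shown in the text.
CREATE TABLE #T1 (ID integer NOT NULL PRIMARY KEY CLUSTERED);
CREATE TABLE #T2 (ID integer NOT NULL PRIMARY KEY CLUSTERED);

-- Fill #T1 with 1..10000 using a row-number generator over a system view.
WITH N AS
(
    SELECT TOP (10000)
        n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS C1
    CROSS JOIN sys.all_columns AS C2
)
INSERT #T1 (ID)
SELECT n FROM N;

-- Give #T2 only the even IDs so that EXISTS is true for some rows and false for others.
INSERT #T2 (ID)
SELECT ID FROM #T1 WHERE ID % 2 = 0;
```

With tables this small the optimizer may well pick a different plan shape from the ones discussed, so the setup is only a starting point for experimenting.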

Part of the input tree showing the EXISTS test is shown below:

ScaOp_Exists 
    LogOp_Project
        LogOp_Select
            LogOp_Get TBL: #T2
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier [T2].ID
                ScaOp_Identifier [T1].ID

This is transformed by RemoveSubqInPrj to a structure headed by:

LogOp_Apply (x_jtLeftSemi probe PROBE:COL: Expr1008)

This is the left semi-join apply with probe described previously. This initial transformation is the only one available in SQL Server query optimizers to date, and compilation will simply fail if this transformation is disabled.

One of the possible execution plan shapes for this query is a direct implementation of that logical structure:

[Image: NLJ Semi Join with Probe]

The final Compute Scalar evaluates the result of the CASE expression using the probe column value:

[Image: Compute Scalar expression]

The basic shape of the plan tree is preserved when the optimizer considers other physical join types for the semi join. Of those, only merge join supports a probe column, so a hash semi join, though logically possible, is not considered:

[Image: Merge with probe column]

Notice the merge outputs an expression labelled Expr1008 (that the name is the same as before is a coincidence) though no definition for it appears on any operator in the plan. This is just the probe column again. As before, the final Compute Scalar uses this probe value to evaluate the CASE.

The problem is that the optimizer doesn't fully explore alternatives that only become worthwhile with merge (or hash) semi join. In the nested loops plan, there is no advantage to checking if rows in T2 match the range on every iteration. With a merge or hash plan, this could be a useful optimization.

If we add a matching BETWEEN predicate to T2 in the query, all that happens is that this check is performed for each row as a residual on the merge semi join (tough to spot in the execution plan, but it is there):

SELECT
    T1.ID,
    CASE
        WHEN EXISTS 
        (
            SELECT 1
            FROM #T2 AS T2
            WHERE T2.ID = T1.ID
            AND T2.ID BETWEEN 5000 AND 7000 -- New
        ) THEN 1 
    ELSE 0
    END AS DoesExist
FROM #T1 AS T1
WHERE T1.ID BETWEEN 5000 AND 7000;

[Image: Residual predicate]

We would hope that the BETWEEN predicate would instead be pushed down to T2 resulting in a seek. Normally, the optimizer would consider doing this (even without the extra predicate in the query). It recognizes implied predicates (BETWEEN on T1 and the join predicate between T1 and T2 together imply the BETWEEN on T2) without them being present in the original query text. Unfortunately, the apply-probe pattern means this is not explored.

There are ways to write the query to produce seeks on both inputs to a merge semi join. One way involves writing the query in quite an unnatural way (defeating the reason I generally prefer EXISTS):

WITH T2 AS
(
    SELECT TOP (9223372036854775807) * 
    FROM #T2 AS T2 
    WHERE ID BETWEEN 5000 AND 7000
)
SELECT 
    T1.ID, 
    DoesExist = 
        CASE 
            WHEN EXISTS 
            (
                SELECT * FROM T2 
                WHERE T2.ID = T1.ID
            ) THEN 1 ELSE 0 END
FROM #T1 AS T1
WHERE T1.ID BETWEEN 5000 AND 7000;

[Image: TOP trick plan]

I wouldn't be happy writing that query in a production environment; it's just to demonstrate that the desired plan shape is possible. If the real query you need to write uses CASE in this particular way, and performance suffers because there is no seek on the probe side of a merge semi-join, you might consider writing the query using different syntax that produces the right results and a more efficient execution plan.

Sql-server – Which of these queries is best for performance

"Sometimes I wonder if SHORT scripts really is the best thing to focus on."

The size of a script has little to do with how efficiently the query will execute. A more compact statement will likely consume fewer resources in terms of compilation, but (re)compilation is usually a rare occurrence in a live system.

Fewer table accesses is usually desirable, though, and this does lead to more compact code.

Very generally speaking, a smaller execution plan will yield better results, and a lower estimated cost will yield better results. Again, though, it's highly situational. Cost estimates in particular can be way off in some cases. It's important to measure the actual execution time, because at the end of the day, that's what matters.

"With left joins I can achieve what I want with just a few lines. But then I tried with a longer script, using unions. Which is the best method?"

First of all, we need to know how much data will be in these tables in a real system. Right now there's so little it will be difficult to use the STATISTICS TIME performance metrics to figure out a winner -- the results that come back will be dominated by factors other than the query execution. With more data, it's likely the plans will change, thus rendering the comparison here moot.

Having said that, by looking at the query plans as they are now from a logical point of view, the first one is the winner.

You can see that the Clustered Index Scan of quantities appears once in the first plan, while it appears four times in the second one. The second plan also contains an expensive Distinct Sort as a result of using UNIONs (this operator could be eliminated by using UNION ALLs instead, which won't change the results).
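The difference between UNION and UNION ALL can be seen on a small scale. The tables and rows below are hypothetical stand-ins (the real schema isn't shown); the point is only that UNION has to de-duplicate its combined input while UNION ALL simply concatenates.

```sql
-- Hypothetical tables; names and columns are invented for illustration.
DECLARE @Colors TABLE (product_id integer NOT NULL, label varchar(20) NOT NULL);
DECLARE @Sizes  TABLE (product_id integer NOT NULL, label varchar(20) NOT NULL);

INSERT @Colors VALUES (1, 'red'), (2, 'blue');
INSERT @Sizes  VALUES (1, 'small'), (2, 'large');

-- UNION removes duplicates across both inputs, which needs a sort or hash step.
SELECT product_id, label FROM @Colors
UNION
SELECT product_id, label FROM @Sizes;

-- UNION ALL concatenates the inputs without any de-duplication work.
SELECT product_id, label FROM @Colors
UNION ALL
SELECT product_id, label FROM @Sizes;
```

Because the two inputs here share no rows and contain no internal duplicates, both statements return the same four rows (possibly in a different order). UNION ALL is only a safe substitute when that holds for the real data.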

The first query could also probably be improved, by getting index seeks on the colors and sizes tables, instead of table scans. It might be worth trying a hash match plan as well (which is what you'll probably see when quantities and products are larger), but for tables this small, the startup cost may be too much overhead to be of benefit.

What I would suggest you do is run each of the statements you want to test 10,000+ times in a loop, figure out the average execution time, and then compare.
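A rough sketch of such a timing loop follows. The variable names and the placeholder statement inside the loop are assumptions; substitute the statement under test.

```sql
-- Hypothetical timing harness: replace the SELECT inside the loop
-- with the statement you want to measure.
DECLARE @i          integer = 0,
        @iterations integer = 10000,
        @sink       integer,
        @start      datetime2(7),
        @end        datetime2(7);

SET @start = SYSDATETIME();

WHILE @i < @iterations
BEGIN
    -- Assigning the result to a variable avoids sending rows to the client,
    -- which would otherwise distort the measurement.
    SELECT @sink = COUNT(*) FROM sys.objects;

    SET @i += 1;
END;

SET @end = SYSDATETIME();

SELECT
    total_ms = DATEDIFF(MILLISECOND, @start, @end),
    avg_ms   = DATEDIFF(MILLISECOND, @start, @end) * 1.0 / @iterations;
```

Running the harness once per candidate statement and comparing avg_ms gives a steadier comparison than a single execution, although loop overhead and caching effects are still included in the totals.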
