Sql-server – Outer Apply vs Left Join Performance

cross-apply; join; sql-server

I am using SQL Server 2008 R2.

I just came across APPLY in SQL and loved how it solves query problems in so many cases.

In many queries where I was using two left joins to get the result, I was able to get the same result with one OUTER APPLY.

I have a small amount of data in my local DB tables, and after deployment the code is supposed to run on data at least 20 times larger.

I am concerned that OUTER APPLY might take longer than the two left joins on a large amount of data.

Can anyone tell me exactly how APPLY works and how it will affect performance on very large data?
If possible, give some proportional relation to the size of each table, such as proportional to n1^1 or n1^2 … where n1 is the number of rows in table 1.

Here is the query with two left joins:

select EC.*, DPD.*
from Table1 EC
left join
  (
   select member_id, parent_gid, child_gid, LOB, group_gid, MAX(table2_sid) mdsid
   from Table2
   group by member_id, parent_gid, child_gid, LOB, group_gid
  ) DPD2 on DPD2.parent_gid = EC.parent_gid
        and DPD2.child_gid = EC.child_gid
        and DPD2.member_id = EC.member_id
        and DPD2.LOB = EC.default_lob
        and DPD2.group_gid = EC.group_gid
left join
  Table2 DPD on DPD.parent_gid = DPD2.parent_gid
            and DPD.child_gid = DPD2.child_gid
            and DPD.member_id = DPD2.member_id
            and DPD.group_gid = DPD2.group_gid
            and DPD.LOB = DPD2.LOB
            and DPD.table2_sid = DPD2.mdsid

Here is the query with OUTER APPLY:

select *
from Table1 EC
outer apply (
      select top 1 grace_begin_date, retroactive_begin_date, Isretroactive
      from Table2 DPD
      where DPD.parent_gid = EC.parent_gid
        and DPD.child_gid = EC.child_gid
        and DPD.member_id = EC.member_id
        and DPD.LOB = EC.default_lob
        and DPD.group_gid = EC.group_gid
      order by DPD.table2_sid desc
     ) DPD

Best Answer

Can anyone tell me exactly how APPLY works and how it will affect performance on very large data?

APPLY is a correlated join (called a LATERAL JOIN in some products and newer versions of the SQL Standard). Like any logical construction, it has no direct impact on performance. In principle, we should be able to write a query using any logically equivalent syntax, and the optimizer would transform our input into exactly the same physical execution plan.

Of course, this would require the optimizer to know every possible transformation, and to have the time to consider each one. This process might well take longer than the current age of the universe, so most commercial products do not take this approach. Therefore, query syntax can, and often does, have an impact on final performance, though it is difficult to make general statements about which is better and why.

The specific form of OUTER APPLY ( SELECT TOP ... ) is most likely to result in a correlated nested loops join in current versions of SQL Server, because the optimizer does not contain logic to transform this pattern to an equivalent JOIN. Correlated nested loops join may not perform well if the outer input is large, and the inner input is unindexed, or the pages needed are not already in memory. In addition, specific elements of the optimizer's cost model mean a correlated nested loops join is less likely than a semantically-identical JOIN to produce a parallel execution plan.
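Because the APPLY form tends to run as a correlated nested loops join, each row from Table1 triggers one lookup into Table2. The following is a minimal sketch of an index that could turn that lookup into a single seek returning the TOP 1 row directly. The column list is taken from the question's OUTER APPLY query. The index name is hypothetical, and whether such an index is worth its maintenance cost on the real table is an assumption to verify by testing.

```sql
-- Hypothetical index name; key and included columns come from the question's query.
-- Equality columns lead the key, and table2_sid DESC matches the ORDER BY of the TOP 1.
CREATE NONCLUSTERED INDEX IX_Table2_Apply_Lookup
ON Table2 (parent_gid, child_gid, member_id, LOB, group_gid, table2_sid DESC)
INCLUDE (grace_begin_date, retroactive_begin_date, Isretroactive);
```

The INCLUDE columns make the index covering for this query, so the inner side of the loop should not need additional lookups into the base table.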

I was able to write the same query with a single left join and row_number()

This may or may not be better in the general case. You will need to performance test both alternatives with representative data. The LEFT JOIN with ROW_NUMBER certainly has the potential to be more efficient, but it depends on the precise query plan shape chosen. The primary factor affecting the efficiency of this approach is the availability of an index that covers the columns needed and supplies the order required by the PARTITION BY and ORDER BY clauses. A second factor is the size of the table. An efficient and well-indexed APPLY can outperform a ROW_NUMBER with optimal indexing if the query touches a relatively small portion of the table concerned. Testing is needed.
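The rewritten query itself is not shown in the question, so the following is only a sketch of what a single LEFT JOIN with ROW_NUMBER might look like, using the table and column names from the queries above. The aliases T2 and rn are illustrative.

```sql
-- Sketch only: number the Table2 rows within each key group, newest table2_sid first,
-- then join just the first row of each group to Table1.
SELECT EC.*,
       DPD.grace_begin_date,
       DPD.retroactive_begin_date,
       DPD.Isretroactive
FROM Table1 AS EC
LEFT JOIN
(
    SELECT T2.parent_gid, T2.child_gid, T2.member_id, T2.LOB, T2.group_gid,
           T2.grace_begin_date, T2.retroactive_begin_date, T2.Isretroactive,
           ROW_NUMBER() OVER (
               PARTITION BY T2.parent_gid, T2.child_gid, T2.member_id, T2.LOB, T2.group_gid
               ORDER BY T2.table2_sid DESC) AS rn
    FROM Table2 AS T2
) AS DPD
    ON  DPD.parent_gid = EC.parent_gid
    AND DPD.child_gid  = EC.child_gid
    AND DPD.member_id  = EC.member_id
    AND DPD.LOB        = EC.default_lob
    AND DPD.group_gid  = EC.group_gid
    AND DPD.rn = 1;   -- kept in the ON clause so unmatched Table1 rows are preserved
```

When comparing this against the APPLY form, running both with SET STATISTICS IO ON and SET STATISTICS TIME ON on representative data gives logical reads and CPU time for each alternative.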

Related Solutions

MySQL: LEFT OUTER JOIN within reason

First consider a query that computes which rows of tablethree are actually relevant. Assuming that "most recently entered result" means "most recent enddate", the following query gathers the appropriate rows:

SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid

Now you can build a join to retrieve not only sid, but all of the data of tablethree:

SELECT a.*
FROM tablethree a
INNER JOIN (
  SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) b
ON a.sid = b.sid AND a.enddate = b.enddate

This is the result set you actually want to "left join in". You have to insert this into your original query:

SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
  ON t1.cid = t2.id
LEFT OUTER JOIN (
  SELECT a.*
  FROM tablethree a
  INNER JOIN (
    SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
  ) b
  ON a.sid = b.sid AND a.enddate = b.enddate
) AS t3
  ON t3.sid = t2.sid
WHERE t1.fieldone = 1 
  AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate

What should also work is the following:

SELECT t1.*
FROM tableone AS t1
INNER JOIN tabletwo AS t2
  ON t1.cid = t2.id
LEFT OUTER JOIN tablethree AS t3
  ON t3.sid = t2.sid
LEFT OUTER JOIN (
  SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) mostrecent
  ON t3.sid = mostrecent.sid AND t3.enddate = mostrecent.enddate

WHERE t1.fieldone = 1 
  AND t1.odate NOT BETWEEN t3.startdate AND t3.enddate
  AND mostrecent.enddate IS NOT NULL

This includes both tablethree and the new SELECT as left joins, and keeps only the rows where mostrecent.enddate IS NOT NULL, discarding the tablethree rows that are not the most recent for their sid. This should lead to the same result, but MySQL may be able to compute it a little faster. Running EXPLAIN on both queries should reveal possible differences in how they are executed.
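To compare plans, prefix each candidate query with EXPLAIN. Here is a minimal sketch using the latest-row-per-sid join from above, with the aggregate aliased as enddate so that the outer join condition can reference it:

```sql
-- EXPLAIN shows the chosen plan (access type, keys used, estimated rows) without running the query.
EXPLAIN
SELECT a.*
FROM tablethree a
INNER JOIN (
  SELECT sid, MAX(enddate) AS enddate FROM `tablethree` GROUP BY sid
) b
ON a.sid = b.sid AND a.enddate = b.enddate;
```

Applying the same prefix to each full query and comparing the reported access types and row estimates is one way to see which formulation the optimizer handles better on your data.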

Postgresql – Merge left outer join column

Perhaps you are looking for this:

select issues.id, array_agg(journal.notes) from issues 
left outer join journal on (issues.id = journal.issue_id and journal.notes != '')
group by issues.id

Please check this http://sqlfiddle.com/#!1/24db9/2
