Sql-server – comparing left join and outer apply doing the same thing

join; sql-server; sql-server-2012; t-sql; trigger

I have an update trigger on a table that inserts rows into a table in another database.

The query looks like this:

insert into otherdb.table
select manyfield
from inserted
  left join (select id, 
                    timestamp, 
                    row_number() over (partition by id order by timestamp desc) as rownum
             from sometable) as t1
      on inserted.id = t1.id and
         rownum = 1

sometable has over 7 million rows and is properly indexed.

Looking at the execution plan, I can see that when 1 row is updated, the window function (is that the proper name?) reads the whole table to do the row numbering. It is executed once and uses an index scan.

The same happens (all 7 million rows are read once) whether inserted has 20 rows or 50,000 rows.

The trigger mostly fires for single-row updates (we do have some odd cases with 20-100 rows, and even thousands).

I refactored the query to look like this:

insert into otherdb.table
select manyfield
from inserted
  outer apply (select top 1 id, 
                      timestamp
               from sometable
               where inserted.id = sometable.id
               order by timestamp desc) as t1

The result is the same, but the execution plan changes.

It now uses an index seek and returns 1 row per inserted row.

When I do a huge update of 50,000 rows, the execution plan reports 50,000 executions for the outer apply version, instead of one for the left join version.

To summarize:

left join: 1 big operation, reused by all inserted rows

outer apply: 1 small operation, executed once per inserted row

My question: I don't know enough about execution plans to decide which kind of join I should keep. Left join or outer apply?

This is used by 500-1000 users at the same time, and we get timeout errors that may be related to this trigger. We currently use the left join query.
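For reference, a seek that returns the newest row per id is usually possible when an index leads on id and then on timestamp. A minimal sketch, assuming a hypothetical index name and the dbo schema (neither is stated in this post, and the real table may already have an equivalent index):

```sql
-- Hypothetical index: the name and the dbo schema are assumptions.
-- Keyed on (id, timestamp DESC) so that, for a given id, the newest row
-- is the first one the seek touches and TOP 1 can stop immediately.
CREATE NONCLUSTERED INDEX IX_sometable_id_timestamp
    ON dbo.sometable (id, [timestamp] DESC);
```

With an index of this shape, each execution of the applied subquery should read only a handful of pages, which is why the per-row cost stays small.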

EDIT

Results from some actual execution plans, select on 55017 rows:

left join:

estimated subtree cost: ~78
memory grant: ~156k
estimated number of rows: ~57k

outer apply:

estimated subtree cost: ~99
memory grant: ~163k
estimated number of rows: ~55k

hybrid (Paparazzi's solution, but using inner join; for left join see above):

estimated subtree cost: ~87
memory grant: ~170k
estimated number of rows: ~56k

Results from some actual execution plans, update on 55017 rows (trigger):

left join:

estimated subtree cost: ~401
memory grant: ~455k
estimated number of rows: ~225k

outer apply:

estimated subtree cost: ~136
memory grant: ~153k
estimated number of rows: ~52k

hybrid (Paparazzi's solution, but using inner join; for left join see above):

estimated subtree cost: ~126
memory grant: ~156k
estimated number of rows: ~53k

Best Answer

Try this; you might get the best of both:

select manyfield 
from  ( select manyfield
             , t1.id as t1_id  -- exposed so the outer where can test for "no match"
             , row_number() over (partition by t1.id order by t1.timestamp desc) as rownum 
          from inserted 
          left join sometable t1 
                 on inserted.id = t1.id
      ) tt
where rownum = 1 or tt.t1_id is null
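To compare the approaches on a small scale, here is a self-contained sketch. The temp tables `#inserted` and `#sometable` and the column `ts` are stand-ins invented for illustration; they are not the real trigger objects. The second query is a variant of the join-then-number idea that partitions by the inserted id, so unmatched rows each get their own partition and no extra null check is needed.

```sql
-- Stand-ins for the trigger's inserted table and for sometable (assumed names).
CREATE TABLE #inserted  (id int NOT NULL PRIMARY KEY);
CREATE TABLE #sometable (id int NOT NULL, ts datetime2 NOT NULL);

INSERT INTO #inserted (id) VALUES (1), (2), (3);   -- id 3 has no matching row
INSERT INTO #sometable (id, ts) VALUES
    (1, '2020-01-01'), (1, '2020-02-01'),
    (2, '2020-03-01');

-- OUTER APPLY: one TOP (1) lookup per row of #inserted.
SELECT i.id, t1.ts
FROM #inserted AS i
OUTER APPLY (SELECT TOP (1) s.ts
             FROM #sometable AS s
             WHERE s.id = i.id
             ORDER BY s.ts DESC) AS t1
ORDER BY i.id;

-- Join first, then number: only rows that match #inserted get numbered.
SELECT tt.id, tt.ts
FROM (SELECT i.id, s.ts,
             ROW_NUMBER() OVER (PARTITION BY i.id ORDER BY s.ts DESC) AS rownum
      FROM #inserted AS i
      LEFT JOIN #sometable AS s ON s.id = i.id) AS tt
WHERE tt.rownum = 1
ORDER BY tt.id;

DROP TABLE #inserted, #sometable;
```

Both queries should return one row per id in `#inserted`, with the latest `ts` for ids 1 and 2 and NULL for id 3. Which plan is cheaper on the real tables still has to be measured there.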

Related Solutions

Sql-server – Understanding below execution plan

1. How should I understand the estimated operator cost? Tb1, which has no index, is scanned and its cost is 2%, whereas tb2 uses an index and its cost is 98%.

[Image: Plan]

The heap table is only fully scanned once, but the index seek is executed 1,000,000 times. The optimizer estimates that a million seeks in this case will represent 98.4% of the total cost of executing the query, whereas a single parallel scan of the heap table will represent 0.9% of the cost.

These are just estimates used for internal plan choice reasons; they do not generally reflect real-world performance on modern hardware, and are never anything more than an estimate - even in a post-execution ("actual") execution plan.

In Management Studio:

[Image: Seek tooltip]

In SQL Sentry Plan Explorer:

[Image: Plan Explorer tooltip]
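Because operator costs are only estimates, measured I/O and timing are often a better basis for comparison. A minimal sketch using the session-level statistics settings; the query in the middle is a placeholder (the dbo schema is an assumption) and should be swapped for the statement under test:

```sql
SET STATISTICS IO ON;    -- reports logical and physical reads per table
SET STATISTICS TIME ON;  -- reports CPU time and elapsed time

-- Placeholder statement; replace with the query being compared.
SELECT COUNT(*) FROM dbo.tb1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear on the Messages tab in Management Studio and can be compared between query variants run against the same data.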

2. In the table scan shown above (whose cost is 2%), the number of executions is 24. Does that mean SQL Server read the rows in batches, stored them in memory, and did a seek into tbl2 for each row?

No, it means 24 parallel threads co-operated to perform a single scan of the heap table. Each thread still reads a row at a time from the scan, performs a seek into the indexed table, then gets the next row from the scan, and so on until the task is complete.

Rows are not read in batches and stored in memory in this plan. SQL Server reports 24 scans because 24 threads each performed a partial scan of the table, resulting in one full scan overall.

3. Also, are there any pointers for learning more about the force scan and forced index properties I see when I press F4 after clicking an operator?

The ForceScan, ForceSeek, and ForcedIndex properties are set to true if the query specifies a FORCESCAN, FORCESEEK, or INDEX hint - or if the query optimizer decides that a particular access strategy is required for correctness (for example, when checking foreign key constraints).

[Image: Hint properties]
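As an illustration of the hints named above, the following sketch shows where each one goes in a query. The table `dbo.tb2`, the column `id`, and the index name `IX_tb2_id` are placeholders, not objects from the question:

```sql
-- All object names below are placeholders for illustration.
SELECT t2.id
FROM dbo.tb2 AS t2 WITH (FORCESEEK)          -- plan shows ForceSeek = True
WHERE t2.id = 42;

SELECT t2.id
FROM dbo.tb2 AS t2 WITH (FORCESCAN)          -- plan shows ForceScan = True
WHERE t2.id = 42;

SELECT t2.id
FROM dbo.tb2 AS t2 WITH (INDEX (IX_tb2_id))  -- plan shows ForcedIndex = True
WHERE t2.id = 42;
```

Hints like these override the optimizer's own choice, so they are generally best kept for cases where the default plan has been shown to be a problem.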

Sql-server – Why SELECT COUNT() query execution plan includes left joined table

If the combination of ForeignId, ForeignTable, and IsMain is not known* to be unique in ExternFile, then the query optimizer needs to include that table to work out the count. Any time multiple rows match, the count will be affected.

Join Simplification in SQL Server
Designing for simplification (SQLBits recording)

* The optimizer does not currently recognize filtered unique indexes as unique.

UPDATE (by OP): The solution is to change the line in the query from a LEFT JOIN (which can produce multiple rows):

LEFT JOIN ExternFile ON realty.Id = ExternFile.ForeignId AND ExternFile.IsMain = 1 AND ExternFile.ForeignTable = 5

to an OUTER APPLY with TOP (which produces one row and does not affect COUNT):

OUTER APPLY (SELECT TOP (1) ServerPath FROM ExternFile WHERE ForeignId = realty.Id AND IsMain = 1 AND ForeignTable = 5) AS ExternFile

The query is now more efficient. A unique index could not be added because the values are not unique on their own; they are unique only in combination with the condition, and, as mentioned above, the optimizer does not treat that as unique.
