SQL Server – IN Clause Performance Difference Between Listing Values and Query

execution-planoptimizationperformancesql server

I changed this query

SELECT ...
FROM linkedServer.DB.Schema.Table1 t1
LEFT JOIN linkedServer.DB.Schema.Table2 t2 ON t1.ORDER_ID = t2.ORDER_ID
WHERE t1.BRANCH_ID NOT IN (
        '009991', '009992', '009993', '009994', '009995', '009996', '009999', '900001', 
        '900002', '900003', '900004', '900005', '900006', '900007', '900008', '999991', 
        '999992', '999993', '999994', '999995'
        )
GROUP BY ...

into this

SELECT ...
FROM linkedServer.DB.Schema.Table1 t1
LEFT JOIN linkedServer.DB.Schema.Table2 t2 ON t1.ORDER_ID = t2.ORDER_ID
WHERE t1.BRANCH_ID NOT IN (
        SELECT b.BRANCH_ID
        FROM TB_BRANCH b --25 rows in total
        WHERE b.START_DT = '99999999' --the result of this sub-query is fewer than list above.
        )
GROUP BY ...

then performance dropped.

enter image description here

Why is this huge change of Execution plan happening? How can I avoid performance drop while using a sub-query in IN clause?

Best Answer

You are using a linked server to access Table1 and Table2.

The first query is sent as is to the other server and executed there returning only the rows you want.

The second query is doing a join between a local table TB_BRANCH and a remote table Table1. To do that it fetches all rows from Table1 and all rows from Table2 to your local server and does the join operations locally.

The drop in performance is there because it takes time to move the entire contents of Table1 and Table2 from the remote server to the local server.

How can I avoid performance drop while using a sub-query in IN clause?

Move the content of TB_BRANCH (25 rows) to the remote server to avoid doing a join between tables on different servers.

One way of moving the rows from TB_BRANCH to the remote server is to execute the query on the remote server using sp_executesql with the values as a parameter in a XML structure. Unpack the XML to a table variable and use the table variable in your main query.

declare @X varchar(max);
set @X = (
         select b.BRANCH_ID
         from dbo.TB_BRANCH as b
         where b.START_DT = '99999999'
         for xml path('')
         );

-- Create a SQL Statement
declare @SQL nvarchar(max);
set @SQL = '
declare @B table(BRANCH_ID varchar(10) primary key);

insert into @B(BRANCH_ID)
select B.X.value(''.'', ''varchar(10)'')
from (select cast(@X as xml)) as T(X)
  cross apply T.X.nodes(''/BRANCH_ID/text()'') as B(X);

select *
from Schema.Table1 as t1
  left outer join Schema.Table2 as t2
    on t1.ORDER_ID = T2.ORDER_ID
where t1.BRANCH_ID not in (
                          select B.BRANCH_ID
                          from @B as B
                          );';

exec linkedServer.DB.Schema.sp_executesql @SQL, N'@X varchar(max)', @X

I have guessed the data type of BRANCH_ID to be varchar(10) you need to modify that in two places in the script if it is something else.

The XML parameter has to be a varchar(max) instead of as XML since the XML datatype is not supported with remote queries. Table Valued Parameters is not supported either so that is not an option here.

The query plan of this query is not very interesting. It will just be a Remote Scan. If you want to know what the plan looks like on the remote side you have to capture the plan there.