SQL Server – Why Query Runtime Increases Dramatically When Made into Subquery

Tags: sql-server, subquery

I'm working on a query that selects counts from 2 servers and joins those together. Here is what I have:

SELECT
    a.NAME
    , SUM(ISNULL(b.ttl,0)) AS [Total]
FROM TableFunction(@user, DEFAULT) a
-- Selecting NAME from the table function takes <1s
LEFT JOIN (
    -- Sub-Query 1
    SELECT
        COUNT(*) ttl
        , NAME
    FROM Table2 c
    WHERE active=1
    GROUP BY NAME
    -- Sub-Query 1 (< 1s, 662 rows)
    UNION ALL
    -- Sub-Query 2
    SELECT
        COUNT(*) ttl
        , jt.NAME
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID=jt.ID
    WHERE ISNULL(d.DELETED,0)=0
    GROUP BY jt.NAME
    -- Sub-Query 2 (7s, 5576 rows)
    -- Full Sub-Query [1 UNION ALL 2] (7s, 6238 rows)
) b ON b.NAME=a.NAME
GROUP BY a.NAME
-- Full query (3:44, 91 rows)

The execution plan suggests that the problem comes from the remote query (Sub-Query 2).

[Screenshot: execution plan stats]

How can I improve the runtime of this? Why does a 7-second sub-query take over 3 minutes when joined to a basically instant query?

UPDATE:

After commenting out random things in hopes of finding something useful, the culprit appears to be specifically the WHERE ISNULL(d.DELETED,0)=0 line. Removing that line brings the query down to 4 seconds. Why would this help at all?
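
For reference, an equivalent way to write that filter without wrapping the remote column in a function, which can affect whether the predicate gets evaluated on the linked server, is:

-- Same semantics as ISNULL(d.DELETED,0)=0, but with no function
-- around the remote column
WHERE (d.DELETED = 0 OR d.DELETED IS NULL)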

UPDATE 2:

Playing around more, it seems the fact that the first table is a table-valued function may have been important. Selecting its result into a temp table and using that temp table brings the query down to 8 seconds (sketched below the function code). The table-valued function code is:

CREATE FUNCTION TableFunction (
    --Inputs
    @user varchar(50)
    , @includeDeleted BIT = 0
)
--Function
RETURNS @tbl TABLE (ID int
  , NAME varchar(max)
  , DESCRIPTION varchar(max))
AS
BEGIN
    -- Remote rows for the given user, excluding soft-deleted rows
    -- unless @includeDeleted = 1
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT jt.ID, jt.NAME, jt.DESCRIPTION
    FROM OtherServer.dbase.dbo.JoiningTable jt
    WHERE jt.USER = (SELECT TOP 1 u.[USER]
                     FROM OtherServer.dbase.dbo.USERS u
                     WHERE u.user_name = @user)
    AND 1 = CASE WHEN @includeDeleted = 1 THEN 1 ELSE
        CASE WHEN ISNULL(jt.Deleted, 0) <> 1 THEN 1 ELSE 0 END
    END

    -- Local rows for the user that the remote query didn't already return
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT rt.ID, rt.name, ''
    FROM RandomTable rt
    INNER JOIN RandomJoinTable rjt ON rjt.ID = rt.ID AND rjt.user = @user
    LEFT JOIN @tbl t ON t.ID = rt.ID
    WHERE 1 = CASE WHEN @includeDeleted = 1 THEN 1 ELSE
        CASE WHEN ISNULL(rt.Deleted, 0) = 0 THEN 1 ELSE 0 END
    END
    AND t.Name IS NULL

    RETURN
END
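
The temp-table workaround described in the update, as a minimal sketch (#fn is a hypothetical name):

-- Materialize the multi-statement TVF into a real temp table, which
-- gives the optimizer actual row counts to plan against
SELECT ID, NAME, DESCRIPTION
INTO #fn
FROM TableFunction(@user, DEFAULT);

-- ...then run the original query with "FROM #fn a" in place of
-- "FROM TableFunction(@user, DEFAULT) a"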

Best Answer

Querying across servers introduces a lot of uncertainty for the query optimizer because it knows very little about the remote end. For example, if you join a local table to a remote one, the optimizer has no idea how big the remote table is (10 rows? 1 million?), what indexes exist, or how the data is organized. Should it use an index to fetch 10 rows, or a full scan for 1 million?
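
One way to take that decision away from the local optimizer is to push the whole remote aggregate to the other server with OPENQUERY, so only the summary rows cross the wire. A sketch, assuming ad hoc pass-through queries are allowed against OtherServer:

-- The aggregate runs entirely on OtherServer; only the ~5,576
-- summary rows come back to the local server
SELECT ttl, NAME
FROM OPENQUERY(OtherServer, '
    SELECT COUNT(*) AS ttl, jt.NAME
    FROM dbase.dbo.Table3 d
    INNER JOIN dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE ISNULL(d.DELETED, 0) = 0
    GROUP BY jt.NAME') AS remote_counts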

Generally, you want to minimize the amount of interaction and joining that has to happen between the two servers. Is there a filter you can put on JoiningTable to reduce the number of rows? Can you structure the query as two CTEs, one local and one remote, that only pull their results together at the end?
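
A sketch of that structure using the tables from the question (CTE names are illustrative; note that SQL Server inlines CTEs rather than materializing them, so if the plan doesn't improve, a temp table in place of remote_counts is the heavier-handed version of the same idea):

WITH local_counts AS (
    -- Runs entirely on the local server
    SELECT COUNT(*) AS ttl, NAME
    FROM Table2
    WHERE active = 1
    GROUP BY NAME
), remote_counts AS (
    -- Everything here lives on the linked server
    SELECT COUNT(*) AS ttl, jt.NAME
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE ISNULL(d.DELETED, 0) = 0
    GROUP BY jt.NAME
)
SELECT a.NAME
    , SUM(ISNULL(b.ttl, 0)) AS [Total]
FROM TableFunction(@user, DEFAULT) a
LEFT JOIN (
    SELECT ttl, NAME FROM local_counts
    UNION ALL
    SELECT ttl, NAME FROM remote_counts
) b ON b.NAME = a.NAME
GROUP BY a.NAME;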