I'm working on a query that selects counts from 2 servers and joins those together. Here is what I have:
SELECT
a.NAME
, SUM(ISNULL(b.ttl,0)) AS [Total]
FROM TableFunction(@user, DEFAULT) a
-- Selecting NAME from Table1 takes <1s
LEFT JOIN (
-- Sub-Query 1
SELECT
COUNT(*) ttl
, NAME
FROM Table2 c
WHERE active=1
GROUP BY NAME
-- Sub-Query 1 (< 1s, 662 rows)
UNION ALL
-- Sub-Query 2
SELECT
COUNT(*) ttl
, jt.NAME
FROM OtherServer.dbase.dbo.Table3 d
INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID=jt.ID
WHERE ISNULL(d.DELETED,0)=0
GROUP BY jt.NAME
-- Sub-Query 2 (7s, 5576 rows)
-- Full Sub-Query [1 UNION ALL 2] (7s, 6238 rows)
) b ON b.NAME=a.NAME
GROUP BY a.NAME
--Full query (3:44, 91 rows)
The execution plan suggests that the problem comes from the remote query (Sub-Query 2).
How can I improve the runtime of this? Why does a 7-second sub-query take over 3 minutes when joined to a basically instant query?
UPDATE:
After commenting out random things in hopes of finding the culprit, it appears to be specifically the WHERE ISNULL(d.DELETED,0)=0
clause. Removing that line brings the query down to 4s. Why would this help at all?
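For reference, an equivalent filter that avoids wrapping the column in a function can be worth testing — a sketch against the same tables as Sub-Query 2:

```sql
SELECT COUNT(*) AS ttl
     , jt.NAME
FROM OtherServer.dbase.dbo.Table3 d
INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
-- Same rows as ISNULL(d.DELETED, 0) = 0, but the column stays bare,
-- which can make the predicate easier to push to the remote server:
WHERE (d.DELETED = 0 OR d.DELETED IS NULL)
GROUP BY jt.NAME
```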
UPDATE 2:
Playing around more, it seems the fact that the first table comes from a table-valued function may be important. Selecting its result into a temp table and using that temp table instead brings the query down to 8s. The table-valued function code is:
CREATE FUNCTION TableFunction (
    @user varchar(50)
    , @includeDeleted BIT = 0
)
RETURNS @tbl TABLE (
    ID int
    , NAME varchar(max)
    , DESCRIPTION varchar(max)
)
AS
BEGIN
    -- Remote rows for this user (respecting the @includeDeleted flag)
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT jt.ID, jt.NAME, jt.DESCRIPTION
    FROM OtherServer.dbase.dbo.JoiningTable jt
    WHERE jt.USER = (SELECT TOP 1 USER
                     FROM OtherServer.dbase.dbo.USERS
                     WHERE user_name = @user)
      AND 1 = CASE WHEN @includeDeleted = 1 THEN 1
                   ELSE CASE WHEN ISNULL(jt.Deleted, 0) <> 1 THEN 1 ELSE 0 END
              END

    -- Local rows not already inserted above
    INSERT INTO @tbl (ID, NAME, DESCRIPTION)
    SELECT rt.ID, rt.name, ''
    FROM RandomTable rt
    INNER JOIN RandomJoinTable rjt ON rjt.ID = rt.ID AND rjt.user = @user
    LEFT JOIN @tbl t ON t.ID = rt.ID
    WHERE 1 = CASE WHEN @includeDeleted = 1 THEN 1
                   ELSE CASE WHEN ISNULL(rt.Deleted, 0) = 0 THEN 1 ELSE 0 END
              END
      AND t.NAME IS NULL

    RETURN
END
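The workaround described in UPDATE 2 can be written out explicitly. A sketch reusing the names from the original query (the `#fnResult` temp table name is made up for illustration):

```sql
-- Materialize the TVF result once; the optimizer then joins against a
-- real table with cardinality information instead of a table variable.
SELECT ID, NAME, DESCRIPTION
INTO #fnResult
FROM TableFunction(@user, DEFAULT);

SELECT a.NAME
     , SUM(ISNULL(b.ttl, 0)) AS [Total]
FROM #fnResult a
LEFT JOIN (
    SELECT COUNT(*) AS ttl, NAME
    FROM Table2
    WHERE active = 1
    GROUP BY NAME
    UNION ALL
    SELECT COUNT(*) AS ttl, jt.NAME
    FROM OtherServer.dbase.dbo.Table3 d
    INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
    WHERE ISNULL(d.DELETED, 0) = 0
    GROUP BY jt.NAME
) b ON b.NAME = a.NAME
GROUP BY a.NAME;
```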
Best Answer
Querying across linked servers introduces a lot of uncertainty for the query optimizer because it knows very little about the remote end. For example, if you join a local table to a remote one, the optimizer has no idea how big the remote table is (10 rows? 1 million?), what is indexed, or how the data is organized. Should it use an index to fetch 10 rows, or a full scan for 1 million?
Generally, what you want to do is minimize the amount of interaction or joining that needs to happen between the two servers. Is there a filter you can put on JoiningTable to reduce the number of rows shipped across? Can you structure the query as two CTEs that only pull their results together at the end?
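One way to apply this, sketched with the question's own table names (the `#remoteCounts` temp table is hypothetical): run the whole aggregate on the remote side, ship only the summary rows across once, and then do all joining locally.

```sql
-- Single bulk pull: the remote server does the join and GROUP BY,
-- and only ~5576 summary rows cross the wire.
SELECT COUNT(*) AS ttl
     , jt.NAME
INTO #remoteCounts
FROM OtherServer.dbase.dbo.Table3 d
INNER JOIN OtherServer.dbase.dbo.JoiningTable jt ON d.ID = jt.ID
WHERE ISNULL(d.DELETED, 0) = 0
GROUP BY jt.NAME;

-- The final join now touches only local data.
SELECT a.NAME
     , SUM(ISNULL(b.ttl, 0)) AS [Total]
FROM TableFunction(@user, DEFAULT) a
LEFT JOIN (
    SELECT COUNT(*) AS ttl, NAME FROM Table2 WHERE active = 1 GROUP BY NAME
    UNION ALL
    SELECT ttl, NAME FROM #remoteCounts
) b ON b.NAME = a.NAME
GROUP BY a.NAME;
```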