SQL Server Performance – Optimizing MIN in Subquery

optimizationperformancequery-performancesql serversql-server-2005subquery

is there any effective way to tune the query below?

I have a filtered index on OrganisationID <> 0
Primary key is on another field, not relevant for this query ItemCode varchar(20).

I have tagged sql server 2005 because it needs to work there, but if there is anything else in any other sql server version, please bring it up.

--------------------------------------
-- START CLEAR
--------------------------------------

begin try
drop table #T
end try
begin catch
end catch

--------------------------------------
-- THE TABLE
--------------------------------------

CREATE TABLE #T
(
a1 int,
a2 int,
ID INT,
OrganisationID INT,
Distance INT,
constraint PKt1 PRIMARY KEY  CLUSTERED (A1,A2,ID)
)

--------------------------------------
-- ADDING SOME DATA
--------------------------------------

INSERT INTO #T
SELECT 1,1,0,10,100
UNION ALL
SELECT 1,1,1,10,200
UNION ALL
SELECT 1,1,3,10,50
UNION ALL
SELECT 1,1,4,20,80
UNION ALL
SELECT 1,2,5,20,300
UNION ALL
SELECT 1,2,6,0,100
UNION ALL
SELECT 1,3,7,0,100
UNION ALL
SELECT 1,3,4,10,100

GO
--------------------------------------
-- INDEX CREATION
--------------------------------------

create index idx_T_OrganisationID 
on #T (id)
WHERE OrganisationID <> 0
GO

--------------------------------------
-- THE QUERY
--------------------------------------
--SET STATISTICS IO ON
--SET STATISTICS TIME ON

    SELECT
        PID.ID,
        PID.OrganisationID
    FROM #t AS PID 
    WHERE id IN (
        SELECT MIN(id) FROM dbo.#t 
        WHERE OrganisationID <> 0
        GROUP BY OrganisationID
    )

enter image description here

After Comprehensive Testing – The Results

out of these 3 queries:

    -- query 1

        SELECT
            PID.ID,
            PID.OrganisationID
        FROM #t AS PID 
        WHERE id IN (
            SELECT MIN(id) FROM dbo.#t 
            WHERE OrganisationID <> 0
            GROUP BY OrganisationID
        )


    -- query 2

        SELECT  rn.ID, --... other columns go here
            rn.OrganisationID
    FROM (
        SELECT *, n = ROW_NUMBER() OVER(PARTITION BY OrganisationID ORDER BY id)
    FROM #t 
    ) rn WHERE n= 1
           AND OrganisationID <> 0


    -- query 3
    SELECT OrganisationID, MIN(ID) 
FROM #t T 
WHERE OrganisationID <> 0 
GROUP BY OrganisationID ;

the first one does not even bring the most correct results,
the query 3 (as suggested in the comments by spaghettidba and ypercubeᵀᴹ ) is the one with best performance in my live environment, with my real table and data, as you can see below, the query I had originally and the one based on query 3:

enter image description here

for this exercise in particular, using the table #T query 2 and query 3 perfom more of less equally as per the query plan below (picture):

enter image description here

Best Answer

This query usually perform better:

SELECT  rn.ID, --... other columns go here
        rn.OrganisationID
FROM (
    SELECT *, n = ROW_NUMBER() OVER(PARTITION BY OrganisationID ORDER BY id)
FROM #t 
) rn WHERE n= 1