SQL Server – How to Optimize Slow CROSS APPLY Queries

cross-applyexecution-plansortingsql servertop

I am trying to optimize the query to run faster.
The query is the following:

SELECT grp_fk_obj_id, grp_name
FROM tbl_groups as g1
         CROSS APPLY (SELECT TOP 1 grp_id as gid
                      FROM tbl_groups as g2
                      WHERE g1.grp_fk_obj_id = g2.grp_fk_obj_id
                      ORDER BY g2.date_from DESC, ISNULL(date_to, '4000-01-01') DESC) as a
WHERE g1.grp_id = gid

grp_id is a primary key. grp_fk_obj_id is a foreign key to another object. There are indexes on both of these columns (I guess it comes as default).

It takes about half a second to complete but I need it to make work faster. I took a look at the execution plan and it shows that "Top N sort" has a cost of more than 90%. Also, I have noticed that if I remove a where clause inside the cross apply then it runs at least 5x faster, but I need that where clause in one way or another.

Do you see any possibilities to improve the performance of this query?

EDIT:
table creation DDL:

create table tbl_groups
(
    grp_id        bigint identity
        constraint PK_tbl_groups
            primary key,
    grp_fk_obj_id bigint      not null
        constraint FK_grp_fk_obj_id
            references tbl_other,
    grp_name      varchar(30) not null,
    date_from     date        not null,
    date_to       date
)
go

create index IDX_grp_fk_obj_id
    on tbl_groups (grp_fk_obj_id)
go

create index IDX_grp_name
    on tbl_groups (grp_name)
go

Best Answer

You need better indexing, and a bit of a rewrite.

Because you’re only using one table, you can use ROW_NUMBER() in a sub-query and then grab the top row from each partition.

SELECT grp_fk_obj_id, grp_name, grp_id
FROM (
    SELECT grp_fk_obj_id, grp_name, grp_id, ROW_NUMBER() OVER (PARTITION BY grp_fk_obj_id ORDER BY date_from DESC, ISNULL(date_to, '4000-01-01') DESC) AS rownum
    FROM tbl_groups
) g
WHERE g.rownum = 1;

But also, fix your ORDER BY clause, because it forces it to do the Sort. Because you’re ordering on an expression, you can’t index it effectively. If you can force date_to to have a value, order by date_from desc, date_to desc, then an index on (grp_fk_obj_id, date_from, date_to) include (grp_name) will really help. It would even make your original query run way quicker.

Related Solutions

Sql-server – SQL Select taking too much time to execute

I think in a million records query, you have to avoid things like OUTER JOINS. I suggest you use UNION ALL Instead of LEFT JOIN. As long as I think CROSS APPLY is more efficient than sub-query in the select clause I will modify the query written by Conard Frix, which I think is correct.

now: when I started to modify your query I noticed that you have a WHERE clause saying: JoinedTable.WhereColumn IN (1, 3). in this case, if the field is null the condition will become false. then why are you using LEFT JOIN while you are filtering null valued rows? just replace LEFT JOIN With INNER JOIN, I guarantee that it will become faster.

about INDEX:

please note that when you have an index on a table, say

table1(a int, b nvarchar)

and your index is :

nonclustered index ix1 on table1(a)

and you want to do something like this:

select a,b from table1
where a < 10

in your index you have not included the column b so what happens?

if sql-server uses your index, it will have to search in the index, called "Index Seek" and then refer to main table to get column b, called "Look Up". This procedure might take much longer than scanning the table itself: "Table Scan".

but based on the statistics that sql-server has, in such situations, it might not use your index at all.

so first of all check the Execution Plan to see if the index is used at all.

if yes or no both, alter your index to include all columns that you are selecting. say like:

nonclustered index ix1 on table1(a) include(b)

in this case Look Up will not be needed, and your query will execute so much faster.

SQL Server Sorting – Sort Order in Primary Key Yet Sorting Executed on SELECT

For a non partitioned table I get the following plan

Plan 1

There is a single seek predicate on Seek Keys[1]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010.

Meaning that SQL Server can perform an equality seek on the first two columns and then begin a range seek starting at 1339225010 and ordered FORWARD (as the index is defined with [Date] DESC)

The TOP operator will stop requesting more rows from the seek after the first row is emitted.

When I create the partition scheme and function

CREATE PARTITION FUNCTION PF (int)
AS RANGE LEFT FOR VALUES (1000, 1339225009 ,1339225010 , 1339225011);
GO
CREATE PARTITION SCHEME [MyPartitioningScheme]
AS PARTITION PF
ALL TO ([PRIMARY] );

And populate the table with the following data

INSERT INTO [dbo].[SensorValues]    
/*500 rows matching date and SensorId, DeviceId predicate*/
SELECT TOP (500) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL
/*700 rows matching date but not SensorId, DeviceId predicate*/
SELECT TOP (700) 3819,52,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL 
/*1100 rows matching SensorId, DeviceId predicate but not date */
SELECT TOP (1100) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0)) + 1339225011      
FROM master..spt_values

The plan on SQL Server 2008 looks as follows.

Plan 2

The actual number of rows emitted from the seek is 500. The plan shows seek predicates

Seek Keys[1]: Start: PtnId1000 <= 2, End: PtnId1000 >= 1, 
Seek Keys[2]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010

Indicating it is using the skip scan approach described here

the query optimizer is extended so that a seek or scan operation with one condition can be done on PartitionID (as the logical leading column) and possibly other index key columns, and then a second-level seek, with a different condition, can be done on one or more additional columns, for each distinct value that meets the qualification for the first-level seek operation.

This plan is a serial plan and so for the specific query you have it seems that if SQL Server ensured that it processed the partitions in descending order of date that the original plan with the TOP would still work and it could stop processing after the first matching row was found rather than continuing on and outputting the remaining 499 matches.

In fact the plan on 2005 looks like it does take that approach

Plan on 2005

I'm not sure if it is straight forward to get the same plan on 2008 or maybe it would need an OUTER APPLY on sys.partition_range_values to simulate it.

Best Answer

Related Solutions

Sql-server – SQL Select taking too much time to execute

SQL Server Sorting – Sort Order in Primary Key Yet Sorting Executed on SELECT

Related Question