SQL Server – Pagination Performance with Subquery, Inner Join, and Where

performance, query-performance, sql-server, sql-server-2012

I am trying to optimize the following query (this is the most simplified version I could come up with):

SELECT tr.Id, StatusDate
FROM (
    SELECT tr.Id, tr.StatusDate
    FROM mon.ArchivedTaskResults_201504 as tr WITH (NOLOCK)
    INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
    WHERE tr.ClientId = 4 AND dev.Deleted = 0
) AS tr        
ORDER BY StatusDate DESC
OFFSET 1000000 rows
FETCH NEXT 25 ROWS ONLY

[Image: simple query execution plan]

The problem is that execution time grows in direct proportion to OFFSET: with offset=0 the query executes in 0.0s, but with offset=1000000 it takes about 23s (and with even greater offsets, it can take up to a few minutes).

I'm almost sure that my problem could be solved with an appropriate clustered index on the ArchivedTaskResults table, but after trying for a few hours I still haven't found a good index.

The ArchivedTaskResults tables are really big, having about 50,000,000 rows (50 M).

Additional information:

I would be extremely happy if someone could solve the problem I described above, but to be honest, my REAL query is even more bizarre (disclaimer: I am not the one who designed this database):

SELECT tr.Id, StatusDate
FROM (
    (
        SELECT tr.Id, StatusDate
        FROM mon.TaskResults as tr WITH (NOLOCK)
        INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
        WHERE tr.ClientId = 4 AND dev.Deleted = 0
    ) UNION ALL (
        SELECT tr.Id, tr.StatusDate
        FROM mon.ArchivedTaskResults_201504 as tr WITH (NOLOCK)
        INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
        WHERE tr.ClientId = 4 AND dev.Deleted = 0
    ) UNION ALL (
        SELECT tr.Id, tr.StatusDate
        FROM mon.ArchivedTaskResults_201505 as tr WITH (NOLOCK)
        INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
        WHERE tr.ClientId = 4 AND dev.Deleted = 0
    )
) AS tr        
ORDER BY StatusDate DESC
OFFSET 1000000 ROWS
FETCH NEXT 25 ROWS ONLY

[Image: complex query execution plan]

This is much more complex, because even queries with offset=0 have a long execution time (I guess it's because SQL Server has no idea that ArchivedTaskResults_201505 comes 'after' ArchivedTaskResults_201504 and tries to sort them in memory, but that's only a shot in the dark).

It would be beyond my wildest dreams if someone managed to help me with THAT query, but if it's impossible because of the strange database design, I guess I could solve it in software (I could query the tables one at a time from my application instead of doing everything in SQL, or use a stored procedure for that) – but only if my first query can be improved.

Thanks in advance.

Best Answer

I don't think you'll be able to get good performance while using OFFSET. The database must search through 1,000,025 rows of output from the inner query; even if you have a good clustered index on TaskResults the system doesn't know for certain that it can skip ahead to date X.

But you do! Assuming this is for some kind of GUI, make a note of the earliest StatusDate (and its Id) from the previous page, then use them to filter the next page:

SELECT
    tr.Id, StatusDate
FROM
    (
    SELECT tr.Id, tr.StatusDate
    FROM mon.ArchivedTaskResults_201504 as tr WITH (NOLOCK)
        INNER JOIN mon.ViewDevicesWithGroups dev WITH (NOLOCK) ON tr.DeviceId = dev.Id
    WHERE tr.ClientId = 4 AND dev.Deleted = 0
        AND
            (
            -- Retrieve only records from before the previous page
            tr.StatusDate < @PrevStatusDate
            OR (tr.StatusDate = @PrevStatusDate AND tr.Id < @PrevID) 
            )
    ) AS tr        
ORDER BY StatusDate DESC, Id DESC
OFFSET 0 ROWS
FETCH NEXT 25 ROWS ONLY

So if page #123 ends with 2015/05/01, record #234, you want to consider all records that are from 2015/04/30 or earlier, or which are also from 2015/05/01 but are for records #1 .. #233.
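The filter described above can be tried out end to end with a small sketch. This one uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere; the single-column-pair table, the sample data, and the helper names (first_page, next_page, all_pages) are made up for illustration, and SQLite's LIMIT takes the place of FETCH NEXT.

```python
import sqlite3

# Hypothetical stand-in table; the real query targets SQL Server, not SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TaskResults (Id INTEGER PRIMARY KEY, StatusDate TEXT)")
# 100 rows spread over 10 dates, so StatusDate alone is not unique.
conn.executemany(
    "INSERT INTO TaskResults VALUES (?, ?)",
    [(i, f"2015-04-{1 + i % 10:02d}") for i in range(1, 101)],
)

PAGE_SIZE = 25


def first_page():
    """Newest PAGE_SIZE rows; no key is needed for the first page."""
    return conn.execute(
        "SELECT Id, StatusDate FROM TaskResults "
        "ORDER BY StatusDate DESC, Id DESC LIMIT ?",
        (PAGE_SIZE,),
    ).fetchall()


def next_page(prev_status_date, prev_id):
    """Rows that sort after (prev_status_date, prev_id) in descending order."""
    return conn.execute(
        "SELECT Id, StatusDate FROM TaskResults "
        "WHERE StatusDate < ? OR (StatusDate = ? AND Id < ?) "
        "ORDER BY StatusDate DESC, Id DESC LIMIT ?",
        (prev_status_date, prev_status_date, prev_id, PAGE_SIZE),
    ).fetchall()


def all_pages():
    """Walk the whole table page by page, remembering only the last row's key."""
    pages = [first_page()]
    while len(pages[-1]) == PAGE_SIZE:
        last_id, last_date = pages[-1][-1]
        pages.append(next_page(last_date, last_id))
    return [page for page in pages if page]
```

Each call only needs the key of the last row already shown, so the cost of fetching a page should not depend on how deep into the result set the page is, provided an index supports the (StatusDate, Id) ordering.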

This should work well with your more complex UNION query, but "real" partitioning would probably be easier than this roll-yer-own partitioning.
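For the multi-table case, one way to apply the same idea from application code (as the question suggests) is to ask each table for its own next 25 rows and merge the results. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module instead of SQL Server, it omits the join to ViewDevicesWithGroups, the sample data is invented, and it assumes Id values are unique across all the tables. The helper names page_from_table, merged_page, and all_rows are hypothetical.

```python
import heapq
import itertools
import sqlite3

PAGE_SIZE = 25
# Hypothetical stand-ins for mon.TaskResults and the monthly archive tables.
TABLES = ["TaskResults", "ArchivedTaskResults_201504", "ArchivedTaskResults_201505"]

conn = sqlite3.connect(":memory:")
next_id = 1
for t_index, table in enumerate(TABLES):
    conn.execute(f"CREATE TABLE {table} (Id INTEGER PRIMARY KEY, StatusDate TEXT)")
    rows = []
    for n in range(40):
        # Overlapping dates across tables; Ids are unique across all tables.
        rows.append((next_id, f"2015-05-{1 + (n * 7 + t_index) % 28:02d}"))
        next_id += 1
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def page_from_table(table, prev=None):
    """Top PAGE_SIZE rows of one table after the (StatusDate, Id) key `prev`."""
    sql = f"SELECT StatusDate, Id FROM {table} "
    params = []
    if prev is not None:
        sql += "WHERE StatusDate < ? OR (StatusDate = ? AND Id < ?) "
        params += [prev[0], prev[0], prev[1]]
    sql += "ORDER BY StatusDate DESC, Id DESC LIMIT ?"
    return conn.execute(sql, params + [PAGE_SIZE]).fetchall()


def merged_page(prev=None):
    """Next global page: merge the per-table candidates and keep PAGE_SIZE rows."""
    per_table = [page_from_table(table, prev) for table in TABLES]
    # Every input is already sorted descending, so a k-way merge is enough.
    merged = heapq.merge(*per_table, reverse=True)
    return list(itertools.islice(merged, PAGE_SIZE))


def all_rows():
    """Walk every page in order, carrying only the last row's key forward."""
    rows, prev = [], None
    while True:
        page = merged_page(prev)
        if not page:
            return rows
        rows.extend(page)
        prev = page[-1]
```

Because no table can contribute more than 25 rows to a 25-row page, each round trip reads at most 25 rows per table regardless of how far the user has paged.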

If StatusDate is unique, or it's acceptable to occasionally show the same record on two adjacent pages, you can drop the @PrevID and ORDER BY Id bits. If Id is always-increasing, you can filter off of it and skip StatusDate.
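To see what the Id tie-breaker buys when StatusDate is not unique, here is a tiny sketch, again using SQLite via Python as a stand-in for SQL Server, with invented data and a hypothetical helper collect_ids. With a strict `<` on StatusDate alone, rows that share the boundary date are skipped; a `<=` filter would instead return boundary rows again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TaskResults (Id INTEGER PRIMARY KEY, StatusDate TEXT)")
# Three rows per date, so a page boundary can fall in the middle of a date.
conn.executemany("INSERT INTO TaskResults VALUES (?, ?)", [
    (1, "2015-05-01"), (2, "2015-05-01"), (3, "2015-05-01"),
    (4, "2015-04-30"), (5, "2015-04-30"), (6, "2015-04-30"),
])

PAGE_SIZE = 2
ORDER = "ORDER BY StatusDate DESC, Id DESC LIMIT ?"


def collect_ids(use_tie_breaker):
    """Page through the table and return the Ids in the order they were shown."""
    seen = []
    page = conn.execute(
        f"SELECT Id, StatusDate FROM TaskResults {ORDER}", (PAGE_SIZE,)
    ).fetchall()
    while page:
        seen += [row[0] for row in page]
        prev_id, prev_date = page[-1]
        if use_tie_breaker:
            where = "StatusDate < ? OR (StatusDate = ? AND Id < ?)"
            params = (prev_date, prev_date, prev_id)
        else:
            # Date-only filter: everything else on the boundary date is lost.
            where = "StatusDate < ?"
            params = (prev_date,)
        page = conn.execute(
            f"SELECT Id, StatusDate FROM TaskResults WHERE {where} {ORDER}",
            params + (PAGE_SIZE,),
        ).fetchall()
    return seen
```

With this data, collect_ids(True) visits all six rows, while collect_ids(False) never shows Ids 1 and 4.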

Keep in mind that retrieving pages like this can easily skip a record or include the same record twice if records are being added, removed, or reordered in the underlying data. But that's another topic.

Related Solutions

Mysql – INNER JOIN between 2 tables vs WHERE on one table performance

The second query. But why are you comparing them?

They may return the same data but they are answering two different questions. The first retrieves all records from doctors where a record exists in clinic with id=1. The second retrieves all records from doctors with clinic_id=1 regardless of whether a record exists in clinic.

In the absence of a foreign key constraint between the two tables, the queries are not comparable.
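The difference shows up as soon as a doctor points at a clinic that does not exist. The original queries are not reproduced here, so the two statements below are assumed shapes for them, run against SQLite through Python purely for illustration; the column names (id, name, clinic_id) and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No foreign key between doctors.clinic_id and clinic.id, as in the scenario above.
conn.executescript("""
CREATE TABLE clinic (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE doctors (id INTEGER PRIMARY KEY, name TEXT, clinic_id INTEGER);
INSERT INTO clinic VALUES (2, 'Clinic B');
INSERT INTO doctors VALUES (10, 'Dr. A', 1), (11, 'Dr. B', 2);
""")

# Assumed shape of the first query: doctors joined to an existing clinic 1.
join_rows = conn.execute(
    "SELECT d.id FROM doctors d "
    "INNER JOIN clinic c ON d.clinic_id = c.id "
    "WHERE c.id = 1"
).fetchall()

# Assumed shape of the second query: doctors filtered on clinic_id only.
where_rows = conn.execute(
    "SELECT d.id FROM doctors d WHERE d.clinic_id = 1"
).fetchall()
```

Here the join finds nothing because clinic 1 has no row, while the plain WHERE still returns doctor 10.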

SQL Server Performance – Improving Inner Join Performance Using Dates and Between

Much more efficient to do this without having to go back and join to the periods table.

DECLARE @StartDate DATE, @EndDate DATE;

Select @StartDate = Min(StartDate), @EndDate = MAX(EndDate) 
from dbo.PeriodCalendar_Weeks pcw
where (pcw.Year = @Year and pcw.Period < @Period) 
  or  (pcw.Year = @Year and pcw.Period = @Period and pcw.Week <= @Week) 
  or (pcw.Year = @Year -1 and pcw.Period >= @Period);

SELECT 
  WeekEndDate = DATEADD(DAY, 6, DATEADD(WEEK, SalesWeek, @StartDate)), 
  Store, 
  DeliveryChargesTotal = dct
FROM 
(
  SELECT DATEDIFF(DAY, @StartDate, SalesDate)/7, Store, SUM(DeliveryChargesTotal)
  FROM dbo.Daily_GC_Headers
  WHERE SalesDate BETWEEN @StartDate AND @EndDate AND isCanceled = 0
  GROUP BY DATEDIFF(DAY, @StartDate, SalesDate)/7, Store
) AS x (SalesWeek, Store, dct)
ORDER BY WeekEndDate, Store;

A filtered index may help, if many rows exist where isCanceled = 1 (these are just possible suggestions, depending on cardinality of Store, and may not be the most optimal):

CREATE INDEX x ON dbo.Daily_GC_Headers
  (SalesDate) INCLUDE (Store, DeliveryChargesTotal)
  WHERE isCanceled = 0;

If there are very few rows where isCanceled = 1, this may be better:

CREATE INDEX x ON dbo.Daily_GC_Headers
  (SalesDate, IsCanceled) INCLUDE (Store, DeliveryChargesTotal);

Both are worth trying on a test system, as well as moving Store into the key in either case, or moving IsCanceled to the INCLUDE list in the latter case. On my system, I found the best results with everything but the date in the INCLUDE list:

CREATE INDEX x ON dbo.Daily_GC_Headers
  (SalesDate) INCLUDE (Store, IsCanceled, DeliveryChargesTotal);

Again, you will need to test if any of these work out, or if the query above gives a different/better recommendation directly from SQL Server.
