Sql-server – OFFSET X ROWS FETCH NEXT 50 ROWS ONLY not using indexes

azure-sql-databaseoffset-fetchsql server

I have a query that is using OFFSET X ROWS FETCH NEXT 50 ROWS ONLY. If X is 160745 it uses the index, if X is 160746 it does not. There are more than 400000 records. With the index, it takes about 2 seconds to run, without the index it takes ~30 seconds.

We are using Azure S0

I know I can use query hints, however is there another option? Is there something in Azure that I can set to improve this? Any ideas would be appreciated.

**Note: assume all columns are no bigger than nvarchar(255) and only 6 columns.

Below is the query:

SELECT *
FROM [TableName]
ORDER BY [ColumnName] ASC
OFFSET 456450 ROWS FETCH NEXT 50 ROWS ONLY

Best Answer

It's hard to say without seeing query plans, the table structure, or the index definitions on the table. You should make sure that the index is covering the query. Other than that, you can encourage SQL Server to use the index even without an explicit hint to use the index. For example, if you are confident that the query should always use the index you could try setting a small rowgoal with a hint:

DECLARE @row_goal INT = 456450 

SELECT *
FROM [TableName]
ORDER BY [ColumnName] ASC
OFFSET @row_goal ROWS FETCH NEXT 50 ROWS ONLY
OPTION (OPTIMIZE FOR (@row_goal = 1));

If you want to avoid hints entirely I recommend rewriting the query. Using OFFSET with large values requires special care. It's easy to end up with a plan that scans lots of rows and does a lot of unnecessary work just to get the next 50.

If your application pages through the data one page at a time you can consider the technique described here. The basic idea is that the application keeps track of the last value that the end user has seen. When you need a new page you can pass down the last value as a filter:

SELECT *
FROM [TableName]
WHERE [ColumnName] > ?
ORDER BY [ColumnName] ASC
OFFSET 456450 ROWS FETCH NEXT 50 ROWS ONLY;

If you need to be able to show an arbitrary 50 pages you can use the approach detailed here by Aaron Bertrand. With CTEs you can encourage the query optimizer to scan through a narrow index until it reaches the first row that you want. The query will get slower as you increase the OFFSET but it could perform much better than what you're doing now. The basic idea would be like this:

;WITH pg AS 
(
  SELECT [key_column] 
  FROM [TableName]
  ORDER BY [ColumnName] ASC
  OFFSET @PageSize * (@PageNumber - 1) ROWS
  FETCH NEXT @PageSize ROWS ONLY
)
SELECT t.*
  FROM [TableName] AS t
  INNER JOIN pg ON t.[key_column] = pg.[key_column] -- or EXISTS
  ORDER BY [ColumnName] ASC;

Finally, Paul White also writes about another way to efficient page through data here that does not use OFFSET at all.

With the right indexes I expect that at least one of the techniques described in this answer will work well for you.

Related Solutions

MySQL – How to Speed Up ORDER BY and LIMIT

This composite index may help:

INDEX(account_id, lastname)

When adding it, you may as well remove INDEX(account_id), since it will then be redundant.

This may be even better, but I am not sure:

INDEX(deleted, account_id, lastname)

However it does not supersede INDEX(account_id).

Sql-server – MSSQL efficient paging including a join on big tables

You didn't describe the problem very precisely so I made some guesses about table structures and data volume. I put 3 million rows in both tables:

CREATE TABLE dbo.ASSORTMENT (
    ID_ROW BIGINT NOT NULL,
    clientID INT NOT NULL,
    productID BIGINT NOT NULL,
    FILLER VARCHAR(2000) NOT NULL,
    PRIMARY KEY (ID_ROW)
);

INSERT INTO dbo.ASSORTMENT WITH (TABLOCK)
SELECT t.RN, 30 + t.RN % 15, t.RN, REPLICATE('Z', 2000)
FROM
(
    SELECT TOP (3000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) t (RN)
OPTION (MAXDOP 1);

CREATE TABLE dbo.PRODUCTS (
    productID BIGINT NOT NULL,
    FILLER VARCHAR(2000) NOT NULL,
    PRIMARY KEY (productID)
);

INSERT INTO dbo.PRODUCTS WITH (TABLOCK)
SELECT t.RN, REPLICATE('Z', 2000)
FROM
(
    SELECT TOP (3000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) t (RN)
OPTION (MAXDOP 1);

Getting the first thousand rows is fast. It only requires 15 milliseconds of cpu time on my machine:

SELECT *
FROM dbo.ASSORTMENT a
INNER JOIN dbo.PRODUCTS p on a.clientID = 38 and a.productID = p.productID
ORDER BY a.ID_ROW OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY
OPTION (MAXDOP 1);

If I change the OFFSET to 50000 rows the query plan looks the same but the query now takes 1700 ms of CPU time on my machine:

Why is the query so much slower? ORDER BY with an OFFSET requires SQL Server to scan past the unwanted data. For my data set SQL Server SQL server reads 764993 rows from ASSORTMENT before finding the 51k rows that it needs. It also does 51k index seeks to PRODUCTS before returning the 1000 rows that you requested.

Let's take a step back. Do you really need to select all columns from both tables? Your question said that the join to PRODUCTS is only there for data integrity, so why do you need all columns from that table? Unnecessary SELECT * is a bad habit which is best avoided. It would make a great New Year's resolution!

One standard solution would be to only select the columns that you need and to make covering indexes on the two tables. I'm going to assume that you absolutely need all columns from both tables. In that case you want the query optimizer to do the scan over as little data as possible, but in terms of number of rows and number of columns. To make that possible I'll create the following indexes:

CREATE INDEX IX_ASSORTMENT ON dbo.ASSORTMENT (clientID, ID_ROW) INCLUDE (productID);

CREATE INDEX IX_PRODUCTS ON dbo.PRODUCTS (productID);

A query to get just the clustered indexes of both tables now just takes 63 ms of CPU time on my machine:

SELECT a.ID_ROW, p.productID
FROM dbo.ASSORTMENT a
INNER JOIN dbo.PRODUCTS p on a.clientID = 38 and a.productID = p.productID
ORDER BY a.ID_ROW OFFSET 50000 ROWS FETCH NEXT 1000 ROWS ONLY
OPTION (MAXDOP 1);

The big differences are that SQL Server doesn't have to read rows for which ProductId is not 38 and it doesn't need to read data pages that contain unnecessary columns. Adding all of the columns back into the query bumps up the CPU time requirement to 80 ms but that's still better than a 20X improvement than before:

SELECT a.*, p.*
FROM
(
    SELECT a.ID_ROW, p.productID
    FROM dbo.ASSORTMENT a
    INNER JOIN dbo.PRODUCTS p on a.clientID = 38 and a.productID = p.productID
    ORDER BY a.ID_ROW OFFSET 50000 ROWS FETCH NEXT 1000 ROWS ONLY
) t
LEFT OUTER JOIN dbo.ASSORTMENT a ON t.ID_ROW = a.ID_ROW
LEFT OUTER JOIN dbo.PRODUCTS p ON t.productID = p.productID 
OPTION (MAXDOP 1);

If you need the query to be faster than that you can consider using an anchored solution as described here.

Best Answer

Related Solutions

MySQL – How to Speed Up ORDER BY and LIMIT

Sql-server – MSSQL efficient paging including a join on big tables

Related Question