This composite index may help:
INDEX(account_id, lastname)
When adding it, you may as well remove INDEX(account_id)
, since it will then be redundant.
This may be even better, but I am not sure:
INDEX(deleted, account_id, lastname)
However it does not supersede INDEX(account_id)
.
You didn't describe the problem very precisely so I made some guesses about table structures and data volume. I put 3 million rows in both tables:
CREATE TABLE dbo.ASSORTMENT (
ID_ROW BIGINT NOT NULL,
clientID INT NOT NULL,
productID BIGINT NOT NULL,
FILLER VARCHAR(2000) NOT NULL,
PRIMARY KEY (ID_ROW)
);
INSERT INTO dbo.ASSORTMENT WITH (TABLOCK)
SELECT t.RN, 30 + t.RN % 15, t.RN, REPLICATE('Z', 2000)
FROM
(
SELECT TOP (3000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t (RN)
OPTION (MAXDOP 1);
CREATE TABLE dbo.PRODUCTS (
productID BIGINT NOT NULL,
FILLER VARCHAR(2000) NOT NULL,
PRIMARY KEY (productID)
);
INSERT INTO dbo.PRODUCTS WITH (TABLOCK)
SELECT t.RN, REPLICATE('Z', 2000)
FROM
(
SELECT TOP (3000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t (RN)
OPTION (MAXDOP 1);
Getting the first thousand rows is fast. It only requires 15 milliseconds of cpu time on my machine:
SELECT *
FROM dbo.ASSORTMENT a
INNER JOIN dbo.PRODUCTS p on a.clientID = 38 and a.productID = p.productID
ORDER BY a.ID_ROW OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY
OPTION (MAXDOP 1);
If I change the OFFSET
to 50000 rows the query plan looks the same but the query now takes 1700 ms of CPU time on my machine:
Why is the query so much slower? ORDER BY
with an OFFSET
requires SQL Server to scan past the unwanted data. For my data set SQL Server SQL server reads 764993 rows from ASSORTMENT
before finding the 51k rows that it needs. It also does 51k index seeks to PRODUCTS
before returning the 1000 rows that you requested.
Let's take a step back. Do you really need to select all columns from both tables? Your question said that the join to PRODUCTS
is only there for data integrity, so why do you need all columns from that table? Unnecessary SELECT *
is a bad habit which is best avoided. It would make a great New Year's resolution!
One standard solution would be to only select the columns that you need and to make covering indexes on the two tables. I'm going to assume that you absolutely need all columns from both tables. In that case you want the query optimizer to do the scan over as little data as possible, but in terms of number of rows and number of columns. To make that possible I'll create the following indexes:
CREATE INDEX IX_ASSORTMENT ON dbo.ASSORTMENT (clientID, ID_ROW) INCLUDE (productID);
CREATE INDEX IX_PRODUCTS ON dbo.PRODUCTS (productID);
A query to get just the clustered indexes of both tables now just takes 63 ms of CPU time on my machine:
SELECT a.ID_ROW, p.productID
FROM dbo.ASSORTMENT a
INNER JOIN dbo.PRODUCTS p on a.clientID = 38 and a.productID = p.productID
ORDER BY a.ID_ROW OFFSET 50000 ROWS FETCH NEXT 1000 ROWS ONLY
OPTION (MAXDOP 1);
The big differences are that SQL Server doesn't have to read rows for which ProductId
is not 38 and it doesn't need to read data pages that contain unnecessary columns. Adding all of the columns back into the query bumps up the CPU time requirement to 80 ms but that's still better than a 20X improvement than before:
SELECT a.*, p.*
FROM
(
SELECT a.ID_ROW, p.productID
FROM dbo.ASSORTMENT a
INNER JOIN dbo.PRODUCTS p on a.clientID = 38 and a.productID = p.productID
ORDER BY a.ID_ROW OFFSET 50000 ROWS FETCH NEXT 1000 ROWS ONLY
) t
LEFT OUTER JOIN dbo.ASSORTMENT a ON t.ID_ROW = a.ID_ROW
LEFT OUTER JOIN dbo.PRODUCTS p ON t.productID = p.productID
OPTION (MAXDOP 1);
If you need the query to be faster than that you can consider using an anchored solution as described here.
Best Answer
It's hard to say without seeing query plans, the table structure, or the index definitions on the table. You should make sure that the index is covering the query. Other than that, you can encourage SQL Server to use the index even without an explicit hint to use the index. For example, if you are confident that the query should always use the index you could try setting a small rowgoal with a hint:
If you want to avoid hints entirely I recommend rewriting the query. Using
OFFSET
with large values requires special care. It's easy to end up with a plan that scans lots of rows and does a lot of unnecessary work just to get the next 50.If your application pages through the data one page at a time you can consider the technique described here. The basic idea is that the application keeps track of the last value that the end user has seen. When you need a new page you can pass down the last value as a filter:
If you need to be able to show an arbitrary 50 pages you can use the approach detailed here by Aaron Bertrand. With CTEs you can encourage the query optimizer to scan through a narrow index until it reaches the first row that you want. The query will get slower as you increase the
OFFSET
but it could perform much better than what you're doing now. The basic idea would be like this:Finally, Paul White also writes about another way to efficient page through data here that does not use
OFFSET
at all.With the right indexes I expect that at least one of the techniques described in this answer will work well for you.