We have a table with products and a table with a current selection/assortment of products. Now need to join them create an export file by some .net code.
Currently using query like this:
SELECT * FROM ASSORTMENT a
INNER JOIN PRODUCTS p on a.clientID = 38 and a.productID = p.productID
ORDER BY a.ID_ROW OFFSET 100000 ROWS FETCH NEXT 1000 ROWS ONLY
Problem is that the assortment can contain discontinued products, that are not existing in products
table anymore, so need a join to verify. Also need each time exactly same amount of rows per select, e.g. 1000.
The query gets slower if the offset gets higher because it does a full join of products and assortment table each time. On 3 Million products and an assortment of 200.000 articles it takes multiple seconds to fetch next 1000 if the offset is over 50.0000
Is there a more efficient way of doing this?
Best Answer
You didn't describe the problem very precisely so I made some guesses about table structures and data volume. I put 3 million rows in both tables:
Getting the first thousand rows is fast. It only requires 15 milliseconds of cpu time on my machine:
If I change the
OFFSET
to 50000 rows the query plan looks the same but the query now takes 1700 ms of CPU time on my machine:Why is the query so much slower?
ORDER BY
with anOFFSET
requires SQL Server to scan past the unwanted data. For my data set SQL Server SQL server reads 764993 rows fromASSORTMENT
before finding the 51k rows that it needs. It also does 51k index seeks toPRODUCTS
before returning the 1000 rows that you requested.Let's take a step back. Do you really need to select all columns from both tables? Your question said that the join to
PRODUCTS
is only there for data integrity, so why do you need all columns from that table? UnnecessarySELECT *
is a bad habit which is best avoided. It would make a great New Year's resolution!One standard solution would be to only select the columns that you need and to make covering indexes on the two tables. I'm going to assume that you absolutely need all columns from both tables. In that case you want the query optimizer to do the scan over as little data as possible, but in terms of number of rows and number of columns. To make that possible I'll create the following indexes:
A query to get just the clustered indexes of both tables now just takes 63 ms of CPU time on my machine:
The big differences are that SQL Server doesn't have to read rows for which
ProductId
is not 38 and it doesn't need to read data pages that contain unnecessary columns. Adding all of the columns back into the query bumps up the CPU time requirement to 80 ms but that's still better than a 20X improvement than before:If you need the query to be faster than that you can consider using an anchored solution as described here.