I have spent the last couple days searching, watching videos, and I think I've gotten as far as I can get just fumbling my way through. I am looking for more specific direction given my example below.
I have two tables I'm working with. MessageThreads (400k records) & Messages (1M records). Their schemas are shown below.
MessageThreads indexes
https://gist.github.com/timgabrhel/0a9ff88160ebc9e40559e1e10ecc7ee4
Messages indexes
https://gist.github.com/timgabrhel/d649074cbe82016e8a90f918c58c4764
I am trying to improve the performance of our primary "inbox" query. Think of your email provider's inbox. You'll see a list of threads, some new, some read, sorted by date, and also gives you a preview of the most recently sent message, whether it was to or from you. Finally, there is an element of paging on this query. By default, we want 11 items. 10 for the page to display, and +1 to know if there's more on the next page.
For some of our long time users, they can have up to 40K messages.
This query has seen many different forms over the last few days, but this is where I've gotten to. I've given OUTER APPLY
a try, but I'm seeing worse execution time & statistics.
SET STATISTICS IO ON; /* And turn on the Actual Excecution Plan */
declare @UserId bigint
set @UserId = 9999
; WITH cte AS (
SELECT
ROW_NUMBER() OVER (ORDER BY SendDate DESC) AS RowNum,
MT.MessageThreadId,
MT.FromUserHasArchived,
MT.ToUserHasArchived,
MT.Created,
MT.ThreadStartedBy,
MT.ThreadSentTo,
MT.[Subject],
MT.CanReply,
MT.FromUserDeleted,
MT.ToUserDeleted,
LM.MessageId,
LM.Deleted,
LM.FromUserId,
LM.ToUserId,
LM.[Message],
LM.SendDate,
LM.ReadDate
FROM MessageThreads MT
-- join the most recent non-deleted message where this user is the sender or receiver
LEFT OUTER JOIN
(
SELECT RANK() OVER (PARTITION BY MessageThreadId ORDER BY SendDate DESC) r, *
FROM [Messages]
WHERE (FromUserId=@UserId OR ToUserId=@UserId)
AND (Deleted=0)
) LM ON (LM.MessageThreadId = MT.MessageThreadId AND LM.r = 1)
--WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId
)
SELECT
cte.*,
UserFrom.FirstName AS UserFromFirstName,
UserFrom.LastName AS UserFromLastName,
UserFrom.Email AS UserFromEmail,
UserTo.FirstName AS UserToFirstName,
UserTo.LastName AS UserToLastName,
UserTo.Email AS UserToEmail
FROM cte
LEFT OUTER JOIN Users AS UserFrom ON cte.FromUserId=UserFrom.UserId
LEFT OUTER JOIN Users AS UserTo ON cte.ToUserId=UserTo.UserId
WHERE RowNum >= 1
AND RowNum <= 11
ORDER BY RowNum ASC
Statistics for above query (execution time ~2 seconds in SSMS). This execution time is acceptable, but the statistics feel less than desirable, and even more so upon reviewing the actual execution plan.
The execution plan is linked here
https://gist.github.com/timgabrhel/f8d919d5728e965623fbd953f7a219ef
One huge hiccup I've spotted is the 400k row Index Scan on the MessageThreads table. Presumably this is because the primary SELECT X FROM MessageThreads
query doesn't have a filter on it. When I apply a predicate to it (uncomment the WHERE from the query), the statistics greatly improve (below), but the time jumps from ~2 seconds to ~18 seconds in SSMS.
The problem area on the query is the MessageThreads predicate
https://gist.github.com/timgabrhel/1383ff9362567fdf41ba011dead63ceb
Thank you in advance!
Best Answer
A few thoughts:
WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId
really needs two indexes to be efficient - one on the ThreadSentTo field, and one on the ThreadStartedBy field. Otherwise, the SQL engine will be performing a full table scan to retrieve the correct threads.Starting in SQL 2012, a new construct was added to SQL Server for handling paging. This works like this:
In this case, the query will skip the first 285 ((20-1)*15) rows, and retrieve the next 15 rows. This is a faster paging method than the older RowNumber() filter for normal paging.