I'm trying to optimize a SQL query that shows variable performance (less than 2 to more than 15s).
Overview
- This is a ColdFusion website, with a SQL Server 2008 R2 backend
- The query is part of the website search engine – so used quite a bit.
This is a simplified view of the query (some JOINs and WHERE were left out)
with AllEligibleEntries AS (
SELECT DISTINCT b.*,
c.*,
DENSE_RANK() OVER (ORDER BY b.CreationDate DESC) AS RowNum
From Ticket b
LEFT JOIN Customer c ON b.CustomerID = c.CustomerID -- CustomerID is a string, and NOT the primary key
Where b.TicketID LIKE @P1 -- TicketID is a string, and NOT the primary key
-- various other conditions here, some always present, other added as required by coldfusion through <cfif ...> AND ... </cfif>
),
NumberOfEligibleEntries AS (
select MAX(AllEligibleEntries.RowNum) AS NbRows
from AllEligibleEntries
),
SelectedEntries AS (
Select AllEligibleEntries.*
From AllEligibleEntries
WHERE AllEligibleEntries.RowNum BETWEEN @P2 AND @P3 -- used to set the number of results per page and the page to display
-- in all testcases, @P2 is 1 and @P3 is 50, but up to 1000 in production
),
MostRecentAction AS ( -- "Greatest-N-Per-Group" - [0]
Select b1.Ticket, b1.StateID
FROM TicketEvents b1
LEFT OUTER JOIN TicketEvents b2
ON b1.Ticket = b2.Ticket
AND b1.TicketEventID < b2.TicketEventID
WHERE b2.Ticket IS NULL
AND b1.Ticket IN (Select Ticket from SelectedEntries)
),
CompletedTickets AS (
Select TicketEventID, Ticket, TimeStamp
From TicketEvents
Where StateID = 5 -- this is the "completed" state
AND Ticket IN (Select Ticket from SelectedEntries)
),
CompletionDate AS (
Select b1.Ticket, b1.TimeStamp
FROM CompletedTickets b1
LEFT OUTER JOIN CompletedTickets b2
ON b1.Ticket = b2.Ticket
AND b1.TicketEventID < b2.TicketEventID
WHERE b2.Ticket IS NULL
)
Select SelectedEntries.*, NumberOfEligibleEntries.NbRows,
d.TimeStamp AS CompletionDate
From SelectedEntries
left join NumberOfEligibleEntries on 1 = 1 -- ugly hack, so the total number of hits can be displayed in the search result page
left join MostRecentAction a on SelectedEntries.Ticket = a.Ticket
left join CompletionDate d on SelectedEntries.Ticket = d.Ticket
order by SelectedEntries.RowNum
I'm aware of two potential performance issues with the way this query is written:
- Trashed execution plan
Using the same approach than ColdFusion (i.e. sp_prepexec
, then sp_execute
), I sp_prepexec'ed the query with a value for @P1
that would cause AllEligibleEntries
to contain no results, then sp_execute
'd it with another @P1
chosen so AllEligibleEntries
would contain 49 rows. The execution plan of the second run will be posted to sqlperformance.com
as soon as I find why I get an error trying to upload it, but is thouroughly wrong (>2000 executions of index seeks on > 6000 rows when two runs on two rows were expected, and many others).
I've confirmed that using option(recompile)
(hinted by [1]) fixes this, but that seems much more like a patch than a fix.
Question: what are the alternatives to option(recompile) in a case like this?
- Multiple execution of queries
While investigating this issue, I read [2] that CTE are only a nice way to write queries, but in the end, they're substituted when used, and therefore can be executed multiple times. If this is the case (I haven't confirmed it yet), the original AllEligibleEntries
CTE would have to be executed something like 4 times.
Question: if this is the case, are there options to ensure that the CTE are only executed once? (I guess that the answer is temporary tables, but this too feels more like a patch than a fix.)
Question: are there other major performance issues I didn't spot?
Thanks
[0] https://stackoverflow.com/questions/tagged/greatest-n-per-group
Best Answer
For now, the issue has been addressed by:
Performance is measurably better (a few 100ms) than the option(recompile) approach, and the (now multiple) execution plans are cumulatively much simpler.
I'm still interested in feedback on this issue, although I may not revisit the code soon.
Thanks.