As a rule, a CTE will NEVER improve performance.
A CTE is essentially a disposable view. There are no additional statistics stored, no indexes, etc. It functions as a shorthand for a subquery.
In my opinion they can be EASILY overused (I see a lot of overuse in code in my job). Some good answers are here, but if you need to refer to something more than once, or it's more than a few hundred thousand rows, put it into a #temp
table instead and index it.
This was rather longshot but since the OP says it worked, I'm adding it as an answer (feel free to correct it if you find anything wrong).
Try to break the internal query into three parts (INNER JOIN
, LEFT JOIN
with WHERE IS NULL
check, RIGHT JOIN
with IS NULL
check) and then UNION ALL
the three parts. This has the following advantages:
The optimizer has less transformation options available for FULL
joins than for (the more common) INNER
and LEFT
joins.
The Z
derived table can be removed (you can do that anyway) from the view definition.
The NOT(pe.ThisThing = 1 AND se.OtherThing = 0)
will be needed only on the INNER
join part.
Minor improvement, the use COALESCE()
will be minimal if any at all (I assumed that se.SEId
and pe.PEId
are not nullable. If more columns are not nullable, you'll be able to remove more COALESCE()
calls.)
More important, the optimizer may push down any conditions in your queries that involve these columns (now that COALESCE()
is not blocking the push.)
All the above will give the optimizer more options to transform/rewrite any query that uses the view so it may find an execution plan that indexes on the underlying tables can be used.
In all, the view can be written as:
SELECT
se.SEId + '-' + pe.PEId AS Id,
se.SEId, pe.PEId,
pe.StaffName,
pe.EventTime,
COALESCE(pe.EventType, se.EventType) AS EventType,
pe.Duration,
COALESCE(pe.Data, se.Data) AS Data,
COALESCE(pe.Field, se.Field) AS Field,
pe.ThisThing, se.OtherThing,
DATEADD(minute, pe.Duration, pe.EventTime) AS EventEndTime
FROM PE pe INNER JOIN SE se
ON pe.StaffName = se.StaffName
AND pe.Duration = se.Duration
AND pe.EventTime = se.EventTime
WHERE NOT (pe.ThisThing = 1 AND se.OtherThing = 0)
UNION ALL
SELECT
'0-0',
NULL, pe.PEId,
pe.StaffName,
pe.EventTime,
pe.EventType,
pe.Duration,
pe.Data,
pe.Field,
pe.ThisThing, NULL,
DATEADD(minute, pe.Duration, pe.EventTime) AS EventEndTime
FROM PE pe LEFT JOIN SE se
ON pe.StaffName = se.StaffName
AND pe.Duration = se.Duration
AND pe.EventTime = se.EventTime
WHERE NOT (pe.ThisThing = 1)
AND se.StaffName IS NULL
UNION ALL
SELECT
'0-0',
se.SEId, NULL,
se.StaffName,
se.EventTime,
se.EventType,
se.Duration,
se.Data,
se.Field,
NULL, se.OtherThing,
DATEADD(minute, se.Duration, se.EventTime) AS EventEndTime
FROM PE pe RIGHT JOIN SE se
ON pe.StaffName = se.StaffName
AND pe.Duration = se.Duration
AND pe.EventTime = se.EventTime
WHERE NOT (se.OtherThing = 0)
AND pe.StaffName IS NULL ;
Best Answer
To expand a bit more on Jonathan's comment, each time you reference a CTE, the query within the CTE will be called.
Each
UNION
you are using is creating a separate execution of the CTE, thereby executing the table value function each time as well. By putting the results into a temp table, you are only then reading the static data in the temp table for eachUNION
and avoid recalculating the data.If you check the execution plan for each of the queries, you'll be able to see a visual confirmation of this fact.
In your case, it makes much more sense to use a temp table as illustrated by the vastly different execution times. I personally find non-recursive CTEs rarely offer much by way of performance and are usually more helpful for readability and logical separation of query contents.