I have a large view that I use from within an application. I think I've narrowed down my performance problem, but I'm unsure how to fix it. A simplified version of the view looks like this:
SELECT ISNULL(SEId + '-' + PEId, '0-0') AS Id,
*,
DATEADD(minute, Duration, EventTime) AS EventEndTime
FROM (
SELECT se.SEId, pe.PEId,
COALESCE(pe.StaffName, se.StaffName) AS StaffName, -- << Problem!
COALESCE(pe.EventTime, se.EventTime) AS EventTime,
COALESCE(pe.EventType, se.EventType) AS EventType,
COALESCE(pe.Duration, se.Duration) AS Duration,
COALESCE(pe.Data, se.Data) AS Data,
COALESCE(pe.Field, se.Field) AS Field,
pe.ThisThing, se.OtherThing
FROM PE pe FULL OUTER JOIN SE se
ON pe.StaffName = se.StaffName
AND pe.Duration = se.Duration
AND pe.EventTime = se.EventTime
WHERE NOT(pe.ThisThing = 1 AND se.OtherThing = 0)
) Z
That probably doesn't justify the whole reason for the query structure, but maybe gives you an idea–this view joins two very poorly designed tables that I don't have control over and tries to synthesize some information out of it.
So, since this is a view used from the application, while trying to optimize I wrap it in another SELECT, like this:
SELECT * FROM (
-- … above code …
) Q
WHERE StaffName = 'SMITH, JOHN Q'
because the application is searching for specific staff members in the result.
The problem seems to be the COALESCE(pe.StaffName, se.StaffName) AS StaffName
section, and that I'm selecting from the view on StaffName
. If I change that to pe.StaffName AS StaffName
or se.StaffName AS StaffName
, the performance problems disappear (but see updated 2 below). But that won't do because one side or the other of the FULL OUTER JOIN
could be missing, so one or the other field may be NULL.
Can I refactor this replacing the COALESCE(…)
with something else, which will get rewritten down into the subquery?
Other notes:
- I've already added some indexes to fix performance problems with the rest of the query–without the
COALESCE
it is very fast. - Somewhat to my surprise, looking at the execution plan does not raise any flags, even when the wrapping subquery and
WHERE
statement is included. My total subquery cost in the analyzer is0.0065736
. Hmph. It takes four seconds to execute. - Changing the application to query differently
(e.g. returningmight work, but as a last resort–I'm really hoping I can optimize the view without having to resort to touching the application.pe.StaffName AS PEStaffName, se.StaffName AS SEStaffName
and doingWHERE PEStaffName = 'X' OR SEStaffName = 'X'
) - A stored procedure would probably make more sense for this, but the application is built with Entity Framework, and I could not figure out how to get it to play nice with a SP that returns a table type (another topic entirely).
Indexes
The indexes I've added so far look something like this:
CREATE NONCLUSTERED INDEX [IX_PE_EventTime]
ON [dbo].[PE] ([EventTime])
INCLUDE ([StaffName],[Duration],[EventType],[Data],[Field],[ThisThing])
CREATE NONCLUSTERED INDEX [IX_SE_EventTime]
ON [dbo].[SE] ([EventTime])
INCLUDE ([StaffName],[Duration],[EventType],[Data],[Field],[OtherThing])
Update
Hmm…I tried simulating the stricken change above, and it didn't help. I.e, before ) Z
above, I added AND (pe.StaffName = 'SMITH, JOHN Q' OR se.StaffName = 'SMITH, JOHN Q')
, but the performance is the same. Now I really don't know where to start.
Update 2
@ypercube 's comment on needing the full join made me realize that my synthesized query left out a probably important component. While, yes, I need the full join, the test I did above by dropping the COALESCE
and testing just one side of the join for a non-null value would make the other side of the full join irrelevant, and the optimizer was probably using this fact to speed up the query. Also, I've updated the example to show that StaffName
is actually one of the join keys–which probably has significant bearing on the question. I'm also now leaning toward his suggestion that breaking this into a three-way union instead of full join may be the answer, and will simplify the abundance of COALESCE
s I'm doing anyway. Trying it now.
Best Answer
This was rather longshot but since the OP says it worked, I'm adding it as an answer (feel free to correct it if you find anything wrong).
Try to break the internal query into three parts (
INNER JOIN
,LEFT JOIN
withWHERE IS NULL
check,RIGHT JOIN
withIS NULL
check) and thenUNION ALL
the three parts. This has the following advantages:The optimizer has less transformation options available for
FULL
joins than for (the more common)INNER
andLEFT
joins.The
Z
derived table can be removed (you can do that anyway) from the view definition.The
NOT(pe.ThisThing = 1 AND se.OtherThing = 0)
will be needed only on theINNER
join part.Minor improvement, the use
COALESCE()
will be minimal if any at all (I assumed thatse.SEId
andpe.PEId
are not nullable. If more columns are not nullable, you'll be able to remove moreCOALESCE()
calls.)More important, the optimizer may push down any conditions in your queries that involve these columns (now that
COALESCE()
is not blocking the push.)All the above will give the optimizer more options to transform/rewrite any query that uses the view so it may find an execution plan that indexes on the underlying tables can be used.
In all, the view can be written as: