Sql-server – Poor SQL Server Performance With Nested “OR” Conditions In LEFT OUTER JOIN

database-designoptimizationperformancequery-performancesql server

I have a tricky bit of SQL that's giving me a bit of trouble. I'd like to introduce several new columns to a key table whose components may or may not be used based on user configuration (Dimension's 1-10). For architectural reasons these are stored as "UNIQUEIDENTIFIER"s.

However upon the introduction of these new dimension columns in my DB, I'm now facing performance problems in several other stored procs that manipulate data in this key table, one of which I will include below. It appears as though the method that I have implemented to support these new columns is inefficient and I'm not exactly sure why that is the case and the best way to remedy it.

(Included SQL)

--In real usage this would be set by user configuration
DECLARE @Dimension1Enabled BIT = 1
DECLARE @Dimension2Enabled BIT = 1
DECLARE @Dimension3Enabled BIT = 1
DECLARE @Dimension4Enabled BIT = 1
DECLARE @Dimension5Enabled BIT = 1
DECLARE @Dimension6Enabled BIT = 1
DECLARE @Dimension7Enabled BIT = 1
DECLARE @Dimension8Enabled BIT = 1
DECLARE @Dimension9Enabled BIT = 1
DECLARE @Dimension10Enabled BIT = 1

/*
--Here's the table definition for reference
CREATE TABLE [dbo].[Table1](
    [Column1] [int] NOT NULL,
    [Column2] [uniqueidentifier] NOT NULL,
    [Column3] [uniqueidentifier] NOT NULL,
    [Column4] [char](3) NOT NULL,
    [Dimension1] [uniqueidentifier] NULL,
    [Dimension2] [uniqueidentifier] NULL,
    [Dimension3] [uniqueidentifier] NULL,
    [Dimension4] [uniqueidentifier] NULL,
    [Dimension5] [uniqueidentifier] NULL,
    [Dimension6] [uniqueidentifier] NULL,
    [Dimension7] [uniqueidentifier] NULL,
    [Dimension8] [uniqueidentifier] NULL,
    [Dimension9] [uniqueidentifier] NULL,
    [Dimension10] [uniqueidentifier] NULL,
    [Period] [int] NOT NULL,
    [Amt] [numeric](28, 12) NOT NULL,
    [EndingBal] [numeric](28, 12) NOT NULL,
    [PlanningSourceID] [int] NOT NULL,
    [Column5] [uniqueidentifier] NULL
) ON [PRIMARY]

--Simply contains 0-12
CREATE TABLE [dbo].[Table2] (
    [FiscalPeriod] [int] NOT NULL
    ,[FiscalPeriodDescription] [varchar](50) NOT NULL
    ,[FiscalPeriodDescriptionMonthName] [varchar](20) NULL
    ,[FiscalPeriodName] [varchar](255) NULL
)
*/

SELECT pfd.*
FROM (
    SELECT PFDE1.Column1
        ,PFDE1.Column2
        ,PFDE1.Column3
        ,PFDE1.Column4
        ,PFDE1.Dimension1
        ,PFDE1.Dimension2
        ,PFDE1.[Dimension3]
        ,PFDE1.[Dimension4]
        ,PFDE1.[Dimension5]
        ,PFDE1.[Dimension6]
        ,PFDE1.[Dimension7]
        ,PFDE1.[Dimension8]
        ,PFDE1.[Dimension9]
        ,PFDE1.[Dimension10]
        ,p.FiscalPeriod AS Period
        ,0 AS Amt
        ,0 AS EndingBal
        ,PFDE1.Column5
    FROM (
        SELECT DISTINCT PFDE1.Column1
            ,PFDE1.Column2
            ,PFDE1.Column3
            ,PFDE1.Column4
            ,PFDE1.Dimension1
            ,PFDE1.Dimension2
            ,PFDE1.[Dimension3]
            ,PFDE1.[Dimension4]
            ,PFDE1.[Dimension5]
            ,PFDE1.[Dimension6]
            ,PFDE1.[Dimension7]
            ,PFDE1.[Dimension8]
            ,PFDE1.[Dimension9]
            ,PFDE1.[Dimension10]
            ,PFDE1.Column5
        FROM [dbo].[Table1] PFDE1
        ) PFDE1
    CROSS JOIN [dbo].[Table2] p
    ) pfd
LEFT JOIN [dbo].[Table1] pfde ON PFDE.Column1 = pfd.Column1
    AND PFDE.Column2 = pfd.Column2
    AND PFDE.Column3 = pfd.Column3
    AND PFDE.Column4 = pfd.Column4
    --This section causes great slowness
    AND (@Dimension1Enabled = 0 OR PFDE.Dimension1 = pfd.Dimension1)
    AND (@Dimension2Enabled = 0 OR PFDE.Dimension2 = pfd.Dimension2)
    AND (@Dimension3Enabled = 0 OR PFDE.Dimension3 = pfd.Dimension3)
    AND (@Dimension4Enabled = 0 OR PFDE.Dimension4 = pfd.Dimension4)
    AND (@Dimension5Enabled = 0 OR PFDE.Dimension5 = pfd.Dimension5)
    AND (@Dimension6Enabled = 0 OR PFDE.Dimension6 = pfd.Dimension6)
    AND (@Dimension7Enabled = 0 OR PFDE.Dimension7 = pfd.Dimension7)
    AND (@Dimension8Enabled = 0 OR PFDE.Dimension8 = pfd.Dimension8)
    AND (@Dimension9Enabled = 0 OR PFDE.Dimension9 = pfd.Dimension9)
    AND (@Dimension10Enabled = 0 OR PFDE.Dimension10 = pfd.Dimension10) 
    AND PFDE.Period = pfd.Period
    AND PFDE.Column5 = pfd.Column5
WHERE pfde.Column1 IS NULL

With 50,000 records in Table1 the performance of this query is quite poor, several minutes at least. Remove the "@DimensionXEnabled = 0 OR" condition though and the query will run in several seconds.

I feel I have several options to optimize this, I could do one of the following:

  1. Change the "Dimension" columns to NOT NULL and populate with 0 Guid Values – I believe this would have a performance impact because now I'm coping a large amount of data with each insert. I've tested this option and it does work better but there is a definite performance impact and I'd like to avoid it if possible.

  2. Convert the query to dynamic SQL and generate the join conditions dynamically – Would need to do this in 100+ other places and have a DB full of dynamic SQL, not looking forward to this option either.

  3. Wrap the "Dimension1 = Dimension1" in ISNULL statements rather than based the join condition off of a variable – I believe this would result in a inefficient join statement as the query would no longer be able to use indexes.

I feel that the SQL optimizer is making bad decisions with this query, if I write the query without the OR condition it run's quickly, even though the results are exactly the same. The query plans look almost identical other than that in the long running join, most of the processing time is used in the LEFT OUTER JOIN statements merge:

(With "OR" Conditions)

enter image description here

(Without "OR" Conditions)

enter image description here

I'm unsure how to proceed, my question is, is there a better option?

Update 1

I've been doing more research into this, attempting various ways to get around the poor performance, so far I haven't found any good solutions just more information:

  1. Tried column "IS NULL" without success (PFDE.Dimension1 IS NULL OR PFDE.Dimension1 = pfd.Dimension1), ran even slower, that doesn't seem to do it.

  2. Tired join "Hints" (HASH, MERGE, LOOP), no performance impact here, in fact in some cases they just ran slower.

  3. Tried option 3, actually this seems to run very quick, I think because Table2 doesn't have any indexes, this approach could be a problem in the future though…

  4. I've tried dynamic SQL (@AaronBertrand recommendation), actually the results were somewhat underwhelming (30 sec+), not sure why it performed so poorly, but that might be something that I could improve with some further optimizations, I'm looking into this more..

  5. I tried the "OPTION (RECOMPILE);" mentioned in this post. Saw definite improvements (~15 secs), still not as fast as not dimensions but maybe something else I can iterate on.

Best Answer

Sounds like a kitchen sink query. You can make this work well for most cases with dynamic SQL, since you can generate a different execution plan for each combination of parameters.

DECLARE @sql nvarchar(max) = N'
    SELECT DISTINCT PFDE1.Column1
        ,PFDE1.Column2
        ,PFDE1.Column3
        ,PFDE1.Column4
        ,PFDE1.Dimension1
        ,PFDE1.Dimension2
        ...
        ,PFDE1.[Dimension10]
        ,PFDE1.Column5
    FROM [dbo].[Table1] PFDE1
    ) PFDE1
CROSS JOIN [dbo].[Table2] p
) pfd
LEFT JOIN [dbo].[Table1] pfde 
ON PFDE.Column1 = pfd.Column1
AND PFDE.Column2 = pfd.Column2
AND PFDE.Column3 = pfd.Column3
AND PFDE.Column4 = pfd.Column4'

+ CASE WHEN @Dimension1Enabled = 1 THEN 
  N' AND PFDE.Dimension1 = pfd.Dimension1' ELSE N'' END
+ CASE WHEN @Dimension2Enabled = 1 THEN 
  N' AND PFDE.Dimension2 = pfd.Dimension2' ELSE N'' END

+ ...

+ CASE WHEN @Dimension10Enabled = 1 THEN 
  N' AND PFDE.Dimension10 = pfd.Dimension10' ELSE '' END

+ N'    AND PFDE.Period = pfd.Period
  AND PFDE.Column5 = pfd.Column5
  WHERE pfde.Column1 IS NULL;';

EXEC sys.sp_executesql @sql;

I'd also drop the DISTINCT - if it's really necessary because of duplicates then there is an underlying problem to be fixed as well.