Sql-server – Best way to write SQL Query that checks a column for non-NULL value or NULL

performancesql-server-2008

I have a SP with a parameter that has NULL as default value and then I want to do a query like this:

SELECT ...
FROM ...
WHERE a.Blah = @Blah AND (a.VersionId = @VersionId OR (@VersionId IS NULL AND a.VersionId IS NULL));

The WHERE above checks for both a non-NULL value and a NULL value for @VersionId.

Would it be better in terms of performance to instead use an IF statement and duplicate the query into one that searches for non-NULL and another for NULL like so? :

IF @VersionId IS NULL BEGIN
    SELECT ...
    FROM ...
    WHERE a.Blah = @Blah AND a.VersionId IS NULL;
ELSE BEGIN
    SELECT ...
    FROM ...
    WHERE a.Blah = @Blah AND a.VersionId = @VersionId;
END

Or the query optimizer makes it essentially the same?

UPDATE:

(Note: I am using SQL Server)

(And as far as I know, using a.VersionId = @VersionId for both cases will not work, will it?)

Best Answer

This pattern

column = @argument OR (@argument IS NULL AND column IS NULL)

can be replaced with

EXISTS (SELECT column INTERSECT SELECT @argument)

This will let you match a NULL with a NULL and will allow the engine to use an index on column efficiently. For an excellent in-depth analysis of this technique, I refer you to Paul White's blog article:

As there are two arguments in your particular case, you can use the same matching technique with @Blah – that way you will be able to rewrite the entire WHERE clause more or less concisely:

WHERE
  EXISTS (SELECT a.Blah, a.VersionId INTERSECT SELECT @Blah, @VersionId)

This will work fast with an index on (a.Blah, a.VersionId).


Or the query optimizer makes it essentially the same?

In this case, yes. In all versions (at least) from SQL Server 2005 onward, the optimizer can recognize the pattern col = @var OR (@var IS NULL AND col IS NULL) and replace it with the proper IS comparison. This does rely on internal rewrite matching, so there may be more complex cases where this is not always reliable.

In versions of SQL Server from 2008 SP1 CU5 inclusive, you also have the option of using the Parameter Embedding Optimization via OPTION (RECOMPILE), where the runtime value of any parameter or variable is embedded in the query as a literal before compilation.

So, at least to a large extent, in this case the choice is a matter of style, though the INTERSECT construction is undeniably compact and elegant.

The following examples show the 'same' execution plan for each variation (literals versus variable references excluded):

DECLARE @T AS table
(
    c1 integer NULL,
    c2 integer NULL,
    c3 integer NULL

    UNIQUE CLUSTERED (c1, c2)
);

-- Some data
INSERT @T
    (c1, c2, c3)
SELECT 1, 1, 1 UNION ALL
SELECT 2, 2, 2 UNION ALL
SELECT NULL, NULL, NULL UNION ALL
SELECT 3, 3, 3;

-- Filtering conditions
DECLARE 
    @c1 integer,
    @c2 integer;

SELECT
    @c1 = NULL,
    @c2 = NULL;

-- Writing the NULL-handling out explicitly
SELECT * 
FROM @T AS T
WHERE 
(
    T.c1 = @c1
    OR (@c1 IS NULL AND T.c1 IS NULL)
)
AND 
(
    T.c2 = @c2
    OR (@c2 IS NULL AND T.c2 IS NULL)
);

-- Using INTERSECT
SELECT * 
FROM @T AS T
WHERE EXISTS 
(
    SELECT T.c1, T.c2 
    INTERSECT 
    SELECT @c1, @c2
);

-- Using separate queries
IF @c1 IS NULL AND @c2 IS NULL
    SELECT * 
    FROM @T AS T
    WHERE T.c1 IS NULL
    AND T.c2 IS NULL
ELSE IF @c1 IS NULL
    SELECT * 
    FROM @T AS T
    WHERE T.c1 IS NULL
    AND T.c2 = @c2
ELSE IF @c2 IS NULL
    SELECT * 
    FROM @T AS T
    WHERE T.c1 = @c1
    AND T.c2 IS NULL
ELSE
    SELECT * 
    FROM @T AS T
    WHERE T.c1 = @c1
    AND T.c2 = @c2;

-- Using OPTION (RECOMPILE)
-- Requires 2008 SP1 CU5 or later
SELECT * 
FROM @T AS T
WHERE 
(
    T.c1 = @c1
    OR (@c1 IS NULL AND T.c1 IS NULL)
)
AND 
(
    T.c2 = @c2
    OR (@c2 IS NULL AND T.c2 IS NULL)
)
OPTION (RECOMPILE);