Sql-server – Execution Plan is different when Constant Variable is Null

execution-planoptimizationperformancequery-performancesql serversql-server-2016

I am working on a scenario in which I am providing a variable a value. When I pass Null value, the Query Engine didn't scan the Join tables. As per Logical Query Processing, first FROM Clause Executes, then ON and JOIN executes. But in this Case the Query Engine direct go to Where Clause. Could anyone please explain the Behavior Query Engine when there is a NULL value of a Variable. I am using SQL Server 2016.

When the Value is changed then

Best Answer

You're seeing an example of contradiction detection. SQL Server's query optimizer is smart enough to perform a few quick checks at the outset of query processing to determine if it can take any shortcuts to return results without doing much (any?!) work. One such short circuit is to check if the where clause contains an "impossible" limit, such as:

WHERE 1=0

WHERE NULL = NULL

SQL Server sees you are comparing something to NULL, which always returns false, and simply returns an empty resultset. If the items in your WHERE clause were separated with OR instead of AND, then further checks would be performed and you might not see the constant-scan plan.

Comparing any value to NULL using = always returns false. You may want to use WHERE @c IS NULL instead. If you rewrite your query using that syntax, you'll see SQL Server does in fact "run" the query. Consider this:

IF OBJECT_ID(N'tempdb..#t', N'U') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t 
(
    i int NOT NULL
);

DECLARE @c int = 1;

The query plan for this query:

SELECT *
FROM #t t
WHERE t.i = 1
    AND @c = NULL;

Versus the query plan for this query:

SELECT *
FROM #t t
WHERE t.i = 1
    AND @c IS NULL;

The problems around comparing NULL values applies to JOIN conditions as well. Consider this:

IF OBJECT_ID(N'tempdb..#t', N'U') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t 
(
    i int NULL
);

IF OBJECT_ID(N'tempdb..#s', N'U') IS NOT NULL DROP TABLE #s;
CREATE TABLE #s 
(
    i int NULL
);

I'll insert a row into each table with the value of i set to NULL:

INSERT INTO #t (i) VALUES (NULL);
INSERT INTO #s (i) VALUES (NULL);

Now, if we try to JOIN the two tables in a "simple" query, we get no results returned, since NULL cannot be compared to anything, not even another NULL value!

SELECT *
FROM #t t
    INNER JOIN #s s ON t.i = s.i;

Both tables are scanned by the query processor, as shown in the query plan:

In other words, be careful using NULL values.

Related Solutions

Sql-server – How (and why) does TOP impact an execution plan

I would have guessed that when a query includes TOP n the database engine would run the query ignoring the the TOP clause, and then at the end just shrink that result set down to the n number of rows that was requested. The graphical execution plan seems to indicate this is the case -- TOP is the "last" step. But it appears there is more going on.

The way the above is phrased makes me think you may have an incorrect mental picture of how a query executes. An operator in a query plan is not a step (where the full result set of a previous step is evaluated by the next one.

SQL Server uses a pipelined execution model, where each operator exposes methods like Init(), GetRow(), and Close(). As the GetRow() name suggests, an operator produces one row at a time on demand (as required by its parent operator). This is documented in the Books Online Logical and Physical Operators reference, with more detail in my blog post Why Query Plans Run Backwards. This row-at-a-time model is essential in forming a sound intuition for query execution.

My question is, how (and why) does a TOP n clause impact the execution plan of a query?

Some logical operations like TOP, semi joins and the FAST n query hint affect the way the query optimizer costs execution plan alternatives. The basic idea is that one possible plan shape might return the first n rows more quickly than a different plan that was optimized to return all rows.

For example, indexed nested loops join is often the fastest way to return a small number of rows, though hash or merge join with scans might be more efficient on larger sets. The way the query optimizer reasons about these choices is by setting a Row Goal at a particular point in the logical tree of operations.

A row goal modifies the way query plan alternatives are costed. The essence of it is that the optimizer starts by costing each operator as if the full result set were required, sets a row goal at the appropriate point, and then works back down the plan tree estimating the number of rows it expects to need to examine to meet the row goal.

For example, a logical TOP(10) sets a row goal of 10 at a particular point in the logical query tree. The costs of operators leading up to the row goal are modified to estimate how many rows they need to produce to meet the row goal. This calculation can become complex, so it is easier to understand all this with a fully worked example and annotated execution plans. Row goals can affect more than the choice of join type or whether seeks and lookups are preferred to scans. More details on that here.

As always, an execution plan selected on the basis of a row goal is subject to the optimizer's reasoning abilities and the quality of information provided to it. Not every plan with a row goal will produce the required number of rows faster in practice, but according to the costing model it will.

Where a row goal plan proves not to be faster, there are usually ways to modify the query or provide better information to the optimizer such that the naturally selected plan is best. Which option is appropriate in your case depends on the details of course. The row goal feature is generally very effective (though there is a bug to watch out for when used in parallel execution plans).

Your particular query and plan may not be suitable for detailed analysis here (by all means provide an actual execution plan if you wish) but hopefully the ideas outlined here will allow you to make forward progress.

Sql-server – Execution Plan in Microsoft SQL Server

First of all: stored procedures in SQL Server are NOT "pre-compiled" or anything.

Just like an ad-hoc SQL query, a stored procedure is analyzed when it's first used, an execution plan is determined and cached.

When the stored procedure is executed again, then that pre-existing execution plan is reused. Same applies to a properly parametrized SQL ad-hoc query.

The point is: as long as the actual SQL statement is IDENTICAL (down to the last comma or space), then a possibly pre-existing, cached execution plan is reused and the whole step of determining an execution plan can be skipped. If the SQL statement text varies - even by a single space - the whole process of analyzing and coming up with an execution plan for that query is run again.

I don't know about how Java handles it's prepared statements, but in C# using ADO.NET, when you run a query something like

SELECT * FROM user_master WHERE user_name = @userName

and you supply a value for @userName (e.g. 'vicky.thakor'), then the SQL Server will NOT received what you seem to think it will receive - the parameter is NOT replaced in the query text - instead, SQL Server will execute this as

EXEC sp_executesql @stmt, N'@userName', @userName = 'vicki.thakor'

SQL Server will receive a parametrized query and a list of parameters and their values.

Therefore, if you call this same query again, with a different value for @userName, the actual SQL statement text is identical and the cached execution plan is reused - only the value of the parameter has changed.

Best Answer

Related Solutions

Sql-server – How (and why) does TOP impact an execution plan

Sql-server – Execution Plan in Microsoft SQL Server

Related Question