Sql-server – Performance of a=0 and b=0 and … z=0 vs a+b+c+d=0

performancequery-performancesql server

This is a simple question that I can't seem to find the answer for.

In terms of performance, If I have a WHERE clause such as a=0 and b=0 and ... z=0, Would I gain any performance if I replaced that condition with a+b+...+z=0?

In other words, is there any performance gain by replacing the following

Select * 
From MyTable 
Where A=0 and B=0 and C=0 and D=0...

With

Select * 
From MyTable 
Where A+B+C+D=0...

I know it can depend on indexes, but for this purpose, let's just say no indexes exist. Does the arithmetic operator (+) perform better than an "OR" or "AND" Logical Operator?

I'm under the impression that the addition performs better than multiple conditions with ANDs or ORs.

Test results

On a Table of 4.2 million rows

Returning rows Where A=0 B=0 and C=0 -> 351748 Rows

The addition (A+B+C=0) took 5 seconds while the Logical conditions A=0 and B=0 and C=0 took 11 seconds.

On the other hand

Returning rows Where A<>0 B<>0 or C<>0 -> 3829750 Rows 58 seconds

Returning rows Where F65+ F67+f64<>0 -> 3829750 Rows 57 seconds

For the OR, it seems that there is no significant difference.

I agree with gbn:

If A is -1 and B is 1, A+B=0 but A=0 and B= 0 is false

and with AMtwo:

ABS(A)+ABS(B)+ABS(C)+ABS(D)… Even if you expect only positive values, if the column accepts negative values, you should assume that you might encounter one

The results are very impressive, as I thought, It seems that the addition is much quicker than the logical operators.

A = Float, B = Money and C = Float. The query used is as shown. In my case, all are positive numbers. No indexes. It is just logical in my mind that addition would be quicker than Logical conditions!

Best Answer

In your question, you detail some tests that you've prepared where you "prove" that the addition option is quicker than comparing the discrete columns. I suspect your test methodology may be flawed in several ways, as @gbn and @srutzky have alluded to.

First, you need to ensure you're not testing SQL Server Management Studio (or whatever client you're using). For instance, if you are running a SELECT * from a table with 3 million rows, you're mostly testing SSMS's ability to pull rows from SQL Server and render them on-screen. You're far better off to use something like SELECT COUNT(1) which negates the need to pull millions of rows across the network, and render them on screen.

Second, you need to be aware of SQL Server's data cache. Typically, we test the speed of reading data from storage, and processing that data, from a cold-cache (i.e. SQL Server's buffers are empty). Occasionally, it makes sense to do all your testing with a warm-cache, but you need to approach your testing explicitly with that in mind.

For a cold-cache test, you need to run CHECKPOINT and DBCC DROPCLEANBUFFERS prior to each run of the test.

For the test you've asked about in your question, I created the following test-bed:

IF COALESCE(OBJECT_ID('tempdb..#SomeTest'), 0) <> 0
BEGIN
    DROP TABLE #SomeTest;
END
CREATE TABLE #SomeTest
(
    TestID INT NOT NULL
        PRIMARY KEY 
        IDENTITY(1,1)
    , A INT NOT NULL
    , B FLOAT NOT NULL
    , C MONEY NOT NULL
    , D BIGINT NOT NULL
);

INSERT INTO #SomeTest (A, B, C, D)
SELECT o1.object_id, o2.object_id, o3.object_id, o4.object_id
FROM sys.objects o1
    , sys.objects o2
    , sys.objects o3
    , sys.objects o4;

SELECT COUNT(1) 
FROM #SomeTest;

This returns a count of 260,144,641 on my machine.

To test the "addition" method, I run:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE (st.A + st.B + st.C + st.D) = 0;
GO
SET STATISTICS IO, TIME OFF;

The messages tab shows:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1313877, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 49047 ms, elapsed time = 173451 ms.

For the "discrete columns" test:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE st.A = 0
    AND st.B = 0
    AND st.C = 0
    AND st.D = 0;
GO

SET STATISTICS IO, TIME OFF;

again, from the messages tab:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1322661, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 8938 ms, elapsed time = 162581 ms.

From the stats above you can see the second variant, with the discrete columns compared to 0, the elapsed time is about 10 seconds shorter, and the CPU time is about 6 times less. The long durations in my tests above are mostly a result of reading a lot of rows from disk. If you drop the number of rows to 3 million, you see the ratios remain about the same but the elapsed times drop noticeably, since the disk I/O has much less of an effect.

With the "Addition" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 499 ms, elapsed time = 256 ms.

With the "discrete columns" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 94 ms, elapsed time = 53 ms.

What will make a really really big difference for this test? An appropriate index, such as:

CREATE INDEX IX_SomeTest ON #SomeTest(A, B, C, D);

The "addition" method:

Table '#SomeTest'. Scan count 3, logical reads 14235, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 546 ms, elapsed time = 314 ms.

The "discrete columns" method:

Table '#SomeTest'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

The execution plan for each query (with the above index in-place) is quite telling.

The "addition" method, which must perform a scan of the entire index:

enter image description here

and the "discrete columns" method, which can seek to the first row of the index where the leading index column, A, is zero:

enter image description here