SQL Server Query Performance – Comparing a=0 and b=0 vs a+b+c+d=0

Tags: performance, query-performance, sql-server

This is a simple question that I can't seem to find the answer for.

In terms of performance, if I have a WHERE clause such as a=0 and b=0 and ... z=0, would I gain anything by replacing that condition with a+b+...+z=0?

In other words, is there any performance gain by replacing the following

Select * 
From MyTable 
Where A=0 and B=0 and C=0 and D=0...

With

Select * 
From MyTable 
Where A+B+C+D=0...

I know it can depend on indexes, but for this purpose, let's just say no indexes exist. Does the arithmetic operator (+) perform better than an "OR" or "AND" logical operator?

I'm under the impression that addition performs better than multiple conditions combined with ANDs or ORs.

Test results

On a Table of 4.2 million rows

Returning rows Where A=0 and B=0 and C=0 -> 351748 Rows

The addition (A+B+C=0) took 5 seconds, while the logical conditions (A=0 and B=0 and C=0) took 11 seconds.

On the other hand

Returning rows Where A<>0 or B<>0 or C<>0 -> 3829750 Rows 58 seconds

Returning rows Where F65+ F67+f64<>0 -> 3829750 Rows 57 seconds

For the OR, it seems that there is no significant difference.

I agree with gbn:

If A is -1 and B is 1, then A+B=0 is true but A=0 and B=0 is false

and with AMtwo:

ABS(A)+ABS(B)+ABS(C)+ABS(D)… Even if you expect only positive values, if the column accepts negative values, you should assume that you might encounter one
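A minimal sketch of the two objections quoted above, using a hypothetical three-row table variable (@t and its values are illustrative only and are not taken from the real table):

```sql
-- Hypothetical demo data: one all-zero row, one row whose values cancel out,
-- and one ordinary non-zero row.
DECLARE @t TABLE (A INT NOT NULL, B INT NOT NULL);
INSERT INTO @t (A, B) VALUES (0, 0), (-1, 1), (2, 3);

-- Discrete comparison: only the (0, 0) row qualifies.
SELECT COUNT(*) AS discrete_matches
FROM @t
WHERE A = 0 AND B = 0;

-- Plain addition: the (-1, 1) row also sums to zero, so it is a false positive.
SELECT COUNT(*) AS addition_matches
FROM @t
WHERE A + B = 0;

-- ABS-guarded addition: negative values can no longer cancel positive ones.
SELECT COUNT(*) AS abs_matches
FROM @t
WHERE ABS(A) + ABS(B) = 0;
```

The first and third queries count one row and the second counts two, so the plain addition is only equivalent when no column can ever hold a negative value.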

The results are very impressive. As I thought, it seems that the addition is much quicker than the logical operators.

A = Float, B = Money and C = Float. The query used is as shown. In my case, all are positive numbers. No indexes. It is just logical in my mind that addition would be quicker than Logical conditions!

Best Answer

In your question, you detail some tests that you've prepared where you "prove" that the addition option is quicker than comparing the discrete columns. I suspect your test methodology may be flawed in several ways, as @gbn and @srutzky have alluded to.

First, you need to ensure you're not testing SQL Server Management Studio (or whatever client you're using). For instance, if you are running a SELECT * from a table with 3 million rows, you're mostly testing SSMS's ability to pull rows from SQL Server and render them on-screen. You're far better off using something like SELECT COUNT(1), which negates the need to pull millions of rows across the network and render them on screen.

Second, you need to be aware of SQL Server's data cache. Typically, we test the speed of reading data from storage, and processing that data, from a cold-cache (i.e. SQL Server's buffers are empty). Occasionally, it makes sense to do all your testing with a warm-cache, but you need to approach your testing explicitly with that in mind.

For a cold-cache test, you need to run CHECKPOINT and DBCC DROPCLEANBUFFERS prior to each run of the test.
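For the warm-cache alternative mentioned above, one possible approach is to run the query once to load the pages into memory and only measure the second run. This is a sketch rather than a prescribed method; dbo.MyTable and its columns are assumed from the question, and the table is assumed to fit in the buffer pool:

```sql
-- Priming run: loads the table's pages into the buffer pool.
-- Its timings are not measured.
SELECT COUNT(1)
FROM dbo.MyTable
WHERE A = 0 AND B = 0 AND C = 0;
GO

SET STATISTICS IO, TIME ON;
GO
-- Measured run: with a warm cache, physical reads and read-ahead reads
-- should be at or near zero, so the timing reflects CPU work only.
SELECT COUNT(1)
FROM dbo.MyTable
WHERE A = 0 AND B = 0 AND C = 0;
GO
SET STATISTICS IO, TIME OFF;
```

Whichever approach you choose, use it for both variants of the query so the comparison is like-for-like.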

For the test you've asked about in your question, I created the following test-bed:

IF COALESCE(OBJECT_ID('tempdb..#SomeTest'), 0) <> 0
BEGIN
    DROP TABLE #SomeTest;
END
CREATE TABLE #SomeTest
(
    TestID INT NOT NULL
        PRIMARY KEY 
        IDENTITY(1,1)
    , A INT NOT NULL
    , B FLOAT NOT NULL
    , C MONEY NOT NULL
    , D BIGINT NOT NULL
);

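-- Build a large row set by cross-joining sys.objects with itself four times;
-- the object_id values simply serve as arbitrary numeric test data.
INSERT INTO #SomeTest (A, B, C, D)
SELECT o1.object_id, o2.object_id, o3.object_id, o4.object_id
FROM sys.objects o1
    CROSS JOIN sys.objects o2
    CROSS JOIN sys.objects o3
    CROSS JOIN sys.objects o4;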

SELECT COUNT(1) 
FROM #SomeTest;

This returns a count of 260,144,641 on my machine.

To test the "addition" method, I run:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE (st.A + st.B + st.C + st.D) = 0;
GO
SET STATISTICS IO, TIME OFF;

The messages tab shows:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1313877, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 49047 ms, elapsed time = 173451 ms.

For the "discrete columns" test:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE st.A = 0
    AND st.B = 0
    AND st.C = 0
    AND st.D = 0;
GO

SET STATISTICS IO, TIME OFF;

Again, from the messages tab:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1322661, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 8938 ms, elapsed time = 162581 ms.

The stats above show that the second variant, with the discrete columns compared to 0, finishes about 11 seconds sooner and uses roughly 5.5 times less CPU time (8,938 ms versus 49,047 ms). The long durations in my tests above are mostly a result of reading a lot of rows from disk. If you drop the number of rows to 3 million, the CPU ratio remains about the same but the elapsed times drop noticeably, since the disk I/O has much less of an effect.
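The answer does not say how the row count was reduced. One way to do it, shown here purely as an assumption, is to empty the temp table and reload it with a TOP clause:

```sql
-- Assumption: reload #SomeTest with a 3-million-row subset of the same
-- cross join. The original test may have reduced the rows differently.
TRUNCATE TABLE #SomeTest;

INSERT INTO #SomeTest (A, B, C, D)
SELECT TOP (3000000)
    o1.object_id, o2.object_id, o3.object_id, o4.object_id
FROM sys.objects o1
    CROSS JOIN sys.objects o2
    CROSS JOIN sys.objects o3
    CROSS JOIN sys.objects o4;

-- Confirm the new row count before re-running the two test queries.
SELECT COUNT(1)
FROM #SomeTest;
```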

With the "Addition" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 499 ms, elapsed time = 256 ms.

With the "discrete columns" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 94 ms, elapsed time = 53 ms.

What will make a really big difference for this test? An appropriate index, such as:

CREATE INDEX IX_SomeTest ON #SomeTest(A, B, C, D);

The "addition" method:

Table '#SomeTest'. Scan count 3, logical reads 14235, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 546 ms, elapsed time = 314 ms.

The "discrete columns" method:

Table '#SomeTest'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

The execution plan for each query (with the above index in-place) is quite telling.

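The "addition" method must perform a scan of the entire index, because its predicate is an expression over several columns rather than a comparison against an indexed column.

The "discrete columns" method, by contrast, can seek to the first row of the index where the leading index column, A, is zero.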
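To inspect the two plans yourself, one option is SET SHOWPLAN_TEXT, which returns the estimated plan as text without executing the query. This sketch assumes #SomeTest and the IX_SomeTest index from above already exist in the current session:

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GOs.
SET SHOWPLAN_TEXT ON;
GO

-- "Addition" method: the plan is expected to show a scan of the index.
SELECT COUNT(1)
FROM #SomeTest st
WHERE (st.A + st.B + st.C + st.D) = 0;
GO

-- "Discrete columns" method: the plan is expected to show an index seek.
SELECT COUNT(1)
FROM #SomeTest st
WHERE st.A = 0
    AND st.B = 0
    AND st.C = 0
    AND st.D = 0;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

In SSMS, enabling "Include Actual Execution Plan" before running the queries gives the graphical equivalent.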

Related Solutions

Sql-server – What can be done to further enhance performance of Multiple Join and Aggregate Queries

There is rarely any need, point or benefit trying to micro optimise star schema queries with non-clustered indexes laden with included columns. Fact tables are built to be scanned.

The indexes you've created in your examples are subset copies of the parent table, which are being scanned (no seeks). The minor performance improvements come from scanning marginally fewer pages than the parent table. Given that star schemas are built to support ad-hoc query patterns it is not viable to create the indexes to support every possible enquiry.

Create your fact table clustered index on the date key. The majority of (typical) fact table queries include a time element and clustering on the date key enables range scanning of fact table rows.
Add non-clustered indexes on the foreign keys of your fact tables to assist with highly selective queries. Foreign keys to dimension tables can be created with NOCHECK to prevent any impact on ETL.
Cluster your dimension tables on their surrogate keys.
Create a non-clustered index on the natural key of each dimension table.
Stop.

The optimiser detects star schema query patterns and has strategies to deal with them efficiently, utilising scans and hash joins in Standard Edition or bitmap filtering in Enterprise. Follow the indexing strategy outlined above and let the optimiser deal with the rest.
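As a rough illustration of the indexing steps listed above, here is a sketch against a made-up star schema. Every object name (dbo.FactSales, dbo.DimCustomer, the columns, and the index and constraint names) is hypothetical:

```sql
-- Hypothetical dimension and fact tables, created without any indexes.
CREATE TABLE dbo.DimCustomer
(
    CustomerKey INT NOT NULL IDENTITY(1,1)   -- surrogate key
    , CustomerCode VARCHAR(20) NOT NULL      -- natural (business) key
    , CustomerName NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.FactSales
(
    DateKey INT NOT NULL
    , CustomerKey INT NOT NULL
    , SalesAmount MONEY NOT NULL
);

-- Dimension table: cluster on the surrogate key, then add a
-- non-clustered index on the natural key.
CREATE UNIQUE CLUSTERED INDEX CIX_DimCustomer
    ON dbo.DimCustomer (CustomerKey);
CREATE NONCLUSTERED INDEX IX_DimCustomer_CustomerCode
    ON dbo.DimCustomer (CustomerCode);

-- Fact table: cluster on the date key, then add a non-clustered
-- index on each foreign key column.
CREATE CLUSTERED INDEX CIX_FactSales_DateKey
    ON dbo.FactSales (DateKey);
CREATE NONCLUSTERED INDEX IX_FactSales_CustomerKey
    ON dbo.FactSales (CustomerKey);

-- Foreign key added WITH NOCHECK, so existing fact rows are not validated
-- when the constraint is created.
ALTER TABLE dbo.FactSales WITH NOCHECK
    ADD CONSTRAINT FK_FactSales_DimCustomer
    FOREIGN KEY (CustomerKey) REFERENCES dbo.DimCustomer (CustomerKey);
```

A real fact table would repeat the non-clustered index and foreign key for each dimension it references.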

SQL Server Query Performance – Optimizing Group By with Many Columns

The non-clustered index you have tested is not the best for this query. It can be used for the WHERE clause and for doing an index scan instead of a full table scan but it cannot be used for the GROUP BY.

The best possible index would be a filtered index (called a partial index in some other database systems) that excludes the rows rejected by the WHERE clause, has all the columns used in the GROUP BY as its key columns, and INCLUDEs all the other columns used in the SELECT:

CREATE INDEX special_ix 
  ON dbo.Commissions_Output
    ( company, location, account, 
      salesroute, employee, producttype, 
      item, loadjdate, commissionrate ) 
INCLUDE 
  ( [Extended Sales Price], [Delivered Qty] ) 
WHERE 
  ( [Extended Sales Price] <> 0 ) ;
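The original query is not reproduced here, so the following is only a guess at its general shape, reconstructed from the index definition. The SUM aggregates and the output aliases are assumptions:

```sql
-- Hypothetical query of the shape this index is meant to support:
-- the WHERE clause matches the index filter, the GROUP BY columns are the
-- index key columns, and the aggregated columns are the INCLUDEd columns.
SELECT
    company, location, account,
    salesroute, employee, producttype,
    item, loadjdate, commissionrate,
    SUM([Extended Sales Price]) AS total_sales_price,   -- assumed aggregate
    SUM([Delivered Qty]) AS total_delivered_qty         -- assumed aggregate
FROM dbo.Commissions_Output
WHERE [Extended Sales Price] <> 0
GROUP BY
    company, location, account,
    salesroute, employee, producttype,
    item, loadjdate, commissionrate;
```

Because every referenced column is present in the index, a query like this can in principle be answered from the index alone, without touching the base table.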