Sql-server – SQL Server 2014 – Compute scalar over computed indexed column

sql serversql server 2014

I've created a table a short while ago, and it currently handles roughly 100M rows.

For reporting purposes, I added a computed, non-persisted column which is the HASHBYTES in SHA1 of 4 non-NULLable varchar columns in this table. I'm not using a scalar function for this : it was such as :

    ALTER TABLE myTable
    ADD myHash AS(CAST(HASHBYTES(SHA1, myString1, myString2, myString3, myString4) AS varbinary(20)))

Then I added an index on it, so as to be able to perform a self-join on this computed column, as such :

    CREATE INDEX IX
    ON myTable (myHash)

The query plan of the report query shows a Compute Scalar operator that does the HASHBYTES computation. I wonder why this happens, since this value is stored in the index; the report query only INNER JOINs once on this computed column (i.e. it does not appear in any other clause, nor between the SELECT and the FROM clauses).

So I persisted the column, but the plan is identical. Can someone explain this behavior please ?

Best Answer

I believe the "computer scalar" operator in the plan where it is using the index is actually not being executed. On my test rig with 1,000,000 sample rows, which is shown below, the query without the non-clustered index is a lot slower than the query that uses the clustered index on the myhash column.

USE tempdb;
SET NOCOUNT ON;

IF EXISTS (SELECT 1
    FROM sys.objects o
    WHERE o.name = 'HTest'
        AND o.type = 'U')
DROP TABLE dbo.HTest;

CREATE TABLE dbo.HTest
(
    HTest_ID INT NOT NULL
        CONSTRAINT PK_HTest_ID
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , V1 VARCHAR(255) NOT NULL
    , V2 VARCHAR(255) NOT NULL
    , V3 VARCHAR(255) NOT NULL
    , V4 VARCHAR(255) NOT NULL
);

INSERT INTO dbo.HTest (V1, V2, V3, V4)
SELECT TOP(1000000) 
    o1.name
    , o2.name
    , o3.name
    , o4.name
FROM sys.objects o1
    , sys.objects o2
    , sys.objects o3
    , sys.objects o4;

ALTER TABLE dbo.HTest
ADD MyHash AS(CAST(HASHBYTES('SHA1', V1 + V2 + V3 + V4) AS VARBINARY(20)));

Below we have two test runs, one without the index, and the 2nd one with an index on the MyHash column.

IF EXISTS (SELECT 1
    FROM sys.indexes i
    WHERE i.name = 'IX_Htest_MyHash'
)
DROP INDEX IX_Htest_MyHash
ON dbo.HTest;

PRINT (N'');
PRINT (N'-----set stats io on---------------------------------------------------');
PRINT (N'');

SET STATISTICS IO, TIME ON;

PRINT (N'');
PRINT (N'-----run 1 (no index)--------------------------------------------------');
PRINT (N'');

SELECT MyHash
FROM dbo.HTest;

PRINT (N'');
PRINT (N'-----end of run 1------------------------------------------------------');
PRINT (N'');

CREATE INDEX IX_Htest_MyHash
ON dbo.HTest(MyHash);

PRINT (N'');
PRINT (N'-----run 2 (with index)------------------------------------------------');
PRINT (N'');

SELECT MyHash
FROM dbo.HTest;

PRINT (N'');
PRINT (N'-----end of run 2------------------------------------------------------');
PRINT (N'');

SET STATISTICS IO, TIME OFF

The salient bits from the output of this is:

-----run 1 (no index)--------------------------------------------------

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.
Table 'HTest'. Scan count 1, logical reads 9409, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1560 ms,  elapsed time = 7516 ms.

-----end of run 1------------------------------------------------------

-----run 2 (with index)------------------------------------------------

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
Table 'HTest'. Scan count 1, logical reads 4227, physical reads 0, read-ahead reads 8, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 188 ms,  elapsed time = 4912 ms.

-----end of run 2------------------------------------------------------

As you can see, run 2 is clocking in with far less CPU time, and around half the reads of run 1.

The plan for run 1:

enter image description here

The plan for run 2:

enter image description here

Looking at the XML for the 2nd plan, you can see the <ComputeScalar> operator is actually just a lookup. Look at the 5th line <ScalarOperator ScalarString="[tempdb].[dbo].[HTest].[MyHash]">:

  <ComputeScalar>
    <DefinedValues>
      <DefinedValue>
        <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[HTest]" Column="MyHash" ComputedColumn="true" />
        <ScalarOperator ScalarString="[tempdb].[dbo].[HTest].[MyHash]">
          <Identifier>
            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[HTest]" Column="MyHash" ComputedColumn="true" />
          </Identifier>
        </ScalarOperator>
      </DefinedValue>
    </DefinedValues>
    <RelOp AvgRowSize="21" EstimateCPU="1.10016" EstimateIO="3.11572" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1000000" LogicalOp="Index Scan" NodeId="1" Parallel="false" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="4.21587" TableCardinality="1000000">
      <OutputList>
        <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[HTest]" Column="MyHash" ComputedColumn="true" />
      </OutputList>
      <RunTimeInformation>
        <RunTimeCountersPerThread Thread="0" ActualRows="1000000" ActualRowsRead="1000000" ActualEndOfScans="1" ActualExecutions="1" />
      </RunTimeInformation>
      <IndexScan Ordered="false" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
        <DefinedValues>
          <DefinedValue>
            <ColumnReference Database="[tempdb]" Schema="[dbo]" Table="[HTest]" Column="MyHash" ComputedColumn="true" />
          </DefinedValue>
        </DefinedValues>
        <Object Database="[tempdb]" Schema="[dbo]" Table="[HTest]" Index="[IX_Htest_MyHash]" IndexKind="NonClustered" />
      </IndexScan>
    </RelOp>
  </ComputeScalar>

Presumably, the plan for the 2nd query includes the base calculation for the computed column, even though the results actually come from the index without any calculations actually taking place.