Sql-server – Why does adding a computed column prevent predicate pushdown

computed-columnindexsql serversql-server-2017

I have a weird situation that I don't quite understand.

I have a table like this kinda:

CREATE TABLE dbo.cc_demo
     (
         id INT IDENTITY PRIMARY KEY,
         up_action INT,
         down_action INT,
         last_action_date DATETIME
     );

INSERT dbo.cc_demo ( up_action, down_action, last_action_date )
SELECT TOP 5000000
       nums.num % 500000, nums.num % 500000, DATEADD(MINUTE, nums.num, GETDATE())
FROM
       (   SELECT     ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )) AS num
           FROM       master..spt_values AS sv
           CROSS JOIN master..spt_values AS sv2 ) AS nums;

If I run this query, the plan is just a regular clustered index scan.

SELECT *
FROM   dbo.cc_demo AS cd
WHERE  cd.last_action_date >= '20270601'
AND    cd.last_action_date < '20270901';

yeah

ohyeah

But if I add a computed column and index it, my plan changes for the worst.

ALTER TABLE dbo.cc_demo
    ADD total_actions AS up_action + down_action;

CREATE INDEX ix_total_actions ON dbo.cc_demo (total_actions);

Now it scans the clustered index and then filters stuff out way later on!

dawg

bruh

It asks for an index, and that changes the plan back to what I'd expect, but why do I have to add this?

CREATE NONCLUSTERED INDEX [WORLDSTAR]
    ON dbo.cc_demo ( last_action_date )
    INCLUDE ( up_action, down_action, total_actions );

That doesn't seem logical.

Performance:

Pre-computed column:

Table 'cc_demo'. Scan count 1, logical reads 17990, 

 SQL Server Execution Times:
   CPU time = 234 ms,  elapsed time = 224 ms.

Post computed column + index:

Table 'cc_demo'. Scan count 1, logical reads 17990

 SQL Server Execution Times:
   CPU time = 594 ms,  elapsed time = 591 ms.

Here's the breakdown:

  • When I add the computed column, the plan doesn't change
  • When I index the computed column, it does
  • I'm not too worried about the performance yet
  • I'm just curious why the plan changes this way after I index the computed column

Best Answer

I'm just curious why the plan changes this way after I index the computed column

It's not indexing the computed column that really matters; any additional non-covering index can produce the same effect (so long as the table occupies more than one page). With only a clustered index present (or a fully covering nonclustered index), the optimizer can produce a TRIVIAL plan, with the filtering condition fully pushed down.

When there are non-trivial access method choices to make, the query goes through cost-based optimization. Cost-based optimization contains more complex and powerful index-matching rules, as you might expect, but the presence of multiple compute scalars due to computed column expansion and projection can prevent predicates (filters) being pushed down the execution plan, when the computed column is not PERSISTED.

SQL Server tries to push predicates (filters) as far down the logical query tree as it can early in the compilation process, but it can be quite conservative about doing this when non-persisted computed columns are involved. A filter that is not pushed down early is unlikely to be pushed further during cost-based optimization.

Simplified Demo

DROP TABLE IF EXISTS #CC;

CREATE TABLE #CC
(
    id integer IDENTITY PRIMARY KEY,
    c0 integer NULL,
    c1 integer NULL,
    c2 integer NULL,
    c3 AS c1 + c2
);

-- Just enough rows to fill more than one page
-- (so not all data access is trivial)
INSERT #CC 
    (c0, c1, c2)
SELECT 
    SV.number, SV.number, SV.number
FROM master.dbo.spt_values AS SV
WHERE 
    SV.[type] = N'P'
    AND SV.number BETWEEN 1 AND 324;

-- Trivial plan (only a clustered index to choose from)
SELECT * FROM #CC AS C WHERE C.c0 >= 1;

-- Add a non-covering index *not* on the computed column
CREATE INDEX ic2 ON #CC (c2);

-- Filter not pushed
SELECT * FROM #CC AS C WHERE C.c0 >= 1;

Workarounds

There are several ways to workaround this limitation in your example:

  1. Do not project the total_actions column. If the column is not needed, computations to derive its value cannot get in the way of predicate pushing.

  2. Make the computed column PERSISTED. This allows the optimizer consider plans that read the persisted value from the base table, avoiding the computation.

  3. Specify the clustered index using a hint e.g. WITH (INDEX(1)). This allows a trivial plan, since it removes the question of access method choice.

  4. Specify the index on the computed column using a (INDEX(ix_total_actions)) hint. While not covering (so no trivial plan), this hint also allows the column to come from storage rather than being computed. The index does not include last_action_date, so the predicate is applied in a Key Lookup. This plan may therefore be quite inefficient.

  5. ...etc.