How to achieve predicate pushdown in the view

optimizationquery-performancesql-server-2019

I have a reporting table (about 1bn rows), and a tiny dimension table:

CREATE TABLE dbo.Sales_unpartitioned (
    BusinessUnit    int NOT NULL,
    [Date]          date NOT NULL,
    SKU             varchar(8) NOT NULL,
    Quantity        numeric(10, 2) NOT NULL,
    Amount          numeric(10, 2) NOT NULL,
    CONSTRAINT PK_Sales_unpartitioned PRIMARY KEY CLUSTERED (BusinessUnit, [Date], SKU)
);

--- Demo data:
INSERT INTO dbo.Sales_unpartitioned
SELECT severity AS BusinessUnit,
       DATEADD(day, message_id, '2000-01-01') AS [Date],
       LEFT([text], 3) AS SKU,
       1000.*RAND(CHECKSUM(NEWID())) AS Quantity,
       10000.*RAND(CHECKSUM(NEWID())) AS Amount
FROM sys.messages
WHERE [language_id]=1033;

--- Artificially inflate statistics of demo data:
UPDATE STATISTICS dbo.Sales_unpartitioned WITH ROWCOUNT=1000000000;

--- Dimension table:
CREATE TABLE dbo.BusinessUnits (
    BusinessUnit    int NOT NULL,
    SalesManager    nvarchar(250) NULL,
    PRIMARY KEY CLUSTERED (BusinessUnit)
);

INSERT INTO dbo.BusinessUnits (BusinessUnit)
SELECT DISTINCT BusinessUnit FROM dbo.Sales;

… to which I've added a reporting view used by an application for OLTP-style reporting.

CREATE OR ALTER VIEW dbo.SalesReport_unpartitioned
AS

SELECT bu.BusinessUnit,
       s.[Date],
       s.SKU,
       s.Quantity,
       s.Amount
FROM dbo.BusinessUnits AS bu
CROSS APPLY (
    --- Regular sales
    SELECT t.BusinessUnit, t.[Date], t.SKU, t.Quantity, t.Amount
    FROM dbo.Sales_unpartitioned AS t
    WHERE t.BusinessUnit=bu.BusinessUnit
      AND t.SKU LIKE 'T%'

    UNION ALL

    --- This is a special reporting entry. We only
    --- want to see today's row. In case of duplicates,
    --- get the row with the first "SKU".
    SELECT TOP (1) s.BusinessUnit, s.[Date], s.SKU, s.Quantity, s.Amount
    FROM dbo.Sales_unpartitioned AS s
    WHERE s.BusinessUnit=bu.BusinessUnit
      AND s.[Date]=CAST(SYSDATETIME() AS date)
      AND s.SKU LIKE 'S%'
    ORDER BY s.BusinessUnit, s.[Date], s.SKU
) AS s

The idea is that the user application will query this view with a SELECT query that filters on a range of dates and one or more BusinessUnits. For this purpose, I've chosen a CROSS APPLY pattern, so that the query can "loop" over each BusinessUnit, seek to a range of Date, and apply a residual filter on SKU.

Example app query:

DECLARE @from date='2021-01-01', @to date='2021-12-31';

SELECT *
FROM dbo.SalesReport_unpartitioned
WHERE BusinessUnit=16
  AND [Date] BETWEEN @from AND @to
ORDER BY BusinessUnit, [Date], SKU;

I would expect a query plan that looks like this:
Desired query plan with predicate pushdown and no Filter operator
Desired plan

However, the plan turns out like this:
Actual query plan seeks only on BusinessUnit with residual predicate on SKU, adds Filter to the end of the plan
Actual plan

I expected SQL Server to do a "predicate pushdown" on the Date column, allowing the Clustered Index Seek to look for a single BusinessUnit and a range of Date, then apply a residual predicate on SKU. This works on the Seek in the "s" branch (the one with TOP) – probably because it has a hard-coded Date predicate in the query – but not on the "t" branch.

However, on the "t" branch SQL Server only seeks to the specific BusinessUnit with a residual predicate on SKU, effectively retrieving all dates. Only at the end of the plan does it applies a Filter operator that filters on the Date column.

In a large table, this has a very significant performance penalty – you could end up reading 20 years of data from disk when all you're looking for is a week.

Things I've tried

Workarounds:

  • Converting the view to an inline table valued function with @fromDate and @toDate parameters that filter the "s" and "t" queries will enable a Seek on (BusinessUnit, Date) as desired, but requires rewriting the app code.
  • Moving the UNION ALL out of the CROSS APPLY (from CROSS APPLY (UNION) to CROSS APPLY() UNION CROSS APPLY()) will enable predicate pushdown. It makes one more seek on the BusinessUnit table, which is perfectly acceptable.

Fixes the Seek, but changes the results:

  • Surprisingly, removing the TOP (1) and ORDER BY for the "s" query makes predicate pushdown work on "t", but can give return too many rows from "s".
  • Eliminating UNION ALL by either removing the "s" or "t" query will enable predicate pushdown, but generate incorrect results.

No change or not feasible:

  • Replacing TOP (1) with a ROW_NUMBER() pattern does not change the Seek.
  • Changing the CROSS APPLY to a forced INNER LOOP JOIN fixes the Seek on "t", but actually changes "s" to a Scan instead, which is even worse.
  • Adding trace flag 8780 to allow the optimizer to work on a plan for longer does not change anything. The plan is already optimized FULL with no early termination.

A common thread seems to be that changing/simplifying the "s" query (removing TOP, ORDER BY) fixes the problem on the "t" query, which feels counter-intuitive to me.

What I'm looking for

I'm trying to understand if this is a shortcoming of the optimizer, if it's the result of a deliberate costing/optimization mechanism, or if I've simply overlooked something.

Best Answer

I'm trying to understand if this is a shortcoming of the optimizer, if it's the result of a deliberate costing/optimization mechanism, or if I've simply overlooked something.

It's a little bit of all of those.

There's a lot going on in the query presented — too much really — so to avoid writing half a book about it, I am going to boil it down to the main element that is causing you not to get the plan you are after:

The optimizer does not push predicates down the inner side of an apply.

The rule that operates on relational selections (filters, predicates) above an apply is called, naturally enough, SELonApply. It performs the following logical substitution:

Sel (A Apply B) -> Sel (Sel A Apply B)

It takes part(s) of a potentially complex selection involving both A and B, and pushes those it can down to the driving table A. No part of the selection is pushed to B. The part(s) of the selection that cannot be pushed down remain behind.


This might sound like a shocking oversight, and counter to experience. That's because it is not the full story.

The optimizer tries to convert an apply to the equivalent join early on in the compilation process (during simplification, before trivial plan and cost-based optimization). It is capable of pushing selections down either side of a join, where it is safe. That join may in turn be transformed into a physical apply during cost-based optimization.

The effect of all this is to make it seem like the optimizer pushed a predicate down the inner side of an apply:

  1. Written apply transformed to a join.
  2. Predicate(s) pushed down either side of the join.
  3. Join transformed to an apply.

Let me show you an example:

DECLARE @T1 table (pk integer PRIMARY KEY, c1 integer NOT NULL INDEX ic1);
DECLARE @T2 table (fk integer NOT NULL, c2 integer NOT NULL, PRIMARY KEY (fk, c2));

SELECT 
    T1.*,
    T2.*
FROM @T1 AS T1
CROSS APPLY 
(
    SELECT T2.* 
    FROM @T2 AS T2
    WHERE T2.fk = T1.pk
) AS T2
WHERE 
    1 = 1
    AND T1.c1 = 1
    AND T2.c2 = 2;

plan with predicates pushed down

If you look carefully at the plan, you will see the predicate on T2 pushed to the inner side seek, and the nested loop join is an apply (it has outer references). This was only possible because the optimizer was able to rewrite the apply as a join initially, push the predicates, then transform back to an apply later on.

We can disable the apply-to-join transformation using undocumented trace flag 9114:

DECLARE @T1 table (pk integer PRIMARY KEY, c1 integer NOT NULL INDEX ic1);
DECLARE @T2 table (fk integer NOT NULL, c2 integer NOT NULL, PRIMARY KEY (fk, c2));

SELECT 
    T1.*,
    T2.*
FROM @T1 AS T1
CROSS APPLY 
(
    SELECT T2.* 
    FROM @T2 AS T2
    WHERE T2.fk = T1.pk
) AS T2
WHERE 
    1 = 1
    AND T1.c1 = 1
    AND T2.c2 = 2
OPTION (QUERYTRACEON 9114);

This means only SELonApply can be used, which only pushes to the driving table A:

plan with stuck predicate

Notice the part of the selection on T2.c2 is 'stuck' above the apply, in a filter. The inner side seek is only on the FK/PK equality specified inside the apply.


The optimizer is built on relational principles. It appreciates a relational schema design, and queries that use relational constructs. Apply (lateral join) is a relatively new extension. The optimizer knows a lot more tricks with join than it does with apply, hence the early effort to rewrite.

When you use things like apply, or the (non-relational) Top, you are implicitly taking more responsibility for the final plan shape. In other words, you will more often have to express your query differently (as in your workaround) to get a good outcome.


My preference would be to use the inline table-valued function with explicit predicate placement. If I were to rewrite the view, I might go with:

CREATE OR ALTER VIEW dbo.SalesReport_unpartitioned
AS
--- Regular sales
SELECT
    BU.BusinessUnit,
    RS.[Date],
    RS.SKU,
    RS.Quantity,
    RS.Amount
FROM dbo.BusinessUnits AS BU
JOIN dbo.Sales_unpartitioned AS RS
    ON RS.BusinessUnit = BU.BusinessUnit
WHERE 
    RS.SKU LIKE 'T%'

UNION ALL

--- This is a special reporting entry.
SELECT
    BU.BusinessUnit,
    SR.[Date],
    SR.SKU,
    SR.Quantity,
    SR.Amount
FROM dbo.BusinessUnits AS BU
JOIN dbo.Sales_unpartitioned AS SR
    ON SR.BusinessUnit = BU.BusinessUnit
WHERE
    1 = 1
    AND SR.SKU LIKE 'S%'
    --- We only want to see today's row.
    AND SR.[Date] = CONVERT(date, SYSDATETIME())
    --- In case of duplicates, get the row with the first "SKU".
    AND SR.SKU =
    (
        SELECT 
            MIN(SR2.SKU) 
        FROM dbo.Sales_unpartitioned AS SR2
        WHERE 
            SR2.BusinessUnit = SR.BusinessUnit
            AND SR2.[Date] = SR.[Date]
            AND SR2.SKU LIKE 'S%'
    );
GO

For the provided test query:

DECLARE @from date='2021-01-01', @to date='2021-12-31';

SELECT *
FROM dbo.SalesReport_unpartitioned
WHERE BusinessUnit=16
  AND [Date] BETWEEN @from AND @to
ORDER BY BusinessUnit, [Date], SKU;

The execution plan is:

plan for view rewrite

The orange section is regular sales. The yellow section is for the special reporting entry.