Allow predicate pushing on a view that uses group by

join;predicatequery-performancesql server

We have a table Ecom.McProductToVendorProductCodeMap that has a multi-field PK as shown:

enter image description here

Then a view wraps that table to compute a metric and is grouped by the first two fields of that PK:

ALTER view ECom.McProductToVendorProductMd5SourceView
as
select ClientAppPrivateLabelId,
       BrandId, 
       convert(nvarchar(32), HashBytes('MD5', 
              string_agg(
                  convert(varchar(max), MaterialNumber + ',' + VendorProductCode + ',' + convert(varchar(30), VendorProductStatusId)),    -- sense any MaterialNumber/VendorProductCode/Status changes
                  ',') within group (order by MaterialNumber)
          ), 2) as Md5,
       Count(*) as Count,
       max(ModifiedUtc) as ModifiedUtc
  from ECom.McProductToVendorProductCodeMap 
 group by ClientAppPrivateLabelId, BrandId

Now if we select against the view using those 2 fields as predicates directly, then an index seek occurs using those 2 fields (19k rows and tool tip shows "Seek Predicate" on the 2 fields):

select * from ECom.McProductToVendorProductMd5SourceView
where ClientAppPrivateLabelId = 101 and BRandId = 3  

enter image description here

However when trying to join against that same view using the same 2 predicates it only seeks on ClientAppPrivateLabelId and not BrandId . The loop join hint did not help and replacing the join with a cross apply also did not help.

select IsNull(convert(smallint, Value), 0) as BrandId 
  into #Brands 
  from string_split('2,3', ',');    
    
select ClientAppPrivateLabelId, b.BrandId, Md5, Count, ModifiedUtc
  from #Brands b
 inner loop join ECom.McProductToVendorProductMd5SourceView m
    on m.BrandId = b.BrandId
   and m.ClientAppPrivateLabelId = 101;

enter image description here

The view is simple except for windowing calc:

ALTER view ECom.McProductToVendorProductMd5SourceView
as
select ClientAppPrivateLabelId,
       BrandId, 
       convert(nvarchar(32), HashBytes('MD5', 
              string_agg(
                  convert(varchar(max), MaterialNumber + ',' + VendorProductCode + ',' + convert(varchar(30), VendorProductStatusId)),    -- sense any MaterialNumber/VendorProductCode/Status changes
                  ',') within group (order by MaterialNumber)
          ), 2) as Md5,
       Count(*) as Count,
       max(ModifiedUtc) as ModifiedUtc
  from ECom.McProductToVendorProductCodeMap 
 group by ClientAppPrivateLabelId, BrandId

Why won't it use BrandId? The original table defines BrandId as a smallint non-nullable.

Paste-The-Plan:
https://www.brentozar.com/pastetheplan/?id=ryZWp86Hp

Update #1 (12/5/2023)

Converted the view to a Table Valued Function (TVF):

alter function ECom.McProductToVendorProductMd5(
   @pBrandId smallint,
   @pClientAppPrivateLabelId smallint
)
returns table as 
return
select ClientAppPrivateLabelId,
       BrandId, 
       convert(nvarchar(32), HashBytes('MD5', 
              string_agg(
                  -- Sense any MaterialNumber/VendorProductCode/Status changes
                  convert(varchar(max), MaterialNumber + ',' + VendorProductCode + ',' + convert(varchar(30), VendorProductStatusId)),    
                  ',') within group (order by MaterialNumber)
          ), 2) as Md5,
       Count(*) as Count,
       max(ModifiedUtc) as ModifiedUtc
  from ECom.McProductToVendorProductCodeMap m
 where m.BrandId = @pBrandId
   and m.ClientAppPrivateLabelId = @pClientAppPrivateLabelId
 group by ClientAppPrivateLabelId, BrandId

and Adjusted the query to use it via cross apply:

select ClientAppPrivateLabelId, b.BrandId, Md5, Count, ModifiedUtc
  from #Brands b
 cross apply ECom.McProductToVendorProductMd5(b.BrandId, @pCaplId) m;

Same problem: https://www.brentozar.com/pastetheplan/?id=SJnRODaBT

It is using a merge join and not seeking on BrandId

Best Answer

SQL Server is very keen to rewrite apply as a join before optimization begins. It is quite good at it too. It is less good at transforming a join to an apply, which is what you want here.

As a result, when you write a join, it stays as a join. When you write an apply, it gets transformed to a join.

There is no hint to avoid the initial rewrite from apply to join, though undocumented trace flag 9114 performs this function. Previous behaviours enabled by undocumented trace flags like this have eventually surfaced as USE HINT options, so perhaps this situation will change one day.

To work around this in the meantime, write the join as an apply and use an OUTER APPLY or redundant OFFSET to prevent the optimizer transforming the apply to a join.

SQL Server is capable of rewriting outer apply and OFFSET/TOP to a join in principle. It does not do this with OFFSET/TOP specifically because people have used this to avoid transformation to a join so often in the past. Outer apply is less amenable to transformation, but it can happen.

Outer apply

SELECT
    MA.ClientAppPrivateLabelId,
    B.BrandId,
    MA.Md5,
    MA.[Count],
    MA.ModifiedUtc
FROM #Brands AS B
OUTER APPLY
(
    SELECT
        M.* 
    FROM ECom.McProductToVendorProductMd5SourceView AS M
    WHERE
        M.ClientAppPrivateLabelId = 101
        AND M.BrandId = B.BrandId
) AS MA;

outer apply plan

Redundant offset

SELECT
    MA.ClientAppPrivateLabelId,
    B.BrandId,
    MA.Md5,
    MA.[Count],
    MA.ModifiedUtc
FROM #Brands AS B
CROSS APPLY
(
    SELECT
        M.* 
    FROM ECom.McProductToVendorProductMd5SourceView AS M
    WHERE
        M.ClientAppPrivateLabelId = 101
        AND M.BrandId = B.BrandId
    ORDER BY
        M.ClientAppPrivateLabelId,
        M.BrandId
        OFFSET 0 ROWS
) AS MA;

offset plan


If you want to encapsulate that inside your function, one possible implementation is:

CREATE OR ALTER FUNCTION ECom.McProductToVendorProductMd5
(
    @pBrandId integer,
    @pClientAppPrivateLabelId integer
)
RETURNS table
AS
RETURN 
    SELECT
        M.ClientAppPrivateLabelId,
        M.BrandId,
        Md5 = 
            CONVERT(char(32),
                HASHBYTES(N'MD5',
                    STRING_AGG(CSV.cols, ',')
                        WITHIN GROUP (ORDER BY M.MaterialNumber)), 
                2),
        [Count] = COUNT_BIG(*),
        ModifiedUtc = MAX(M.ModifiedUtc)
    FROM 
        ECom.McProductToVendorProductCodeMap AS M
    CROSS APPLY 
    (
        VALUES
        (
            CONVERT(varchar(max),
                CONCAT_WS(',', M.MaterialNumber, M.VendorProductCode, M.VendorProductStatusId))
        )
    ) AS CSV (cols)
    WHERE
        M.BrandId = @pBrandId
        AND M.ClientAppPrivateLabelId = @pClientAppPrivateLabelId
    GROUP BY
        M.ClientAppPrivateLabelId,
        M.BrandId
    ORDER BY
        M.ClientAppPrivateLabelId
        OFFSET 0 ROWS;