How to Get CROSS APPLY to Operate Row by Row on a View in SQL Server

cross-applyexecution-plansql serverview

We have a view that is optimized for single item queries (200ms no parallelism):

select * 
    from OptimizedForSingleObjectIdView e2i
   where ObjectId = 3374700

It also works on small sets of static ids (~5).

select * 
    from OptimizedForSingleObjectIdView e2i
   where ObjectId in (3374700, 3374710, 3374720, 3374730, 3374740);

However if the objects come from an external source, then it generates a slow plan. The execution plan shows that the execution branch for the view part is ignoring the predicate on ObjectId whereas in the original case it uses them to perform index seeks.

select v.*
  from 
     (
       select top 1 ObjectId from Objects
        where ObjectId % 10 = 0
        order by ObjectId
     ) o  
  join OptimizedForSingleObjectIdView v -- (also tried inner loop join)
    on v.ObjectId = o.ObjectId;

We don't wish to invest in "dual" optimizing the view for non-singular cases. Rather, the solution we "seek" is to repetitively call the view once per object without resorting to an SP.

Most of the time the following solution calls the view row by row. However not this time and not even for just 1 object:

select v.*
  from
     (
       select top 1 ObjectId 
         from Objects 
        where ObjectId % 10 = 0 -- non-trivial predicate
        order by ObjectId
     ) o
   cross apply
    (
      select top 2000000000 *
        from OptimizedForSingleObjectIdView v_
       where ObjectId = o.ObjectId 
       order by v_.SomeField
    ) v;

At one time I thought there was a claim that cross apply was guaranteed for row by row execution when it calls a UDF but this also failed:

create function FunctionCallingView(@pObjectId bigint)
returns table
as 
return select *
  from OptimizedForSingleObjectIdView 
 where ObjectId = @pObjectId;             

select v.*
  from
     (
       select top 1 ObjectId 
         from Objects 
        where ObjectId % 10 = 0
        order by ObjectId
     ) o
 cross apply FunctionCallingView(o.ObjectId) v

Adding option(force order) did not help–however there are two hash hints in the view already. Temporarily removing them did not help and slows down the single case.

Here is a snippet of the estimated plan for the function based slow case. The estimation of 1 row is correct. Far off to the right (not shown) is where there is a seek predicate that does not include that top 1 result. This seems similar to other cases we have where singular probed values from table seeks are not used as seek predicates elsewhere.

Best Answer

It is not possible to fully guarantee evaluating the view per row of the outer query, without using something that introduces a new T-SQL execution scope, for example a non-inline (multi-statement, BEGIN...END) table-valued function. This is pretty much the advice given in response to your previous question How to use merge hints to isolate complex queries in SQL Server.

At one time I thought there was a claim that cross apply was guaranteed for row by row execution when it calls a UDF

This does not apply to inline table-valued functions, since the definition is expanded into the calling query before optimization begins.

That said, there are some things you can do to strongly encourage the desired outcome.

The execution plan shows that the execution branch for the view part is ignoring the predicate on ObjectId whereas in the original case it uses them to perform index seeks.

You are expecting the ObjectId values to be evaluated "inside the view" using index seeks for each driving row. This is the correlated nested loops join (apply) style of execution. Note that using the APPLY T-SQL language element does not guarantee that physical execution will use the apply style.

It very much sounds as if SQL Server is choosing to execute with the ObjectId values tested at the Nested Loops Join operator instead. This is a non-correlated, or naive, nested loops join execution pattern.

Most likely, this is caused by the join hints you are using inside the view. Join hints are generally to be avoided, because they greatly restrict optimizer freedom, and not just for the physical type of join. In particular, join hints also force the order of joins for the whole query (just as if you had use a FORCE ORDER hint) and prevent several optimizations related to aggregation placement and strategy, and much else besides.

If you really must have join hints inside a view (something I would strongly suggest you avoid in general), you might find the most reliable way to get the plan shape you desire is:

Create an inline (RETURNS TABLE) function using the view definition (not referencing the view).
Provide @ObjectId as a parameter to the function.
Placing a predicate using the parameter within the function body in a way that makes an index seek more likely.
A FORCESEEK table hint inside the function may be used if truly every use should result in a seek.
Call the new inline function using APPLY.

I generally dislike using specific syntaxes and hints in an attempt to force a certain physical plan shape. You may have more general success by parameterizing queries and guaranteeing the plan shape using a plan guide.

Related Solutions

Sql-server – Huge cross product view query speed

It'll sound a little stupid, but creating a separate composite non clustered index on the attributes table on product id and each column, resulting in 150+ indexes on the attributes table, will get you the ultimate speed for that query because it will support the select distinct column value by product from the attributes table. Your insert performance will be super painful.

Using CROSS APPLY with GROUP BY and TOP 1 – Handling Duplicate Data

If you're looking for MAX(Modified) field over ProductNumber, you can use ROW_NUMBER() function, and then get all rows where row number = 1.

WITH selMax AS
(
    SELECT ID, ProductNumber, DateCreated, Modified,
           ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Modified DESC, 
                                                                  DateCreated DESC) RNum
    FROM   #ProductStatus
)
SELECT ID, ProductNumber, DateCreated, Modified
FROM   selMax
WHERE  RNum = 1
GO

ID | ProductNumber | DateCreated         | Modified           
-: | ------------: | :------------------ | :------------------
11 |      20070098 | 20/03/2009 14:09:52 | 10/10/2014 20:22:59
 3 |      20070099 | 18/12/2008 09:26:58 | 10/12/2014 20:22:59

Filtering by ProductNumber:

WITH selMax AS
(
    SELECT ID, ProductNumber, DateCreated, Modified,
           ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Modified DESC, 
                                                                  DateCreated DESC) RNum
    FROM   #ProductStatus
    WHERE  ProductNumber = 20070098
)
SELECT ID, ProductNumber, DateCreated, Modified
FROM   selMax
WHERE  RNum = 1
GO

ID | ProductNumber | DateCreated         | Modified           
-: | ------------: | :------------------ | :------------------
11 |      20070098 | 20/03/2009 14:09:52 | 10/10/2014 20:22:59

dbfiddle here

Best Answer

Related Solutions

Sql-server – Huge cross product view query speed

Using CROSS APPLY with GROUP BY and TOP 1 – Handling Duplicate Data

Related Question