Sql-server – Index lost while joining on union

execution-planperformancequery-performancesql serverunion

I read a lot about indexes/unions but can't get to the bottom of this.

I have here a (oversimplified for this question) query:

SELECT  Items.ItemID AS ID, Items.Comment, ItemOwners_1.OwnerID
FROM   Items INNER JOIN
    (SELECT  EventDate, ItemID, OwnerID
    FROM   (SELECT  EventDate, ItemID, OwnerID
           FROM   (SELECT  Sales.ProcessedOn AS EventDate, SaleProducts.ItemID, Addresses.PersonID AS OwnerID
                 FROM   SaleProducts INNER JOIN
                        Sales ON SaleProducts.SaleID = Sales.SaleID INNER JOIN
                        Addresses ON Sales.ShipAddressID = Addresses.AddressID

--start union
                 UNION ALL
                 SELECT  Sales.ProcessedOn AS EventDate, SaleProducts.ItemID, - 1 AS OwnerID
                 FROM   SaleProducts INNER JOIN
                       Sales ON SaleProducts.SaleID = Sales.SaleID
                 WHERE  (SaleProducts.ItemID IS NOT NULL) AND (Sales.ProcessedOn IS NOT NULL) AND (SaleProducts.IsReturn = 1)
-end union

) AS viwOwnerEvents2) AS t1) AS ItemOwners_1 ON 
   Items.ItemID = ItemOwners_1.ItemID
WHERE  (ItemOwners_1.OwnerID = 930)

This query as is shows disk I/O of 8 MB result (in supratimas).

If I take out the union it uses only 600KB, that's more than a 90% improvement!

The time for running the query is less than 1 second in either case, the difference is for queries based on top of this (and also the full query has a time difference too). Also the estimated rows are off (1 vs 35 etc..)

All columns mentioned here are indexed, the query advisor shows a 13% possible improvement if i add about 25 new indexes and statistics.

Does that mean all indexes are lost the minute I union? Or am I doing something wrong? Is there a way around it?

I am able to reproduce the issue with less code, here's an even simpler query:

--QUERY 1
SELECT * 
FROM  (SELECT *, 
          Row_number() 
            OVER ( 
              partition BY itemid 
              ORDER BY origdate DESC)AS RowNum 
   FROM   items)t1 
WHERE  origdate < Getdate() 
   AND orighand = 930 

--QUERY 2
SELECT *, 
   Row_number() 
     OVER ( 
       partition BY itemid 
       ORDER BY origdate DESC)AS RowNum 
FROM   items 
WHERE  origdate < Getdate() 
   AND orighand = 930 

The difference is in the nesting. The first query which has one level of nesting uses 18MB of I/O, whereas the second uses 7KB.

The nesting is a must, since as explained the rownumber actually runs over a much longer union query.

Does the fact that it's using 17MB show that my indexing is wrong? Or it's somehow not bubbling up correctly? (and perhaps also locking more rows than necessary)?

Or am I simply barking up the wrong tree? Should I only be concerned with the execution time of the end query (which is much more complex, around 7 seconds, 50 MB I/O).

Here is the execution plan: http://pastebin.com/1h9KWVr9

Here's the slow query (which is actually based on other queries a few levels deep). In the end, this is the query that I'm trying to optimize.

select * from localitemexts(0,getdate()) where holderid=930 or ownerid=930

and here's the execution plan (and yes it's huge) http://www.writeurl.com/text/ay83rltx0wqbtwjgh8xe/0ws9zeqd4yghn0d628b1

UPDATE 3
OK guys. So I made some homework and additional tests. First off, moving the where into the union made no difference in the disk IO, but I'm leaving it there for now, for testing.

Below (query 4) is a query closer to the actual use case. This has already 1.16MB IO, when i filter on the RowNum and HolderID (both coming from the union). But when I filter on the ProductID which is a standard column in the Items table, it uses only 250 KB. (the actual difference in result rows are 6 vs 14, not enough to explain the difference in IO)

So basically my question is like this:

As i extend the query with more joins and unions, the query becomes slower and slower, the question is where do I start debugging? should I grow the query till i see execution times of 3 seconds plus? and then start debugging? or is high IO supposed to raise a red flag right away that something is wrong with my indexing or query set up and I need to start fixing it right there? even though it runs i 0:00 seconds?

--QUERY 4
select * from items inner join (SELECT *, 
      Row_number() 
        OVER ( 
          partition BY itemid 
          ORDER BY origdate DESC)AS RowNum 
FROM   (select itemid,origdate,orighand as HolderID from items where origdate<getdate()
UNION ALL
SELECT       itemid,PurchasedOn,VendorID 
FROM            PurchaseItems inner join Purchases on PurchaseItems.PurchaseID=Purchases.PurchaseID 
where delivered=1 and IsReturn=1 and PurchasedOn<getdate())viwHolderEvents )       as  Holders on items.itemid=holders.itemid
WHERE productid=100--  rownum=1 and  HolderID  = 930 

As a test for outer apply I tried the following, the IO was worse, 18MB.

--QUERY 5 (outer apply)
select * from items outer apply(SELECT top 1* 
FROM   ( 
select itemid,origdate,orighand as HolderID from items where    origdate<getdate()
UNION ALL
SELECT       itemid,PurchasedOn,VendorID 
FROM            PurchaseItems inner join Purchases on PurchaseItems.PurchaseID=Purchases.PurchaseID 
where delivered=1 and IsReturn=1 and PurchasedOn<getdate())viwHolderEvents  where viwHolderEvents.itemid=items.ItemID   order  by origdate desc ) t1 where holderid=930

Best Answer

Try pushing the predicates as close to the base tables as possible:

SELECT * 
FROM  (SELECT items.*, Row_number() OVER (partition BY itemid 
                                          ORDER BY origdate DESC
                                         )AS RowNum 
      FROM   items 
      WHERE  origdate < Getdate() 
        AND orighand = 930
      ) t1 

Do the same for all legs of the union. If the predicates contain references to an an outer table as in:

SELECT ...
FROM T1
JOIN (
    SELECT ...
    FROM T2
    WHERE T2.x = 'WHATEVER'
) T3
    ON T3.y = T1.Y

You can use a LATERAL JOIN (CROSS APPLY in SQL-server) to be able to push this predicate inside the derived table:

SELECT ...
FROM T1
CROSS JOIN LATERAL (
    SELECT ...
    FROM T2
    WHERE T2.x = 'WHATEVER'
      AND T2.y = T1.Y
) T3