Sql-server – Index lost while joining on union

execution-planperformancequery-performancesql serverunion

I read a lot about indexes/unions but can't get to the bottom of this.

I have here a (oversimplified for this question) query:

SELECT  Items.ItemID AS ID, Items.Comment, ItemOwners_1.OwnerID
FROM   Items INNER JOIN
    (SELECT  EventDate, ItemID, OwnerID
    FROM   (SELECT  EventDate, ItemID, OwnerID
           FROM   (SELECT  Sales.ProcessedOn AS EventDate, SaleProducts.ItemID, Addresses.PersonID AS OwnerID
                 FROM   SaleProducts INNER JOIN
                        Sales ON SaleProducts.SaleID = Sales.SaleID INNER JOIN
                        Addresses ON Sales.ShipAddressID = Addresses.AddressID

--start union
                 UNION ALL
                 SELECT  Sales.ProcessedOn AS EventDate, SaleProducts.ItemID, - 1 AS OwnerID
                 FROM   SaleProducts INNER JOIN
                       Sales ON SaleProducts.SaleID = Sales.SaleID
                 WHERE  (SaleProducts.ItemID IS NOT NULL) AND (Sales.ProcessedOn IS NOT NULL) AND (SaleProducts.IsReturn = 1)
-end union

) AS viwOwnerEvents2) AS t1) AS ItemOwners_1 ON 
   Items.ItemID = ItemOwners_1.ItemID
WHERE  (ItemOwners_1.OwnerID = 930)

This query as is shows disk I/O of 8 MB result (in supratimas).

If I take out the union it uses only 600KB, that's more than a 90% improvement!

The time for running the query is less than 1 second in either case, the difference is for queries based on top of this (and also the full query has a time difference too). Also the estimated rows are off (1 vs 35 etc..)

All columns mentioned here are indexed, the query advisor shows a 13% possible improvement if i add about 25 new indexes and statistics.

Does that mean all indexes are lost the minute I union? Or am I doing something wrong? Is there a way around it?

I am able to reproduce the issue with less code, here's an even simpler query:

--QUERY 1
SELECT * 
FROM  (SELECT *, 
          Row_number() 
            OVER ( 
              partition BY itemid 
              ORDER BY origdate DESC)AS RowNum 
   FROM   items)t1 
WHERE  origdate < Getdate() 
   AND orighand = 930 

--QUERY 2
SELECT *, 
   Row_number() 
     OVER ( 
       partition BY itemid 
       ORDER BY origdate DESC)AS RowNum 
FROM   items 
WHERE  origdate < Getdate() 
   AND orighand = 930

The difference is in the nesting. The first query which has one level of nesting uses 18MB of I/O, whereas the second uses 7KB.

The nesting is a must, since as explained the rownumber actually runs over a much longer union query.

Does the fact that it's using 17MB show that my indexing is wrong? Or it's somehow not bubbling up correctly? (and perhaps also locking more rows than necessary)?

Or am I simply barking up the wrong tree? Should I only be concerned with the execution time of the end query (which is much more complex, around 7 seconds, 50 MB I/O).

Here is the execution plan: http://pastebin.com/1h9KWVr9

Here's the slow query (which is actually based on other queries a few levels deep). In the end, this is the query that I'm trying to optimize.

select * from localitemexts(0,getdate()) where holderid=930 or ownerid=930

and here's the execution plan (and yes it's huge) http://www.writeurl.com/text/ay83rltx0wqbtwjgh8xe/0ws9zeqd4yghn0d628b1

UPDATE 3
OK guys. So I made some homework and additional tests. First off, moving the where into the union made no difference in the disk IO, but I'm leaving it there for now, for testing.

Below (query 4) is a query closer to the actual use case. This has already 1.16MB IO, when i filter on the RowNum and HolderID (both coming from the union). But when I filter on the ProductID which is a standard column in the Items table, it uses only 250 KB. (the actual difference in result rows are 6 vs 14, not enough to explain the difference in IO)

So basically my question is like this:

As i extend the query with more joins and unions, the query becomes slower and slower, the question is where do I start debugging? should I grow the query till i see execution times of 3 seconds plus? and then start debugging? or is high IO supposed to raise a red flag right away that something is wrong with my indexing or query set up and I need to start fixing it right there? even though it runs i 0:00 seconds?

--QUERY 4
select * from items inner join (SELECT *, 
      Row_number() 
        OVER ( 
          partition BY itemid 
          ORDER BY origdate DESC)AS RowNum 
FROM   (select itemid,origdate,orighand as HolderID from items where origdate<getdate()
UNION ALL
SELECT       itemid,PurchasedOn,VendorID 
FROM            PurchaseItems inner join Purchases on PurchaseItems.PurchaseID=Purchases.PurchaseID 
where delivered=1 and IsReturn=1 and PurchasedOn<getdate())viwHolderEvents )       as  Holders on items.itemid=holders.itemid
WHERE productid=100--  rownum=1 and  HolderID  = 930

As a test for outer apply I tried the following, the IO was worse, 18MB.

--QUERY 5 (outer apply)
select * from items outer apply(SELECT top 1* 
FROM   ( 
select itemid,origdate,orighand as HolderID from items where    origdate<getdate()
UNION ALL
SELECT       itemid,PurchasedOn,VendorID 
FROM            PurchaseItems inner join Purchases on PurchaseItems.PurchaseID=Purchases.PurchaseID 
where delivered=1 and IsReturn=1 and PurchasedOn<getdate())viwHolderEvents  where viwHolderEvents.itemid=items.ItemID   order  by origdate desc ) t1 where holderid=930

Best Answer

Try pushing the predicates as close to the base tables as possible:

SELECT * 
FROM  (SELECT items.*, Row_number() OVER (partition BY itemid 
                                          ORDER BY origdate DESC
                                         )AS RowNum 
      FROM   items 
      WHERE  origdate < Getdate() 
        AND orighand = 930
      ) t1

Do the same for all legs of the union. If the predicates contain references to an an outer table as in:

SELECT ...
FROM T1
JOIN (
    SELECT ...
    FROM T2
    WHERE T2.x = 'WHATEVER'
) T3
    ON T3.y = T1.Y

You can use a LATERAL JOIN (CROSS APPLY in SQL-server) to be able to push this predicate inside the derived table:

SELECT ...
FROM T1
CROSS JOIN LATERAL (
    SELECT ...
    FROM T2
    WHERE T2.x = 'WHATEVER'
      AND T2.y = T1.Y
) T3

Related Solutions

SQL Server Execution Plan – SHOWPLAN vs ‘Include Execution Plan’ Warnings for the Same Query

This:

SET SHOWPLAN_XML ON;
GO
SELECT * FROM sys.objects;
GO

Is equivalent to pressing Display Estimated Execution Plan on the toolbar (or hitting Ctrl + L). You'll notice that no rows are returned from the query, like there is when you use Include Actual Execution Plan (Ctrl + M).

The spill warning is only a runtime warning. There is no way that SQL Server can know, when displaying the estimated plan, that a spill will happen at runtime. This is because a spill is caused by factors that might only be present during certain invocations of the query (for example, when there is memory pressure). The estimated plan knows roughly how much memory it's going to ask for, but it can't know until execution that it isn't going to get it.

As an aside, may I recommend* our free tool, SQL Sentry Plan Explorer? I think it provides much more obvious information than Management Studio. I recently wrote a lengthy blog post that can act as a tutorial, and Jonathan Kehayias has a great PluralSight course on it as well.

_{* Disclaimer: I work for SQL Sentry.}

Mysql – Slower running Query with better Filter

There are 3 steps in SELECTing from a PARTITIONed table.

"Partition pruning" -- deciding which partition(s) might have the data
Running the SELECT on each of those partitions
Combining the results.

Your first query could do no pruning since nothing in the WHERE clause limited which partitions to look at.

A quoted number will convert to an INT or BIGINT without trouble. (The other direction causes trouble: WHERE varchar_col = 123.)

The one index you have -- (itemid,clock) -- works even better without partitioning. Do you have other queries for which you think partitioning is necessary?

You have no explicit PRIMARY KEY on that InnoDB table. That generally hurts performance.

As for your 0sec vs 40sec, please provide EXPLAIN FORMAT=JSON SELECT ... for each of the queries. There may be extra clues.

Best Answer

Related Solutions

SQL Server Execution Plan – SHOWPLAN vs ‘Include Execution Plan’ Warnings for the Same Query

Mysql – Slower running Query with better Filter

Related Question