I read a lot about indexes/unions but can't get to the bottom of this.
I have here a (oversimplified for this question) query:
SELECT Items.ItemID AS ID, Items.Comment, ItemOwners_1.OwnerID
FROM Items INNER JOIN
(SELECT EventDate, ItemID, OwnerID
FROM (SELECT EventDate, ItemID, OwnerID
FROM (SELECT Sales.ProcessedOn AS EventDate, SaleProducts.ItemID, Addresses.PersonID AS OwnerID
FROM SaleProducts INNER JOIN
Sales ON SaleProducts.SaleID = Sales.SaleID INNER JOIN
Addresses ON Sales.ShipAddressID = Addresses.AddressID
--start union
UNION ALL
SELECT Sales.ProcessedOn AS EventDate, SaleProducts.ItemID, - 1 AS OwnerID
FROM SaleProducts INNER JOIN
Sales ON SaleProducts.SaleID = Sales.SaleID
WHERE (SaleProducts.ItemID IS NOT NULL) AND (Sales.ProcessedOn IS NOT NULL) AND (SaleProducts.IsReturn = 1)
-end union
) AS viwOwnerEvents2) AS t1) AS ItemOwners_1 ON
Items.ItemID = ItemOwners_1.ItemID
WHERE (ItemOwners_1.OwnerID = 930)
This query as is shows disk I/O of 8 MB result (in supratimas).
If I take out the union it uses only 600KB, that's more than a 90% improvement!
The time for running the query is less than 1 second in either case, the difference is for queries based on top of this (and also the full query has a time difference too). Also the estimated rows are off (1 vs 35 etc..)
All columns mentioned here are indexed, the query advisor shows a 13% possible improvement if i add about 25 new indexes and statistics.
Does that mean all indexes are lost the minute I union? Or am I doing something wrong? Is there a way around it?
I am able to reproduce the issue with less code, here's an even simpler query:
--QUERY 1
SELECT *
FROM (SELECT *,
Row_number()
OVER (
partition BY itemid
ORDER BY origdate DESC)AS RowNum
FROM items)t1
WHERE origdate < Getdate()
AND orighand = 930
--QUERY 2
SELECT *,
Row_number()
OVER (
partition BY itemid
ORDER BY origdate DESC)AS RowNum
FROM items
WHERE origdate < Getdate()
AND orighand = 930
The difference is in the nesting. The first query which has one level of nesting uses 18MB of I/O, whereas the second uses 7KB.
The nesting is a must, since as explained the rownumber
actually runs over a much longer union query.
Does the fact that it's using 17MB show that my indexing is wrong? Or it's somehow not bubbling up correctly? (and perhaps also locking more rows than necessary)?
Or am I simply barking up the wrong tree? Should I only be concerned with the execution time of the end query (which is much more complex, around 7 seconds, 50 MB I/O).
Here is the execution plan: http://pastebin.com/1h9KWVr9
Here's the slow query (which is actually based on other queries a few levels deep). In the end, this is the query that I'm trying to optimize.
select * from localitemexts(0,getdate()) where holderid=930 or ownerid=930
and here's the execution plan (and yes it's huge) http://www.writeurl.com/text/ay83rltx0wqbtwjgh8xe/0ws9zeqd4yghn0d628b1
UPDATE 3
OK guys. So I made some homework and additional tests. First off, moving the where into the union made no difference in the disk IO, but I'm leaving it there for now, for testing.
Below (query 4) is a query closer to the actual use case. This has already 1.16MB IO, when i filter on the RowNum and HolderID (both coming from the union). But when I filter on the ProductID which is a standard column in the Items table, it uses only 250 KB. (the actual difference in result rows are 6 vs 14, not enough to explain the difference in IO)
So basically my question is like this:
As i extend the query with more joins and unions, the query becomes slower and slower, the question is where do I start debugging? should I grow the query till i see execution times of 3 seconds plus? and then start debugging? or is high IO supposed to raise a red flag right away that something is wrong with my indexing or query set up and I need to start fixing it right there? even though it runs i 0:00 seconds?
--QUERY 4
select * from items inner join (SELECT *,
Row_number()
OVER (
partition BY itemid
ORDER BY origdate DESC)AS RowNum
FROM (select itemid,origdate,orighand as HolderID from items where origdate<getdate()
UNION ALL
SELECT itemid,PurchasedOn,VendorID
FROM PurchaseItems inner join Purchases on PurchaseItems.PurchaseID=Purchases.PurchaseID
where delivered=1 and IsReturn=1 and PurchasedOn<getdate())viwHolderEvents ) as Holders on items.itemid=holders.itemid
WHERE productid=100-- rownum=1 and HolderID = 930
As a test for outer apply I tried the following, the IO was worse, 18MB.
--QUERY 5 (outer apply)
select * from items outer apply(SELECT top 1*
FROM (
select itemid,origdate,orighand as HolderID from items where origdate<getdate()
UNION ALL
SELECT itemid,PurchasedOn,VendorID
FROM PurchaseItems inner join Purchases on PurchaseItems.PurchaseID=Purchases.PurchaseID
where delivered=1 and IsReturn=1 and PurchasedOn<getdate())viwHolderEvents where viwHolderEvents.itemid=items.ItemID order by origdate desc ) t1 where holderid=930
Best Answer
Try pushing the predicates as close to the base tables as possible:
Do the same for all legs of the union. If the predicates contain references to an an outer table as in:
You can use a
LATERAL JOIN
(CROSS APPLY in SQL-server) to be able to push this predicate inside the derived table: