The optimizer always tries to reduce the amount of data as quickly as possible. If not, your statistics might not be good.
Your plan 1 shows that less rows are processed this is good. The optimizer was able to reduce the amount of data more quickly. The numbers might not be exactly true but it gives you an idea based on the optimizer statistics.
Plan 1:
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 79 | 7 (15)| 00:00:01 | | |
| 1 | HASH GROUP BY | | 1 | 79 | 7 (15)| 00:00:01 | | |
| 2 | NESTED LOOPS | | | | | | | |
| 3 | NESTED LOOPS | | 1 | 79 | 6 (0)| 00:00:01 | | |
| 4 | NESTED LOOPS | | 1 | 63 | 5 (0)| 00:00:01 | | |
| 5 | TABLE ACCESS BY GLOBAL INDEX ROWID| t1 | 1 | 22 | 3 (0)| 00:00:01 | ROWID | ROWID |
|* 6 | INDEX RANGE SCAN | t1pk | 1 | | 2 (0)| 00:00:01 | | |
| 7 | PARTITION RANGE ITERATOR | | 1 | 41 | 2 (0)| 00:00:01 | KEY | KEY |
|* 8 | TABLE ACCESS BY LOCAL INDEX ROWID| t3 | 1 | 41 | 2 (0)| 00:00:01 | KEY | KEY |
|* 9 | INDEX RANGE SCAN | t3fk | 1 | | 1 (0)| 00:00:01 | KEY | KEY |
|* 10 | INDEX UNIQUE SCAN | t2pk | 1 | | 0 (0)| 00:00:01 | | |
|* 11 | TABLE ACCESS BY GLOBAL INDEX ROWID | t2 | 1 | 16 | 1 (0)| 00:00:01 | ROWID | ROWID |
---------------------------------------------------------------------------------------------------------------
It shows that Oracle did a HASH GROUP BY while plan 2 did a SORT GROUP BY.
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 311 | 7 (15)| 00:00:01 | | |
| 1 | NESTED LOOPS | | | | | | | |
| 2 | NESTED LOOPS | | 1 | 311 | 7 (15)| 00:00:01 | | |
| 3 | NESTED LOOPS | | 1 | 253 | 6 (17)| 00:00:01 | | |
| 4 | TABLE ACCESS BY GLOBAL INDEX ROWID | t1 | 1 | 164 | 3 (0)| 00:00:01 | ROWID | ROWID |
|* 5 | INDEX RANGE SCAN | t1pk | 1 | | 2 (0)| 00:00:01 | | |
| 6 | VIEW PUSHED PREDICATE | | 1 | 89 | 3 (34)| 00:00:01 | | |
| 7 | SORT GROUP BY | | 1 | 41 | 3 (34)| 00:00:01 | | |
|* 8 | FILTER | | | | | | | |
| 9 | PARTITION RANGE SINGLE | | 1 | 41 | 2 (0)| 00:00:01 | KEY | KEY |
|* 10 | TABLE ACCESS BY LOCAL INDEX ROWID| t3 | 1 | 41 | 2 (0)| 00:00:01 | KEY | KEY |
|* 11 | INDEX RANGE SCAN | t3fk | 1 | | 1 (0)| 00:00:01 | KEY | KEY |
|* 12 | INDEX UNIQUE SCAN | t2pk | 1 | | 0 (0)| 00:00:01 | | |
|* 13 | TABLE ACCESS BY GLOBAL INDEX ROWID | t2 | 1 | 58 | 1 (0)| 00:00:01 | ROWID | ROWID |
-----------------------------------------------------------------------------------------------------------------
Conclusion: In most cases I would prefer a HASH GROUP BY. So in plan 1 Oracle does a better the GROUP BY much later with much less data. 2 reasons why plan 1 is better.
I assume that this is not the only SQL statement which will be using this join constellation. You will have to decide for each SQL which way is the best.
If you have a SQL similar to the following, the with clause can be much better:
with t3g as ( -- as in 't3 grouped'
select id2, count(1) t3count
from t3
where id = :1 -- new line !!!!!!!!
group by id2
)
select t1.id t1id, t2id, t3count
from t1 join t2 on (t1.p = t2.p and t1.id = t2.id1)
join t3g on (t2.p = t3g.p and t2.id = id2)
The following works but there's probably some more clever version. Explanation of the query's logic:
We first find how many "resets" have been done up to and including the current row by counting the not nulls of the reset_val
column, so we can separate the rows into subgroups.
We also use another window function LAST_VALUE()
with IGNORE NULLS
, so we can find the last reset_value
.
Note that both these window functions COUNT()
and LAST_VALUE()
have an ORDER BY
, therefore the default window ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. Omitted in the query, to make the code more clear.
Assuming that the val
is not nullable, the other window function could also be shortened, from:
COALESCE(SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0)
(avoiding the COALESCE()
as well) to:
SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val)
- val
Finally, in the second cte, we use the above found subgroups (using PARTITION BY group_id, reset_count
) to find the cumulative sums.
WITH x AS
( SELECT temp.*,
COUNT(reset_val) OVER
(PARTITION BY group_id
ORDER BY order_val)
AS reset_count,
COALESCE(LAST_VALUE(reset_val IGNORE NULLS) OVER
(PARTITION BY group_id
ORDER BY order_val), 0)
AS reset_value
FROM temp
) ,
y AS
( SELECT x.*,
COALESCE(SUM(val) OVER
(PARTITION BY group_id, reset_count
ORDER BY order_val
ROWS BETWEEN UNBOUNDED PRECEDING
AND 1 PRECEDING), 0)
+ reset_value AS cumsum
FROM x
)
SELECT *
FROM y ;
Test at SQLfiddle.
Another variation, based on @Chris' recursive answer. (slightly improved, works with non-consecutive order_val
, avoids the final GROUP BY
).
Also works in case the first row of a group has a reset_val
:
WITH row_nums AS
( SELECT id, group_id, order_val, reset_val, val,
ROW_NUMBER() OVER (PARTITION BY group_id
ORDER BY order_val)
AS rn
FROM temp
) ,
updated_temp (id, group_id, order_val, reset_val, val, rn, cumsum) AS
( SELECT id, group_id, order_val, reset_val, val, rn,
COALESCE(reset_val, 0)
FROM row_nums
WHERE rn = 1
UNION ALL
SELECT curr.id, curr.group_id, curr.order_val, curr.reset_val, curr.val, curr.rn,
COALESCE(curr.reset_val, prev.val + prev.cumsum)
FROM row_nums curr
JOIN updated_temp prev
ON curr.rn-1 = prev.rn
AND curr.group_id = prev.group_id
)
SELECT id, group_id, order_val, reset_val, val, cumsum
FROM updated_temp
ORDER BY group_id, order_val ;
Test at SQLfiddle-2.
One more variation, using the older (proprietary) CONNECT BY
syntax for recursive queries. More compact but I find it more difficult to write and read than the CTE version:
WITH row_nums AS
( SELECT id, group_id, order_val, reset_val, val,
ROW_NUMBER() OVER (PARTITION BY group_id
ORDER BY order_val)
AS rn,
COALESCE(reset_val, 0) AS cumsum
FROM temp
)
SELECT id, group_id, order_val, reset_val, val, rn,
COALESCE(reset_val, PRIOR val + PRIOR cumsum, 0) AS cumsum
FROM row_nums
START WITH rn = 1 OR reset_val IS NOT NULL
CONNECT BY rn-1 = PRIOR rn
AND group_id = PRIOR group_id
AND reset_val IS NULL
ORDER BY group_id, order_val ;
Tested at SQLfiddle-3.
Best Answer
You should benchmark it using autotrace (http://betteratoracle.com/posts/10-using-autotrace) for example.
If you expect millions of rows to be returned, then I do not think indexing will help at all. Simple full table scan and hash joining is the most efficient way of executing this kind of query (https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9422487749968).
Putting the least selective column first and using key compression can decrease the index size and reduce IO and CPU consumption (http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8806017524595).
The following index should be quite sufficient if you fetch only few rows: