Oracle – Group By and Retrieve Other Fields with Performance Comparison

group byoraclequery-performancesumwindow functions

I'm a novice, trying to find the best way to sum up my data. Oracle 19c.


I have a WORKORDER table that has work orders (ISTASK=0) and tasks (ISTASK=1).

  • Tasks are children of work orders.
  • Work orders and tasks are grouped by WOGROUP.

Costs are split out into four columns:

  • actlabcost (actual labour cost)
  • actmatcost (actual material cost)
  • acttoolcost (actual tool cost)
  • actservcost (actual service cost)

The cost columns are not nullable. So we don't need to worry about converting nulls to zeros to avoid doing math on nulls.


Select 'WO1361' as WONUM, 'WO1361' as WOGROUP, 0 as ISTASK, 0 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167457977' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1362' as WONUM, 'WO1362' as WOGROUP, 0 as ISTASK, 0 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167458280' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1363' as WONUM, 'WO1363' as WOGROUP, 0 as ISTASK, 270.14 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167483430' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1367' as WONUM, 'WO1363' as WOGROUP, 1 as ISTASK, 540.27 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167482806' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1370' as WONUM, 'WO1363' as WOGROUP, 1 as ISTASK, 202.6 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167483431' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1364' as WONUM, 'WO1364' as WOGROUP, 0 as ISTASK, 88.86 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167459454' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1366' as WONUM, 'WO1364' as WOGROUP, 1 as ISTASK, 33.77 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167458946' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1365' as WONUM, 'WO1365' as WOGROUP, 0 as ISTASK, 67.53 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167459331' as OTHER_WO_COLUMNS FROM DUAL
UNION ALL
Select 'WO1368' as WONUM, 'WO1368' as WOGROUP, 0 as ISTASK, 236.37 as ACTLABCOST, 0 as ACTMATCOST, 0 as ACTTOOLCOST, 0 as ACTSERVCOST, '167461627' as OTHER_WO_COLUMNS FROM DUAL

WONUM  WOGROUP     ISTASK ACTLABCOST ACTMATCOST ACTTOOLCOST ACTSERVCOST OTHER_WO_COLUMNS
------ ------- ---------- ---------- ---------- ----------- ----------- ----------------
WO1361 WO1361           0          0          0           0           0 167457977       
WO1362 WO1362           0          0          0           0           0 167458280       
WO1363 WO1363           0     270.14          0           0           0 167483430       
WO1367 WO1363           1     540.27          0           0           0 167482806       
WO1370 WO1363           1      202.6          0           0           0 167483431       
WO1364 WO1364           0      88.86          0           0           0 167459454       
WO1366 WO1364           1      33.77          0           0           0 167458946       
WO1365 WO1365           0      67.53          0           0           0 167459331       
WO1368 WO1368           0     236.37          0           0           0 167461627       


Notice rows 3-5 are in WOGROUP #WO1363. And rows 6-7 are in WOGROUP #WO1364.
                                                                             

Problem:

I want to sum the work order costs by WOGROUP (including task costs), but I don't want to show the task rows in the resultset. In other words, I want to roll-up the task costs to their parent work orders.

For the work order rows, I also want to include other columns that weren’t grouped (ie. OTHER_WO_COLUMNS).


I've found a couple of ways of doing it.

Option #1: (GROUP BY, JOIN, and SUM)

The query does a GROUP BY (SUM) in a subquery to get the total work order costs. Then it selects the work orders (excluding tasks) and joins to the subquery to bring in the total costs.

--The suffix "_ti" stands for "tasks included".
select
    a.wonum,
    a.istask,
    b.actlabcost_ti,
    b.actmatcost_ti,
    b.actservcost_ti,
    b.acttoolcost_ti,
    b.acttotalcost_ti,
    other_wo_columns
from
    cte a
left join
    (
    select 
        wogroup as wonum, 
        sum(actlabcost)  as actlabcost_ti,
        sum(actmatcost)  as actmatcost_ti,
        sum(actservcost) as actservcost_ti,
        sum(acttoolcost) as acttoolcost_ti,
        sum(actlabcost + actmatcost + actservcost + acttoolcost) as acttotalcost_ti
    from 
        cte 
    group by 
        wogroup
    ) b
    on a.wonum = b.wonum
where
    istask = 0

WONUM      ISTASK ACTLABCOST_TI ACTMATCOST_TI ACTSERVCOST_TI ACTTOOLCOST_TI ACTTOTALCOST_TI OTHER_WO_COLUMNS
------ ---------- ------------- ------------- -------------- -------------- --------------- ----------------
WO1361          0             0             0              0              0               0 167457977       
WO1362          0             0             0              0              0               0 167458280       
WO1363          0       1013.01             0              0              0         1013.01 167483430       
WO1364          0        122.63             0              0              0          122.63 167459454       
WO1365          0         67.53             0              0              0           67.53 167459331       
WO1368          0        236.37             0              0              0          236.37 167461627  

I ran the query on a full production table (WORKORDER table has 4,500 rows) and got this explain plan:

Plan hash value: 1879239811
 
---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |  9841 |   586K|   622   (2)| 00:00:01 |
|   1 |  SORT GROUP BY      |           |  9841 |   586K|   622   (2)| 00:00:01 |
|*  2 |   HASH JOIN OUTER   |           |  9841 |   586K|   620   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| WORKORDER |  4609 |   184K|   310   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| WORKORDER |  9841 |   192K|   310   (1)| 00:00:01 |
---------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("A"."WONUM"="WOGROUP"(+))
   3 - filter("A"."ISTASK"=0)

The estimated cost is 622.


Option #2: (SUM analytical function)

I found a way to do it with the SUM analytical function. And I wrapped the SUM analytic function query in an outer query that hides the task rows.

--The suffix "_ti" stands for "tasks included".  
select
    wonum,
    istask,
    actlabcost_ti,
    actmatcost_ti,
    acttoolcost_ti,
    actservcost_ti,
    acttotalcost_ti,
    other_wo_columns
from
    (
    select
        wogroup as wonum,
        istask,
        sum(actlabcost ) over (partition by wogroup) as actlabcost_ti,
        sum(actmatcost ) over (partition by wogroup) as actmatcost_ti,
        sum(acttoolcost) over (partition by wogroup) as acttoolcost_ti,
        sum(actservcost) over (partition by wogroup) as actservcost_ti,
        sum(actlabcost + actmatcost + acttoolcost + actservcost) over (partition by wogroup) as acttotalcost_ti,
        other_wo_columns
    from
        cte
    )
where
    istask=0

WONUM      ISTASK ACTLABCOST_TI ACTMATCOST_TI ACTTOOLCOST_TI ACTSERVCOST_TI ACTTOTALCOST_TI OTHER_WO_COLUMNS
------ ---------- ------------- ------------- -------------- -------------- --------------- ----------------
WO1361          0             0             0              0              0               0 167457977       
WO1362          0             0             0              0              0               0 167458280       
WO1363          0       1013.01             0              0              0         1013.01 167483430       
WO1364          0        122.63             0              0              0          122.63 167459454       
WO1365          0         67.53             0              0              0           67.53 167459331       
WO1368          0        236.37             0              0              0          236.37 167461627    

I ran this query in production too, and got this explain plan:

Plan hash value: 2003557620
 
---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |  9841 |  1749K|   312   (2)| 00:00:01 |
|*  1 |  VIEW               |           |  9841 |  1749K|   312   (2)| 00:00:01 |
|   2 |   WINDOW SORT       |           |  9841 |   394K|   312   (2)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| WORKORDER |  9841 |   394K|   310   (1)| 00:00:01 |
---------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("ISTASK"=0)

The estimated cost is 312, which is roughly half of the first query.

I think it's faster because it only does one full table scan (the other query did two full scans).


Question:

What's the best/fastest way to sum up this data?

As mentioned, I've noticed that option #2 is faster than #1. But to be honest, #2 seems a bit backwards to me. I got some pretty harsh criticism for that approach over on Stack Overflow, so I'm guessing it's not a good way to structure a query.

Best Answer

My clear preference is on option #2. IMHO it is pretty straight. You aggregate per partition, then only keep the rows you are interested in. As the calculation of window functions without framing clause (BETWEEN) is very well optimized it's not that you do a lot and throw it away afterwards.

Nevertheless, some feedback re Option #1:

  • Why did you take a left join (not an inner join?)

  • Instead of the self-join you could fetch the other_wo_columns using selective aggregates:

    MAX(CASE WHEN istask = 0 THEN other_wo_column END)

    If there is only one row with istask = 0 per group, it will just return the value of that row. I used max because it works on all data types — like min — but not like sum.

    If you have many columns the code becomes a little messy, of course.

    It is quite possible that this is the fastest approach, but probably not by a big margin compared to #2.

    More about this technique:

Finally, about COST

Don't use COST as a metric to compare different query approaches. The COST value is used to determine the best possible execution plan among the various possibilities for the very same query in the very same database with the very same data. The calculation of the COST value is highly optimized for this particular purpose — i.e., everything that isn't needed to serve this particular purpose is just skipped for performance reasons. Consequently, using for something different has high chances of giving wrong results.

It's just not meant to be used to compare different queries.