With partitioned table, How to use hints to group each partition separately

database-tuningoraclepartitioning

Suppose I have the following table
data(partitioned_key_index, some_dummy_measure)

Assume that partitions are of an equal large size. with Oracle 11g.

The end result should be like this

select partitioned_key_index, sum(some_dummy_measure) 
  from data group by partitioned_key_index 

Each partition will be grouped independently, the optimizer should be
clever enough to come up with a plan in which each partition will be aggregated then a simple 'union all' to get the desired output.

What I want to do is something close to this

 select 1 as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = 1  
 Union All
 select 2 as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = 2
 Union All
  .
  .
  .
 select i as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = i

My intuition with the above method is to to serialize the hash group operation
thus each partition will be moved from the disk to the buffer cache with the hope of not spilling into disk for the group by operation.

Any ideas how to tune this kind of queries?

Best Answer

The database does this by default, no hints/tuning needed.

create table data(partitioned_key_index number, some_dummy_measure number)
  partition by list (partitioned_key_index)
(
     partition p1 values (1),
     partition p2 values (2),
     partition p3 values (3),
     partition p4 values (4),
     partition p5 values (5)
);

insert into data with g as (select * from dual connect by level <= 1000) 
select mod(rownum, 5) + 1, rownum from g,g where rownum <= 500000;
commit;

Then run the query:

alter session set statistics_level=all;

select partitioned_key_index, sum(some_dummy_measure) from data 
  group by partitioned_key_index;


PARTITIONED_KEY_INDEX SUM(SOME_DUMMY_MEASURE)
--------------------- -----------------------
                    1             25000250000
                    2             24999850000
                    3             24999950000
                    4             25000050000
                    5             25000150000

Check what happened:

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
SQL_ID  537qapda9hdy4, child number 0
-------------------------------------
select partitioned_key_index, sum(some_dummy_measure) from data group
by partitioned_key_index

Plan hash value: 3405952922

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |00:00:00.14 |    1065 |       |       |          |
|   1 |  PARTITION LIST ALL |      |      1 |    585K|      5 |00:00:00.14 |    1065 |       |       |          |
|   2 |   HASH GROUP BY     |      |      5 |    585K|      5 |00:00:00.14 |    1065 |    34M|  6473K|  738K (0)|
|   3 |    TABLE ACCESS FULL| DATA |      5 |    585K|    500K|00:00:00.06 |    1065 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Operations below PARTITION LIST ALL were done for all partitions. We have 5 partitions, including the GROUP BY. As you can see from the Starts column, HASH GROUP BY was really performed 5 times.

This is how it looks like, when GROUP BY is performed for the whole table at once:

select (partitioned_key_index + 0), sum(some_dummy_measure) from data
    group by (partitioned_key_index + 0);

(PARTITIONED_KEY_INDEX+0) SUM(SOME_DUMMY_MEASURE)
------------------------- -----------------------
                        1             25000250000
                        2             24999850000
                        5             25000150000
                        4             25000050000
                        3             24999950000

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  2xxf42mtp53sc, child number 0
-------------------------------------
select (partitioned_key_index + 0), sum(some_dummy_measure) from data
group by (partitioned_key_index + 0)

Plan hash value: 3651737839

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |00:00:00.24 |    1065 |       |       |          |
|   1 |  HASH GROUP BY      |      |      1 |    585K|      5 |00:00:00.24 |    1065 |    34M|  6473K| 4574K (0)|
|   2 |   PARTITION LIST ALL|      |      1 |    585K|    500K|00:00:00.14 |    1065 |       |       |          |
|   3 |    TABLE ACCESS FULL| DATA |      5 |    585K|    500K|00:00:00.06 |    1065 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


20 rows selected.

HASH GROUP BY was performend only once, for the whole amount of data, after it was collected from all partitions.