With a partitioned table, how to use hints to group each partition separately

database-tuning oracle partitioning

Suppose I have the following table:
data(partitioned_key_index, some_dummy_measure)

Assume that the partitions are all of an equal, large size, and that the database is Oracle 11g.

The end result should be the same as the output of this query:

select partitioned_key_index, sum(some_dummy_measure) 
  from data group by partitioned_key_index

Each partition should be grouped independently: the optimizer should be clever enough to come up with a plan in which each partition is aggregated separately, followed by a simple 'union all' to produce the desired output.

What I want is something close to this:

 select 1 as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = 1  
 Union All
 select 2 as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = 2
 Union All
  .
  .
  .
 select i as partitioned_key_index, sum(some_dummy_measure) 
  from data where partitioned_key_index = i

My intuition with the above method is to serialize the hash group by operation, so that each partition is read from disk into the buffer cache and aggregated on its own, in the hope that the group by operation does not spill to disk.

Any ideas on how to tune this kind of query?

Best Answer

The database does this by default for a query like yours, which groups by the partitioning key; no hints or tuning are needed.

create table data(partitioned_key_index number, some_dummy_measure number)
  partition by list (partitioned_key_index)
(
     partition p1 values (1),
     partition p2 values (2),
     partition p3 values (3),
     partition p4 values (4),
     partition p5 values (5)
);

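-- Generate 500,000 rows spread evenly over the 5 partitions:
-- g yields 1000 rows, and g cross-joined with itself yields up to 1,000,000.
insert into data with g as (select * from dual connect by level <= 1000)
select mod(rownum, 5) + 1, rownum from g,g where rownum <= 500000;
commit;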

Then run the query:

alter session set statistics_level=all;

select partitioned_key_index, sum(some_dummy_measure) from data 
  group by partitioned_key_index;


PARTITIONED_KEY_INDEX SUM(SOME_DUMMY_MEASURE)
--------------------- -----------------------
                    1             25000250000
                    2             24999850000
                    3             24999950000
                    4             25000050000
                    5             25000150000

Check what happened:

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
SQL_ID  537qapda9hdy4, child number 0
-------------------------------------
select partitioned_key_index, sum(some_dummy_measure) from data group
by partitioned_key_index

Plan hash value: 3405952922

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |00:00:00.14 |    1065 |       |       |          |
|   1 |  PARTITION LIST ALL |      |      1 |    585K|      5 |00:00:00.14 |    1065 |       |       |          |
|   2 |   HASH GROUP BY     |      |      5 |    585K|      5 |00:00:00.14 |    1065 |    34M|  6473K|  738K (0)|
|   3 |    TABLE ACCESS FULL| DATA |      5 |    585K|    500K|00:00:00.06 |    1065 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Operations below PARTITION LIST ALL, including the GROUP BY, were performed once per partition. We have 5 partitions, and as you can see from the Starts column, HASH GROUP BY was indeed performed 5 times.
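What the question tried to spell out by hand can also be written per partition with Oracle's partition-extended table name syntax. This is a minimal sketch, assuming the data table and the partition names (p1, p2) created above. It is only meant as a way to cross-check one partition's total against the full query, not as a replacement for it:

```sql
-- Aggregate a single partition by naming it explicitly.
-- p1 is the partition holding partitioned_key_index = 1 in the test table above.
select partitioned_key_index, sum(some_dummy_measure)
  from data partition (p1)
 group by partitioned_key_index;

-- Same idea for two partitions, stitched together by hand.
select partitioned_key_index, sum(some_dummy_measure)
  from data partition (p1)
 group by partitioned_key_index
union all
select partitioned_key_index, sum(some_dummy_measure)
  from data partition (p2)
 group by partitioned_key_index;
```

Each branch should return the same row that the full query returns for that key.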

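This is what it looks like when GROUP BY is performed for the whole table at once. Grouping by (partitioned_key_index + 0) instead of the bare column means the grouping expression is no longer the partitioning key: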

select (partitioned_key_index + 0), sum(some_dummy_measure) from data
    group by (partitioned_key_index + 0);

(PARTITIONED_KEY_INDEX+0) SUM(SOME_DUMMY_MEASURE)
------------------------- -----------------------
                        1             25000250000
                        2             24999850000
                        5             25000150000
                        4             25000050000
                        3             24999950000

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
SQL_ID  2xxf42mtp53sc, child number 0
-------------------------------------
select (partitioned_key_index + 0), sum(some_dummy_measure) from data
group by (partitioned_key_index + 0)

Plan hash value: 3651737839

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |00:00:00.24 |    1065 |       |       |          |
|   1 |  HASH GROUP BY      |      |      1 |    585K|      5 |00:00:00.24 |    1065 |    34M|  6473K| 4574K (0)|
|   2 |   PARTITION LIST ALL|      |      1 |    585K|    500K|00:00:00.14 |    1065 |       |       |          |
|   3 |    TABLE ACCESS FULL| DATA |      5 |    585K|    500K|00:00:00.06 |    1065 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


20 rows selected.

HASH GROUP BY was performed only once, for the whole data set, after the rows had been collected from all partitions.
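As far as I know, the number in parentheses in the Used-Mem column is the number of passes the work area needed, with 0 meaning the operation completed entirely in memory. To check the same thing outside the plan output, something like the following could be used. It is a sketch that assumes your session is allowed to read V$SQL_WORKAREA, and it reuses the SQL_ID values reported above, which will differ on your system:

```sql
-- One row per work area (hash group by, sort, hash join, ...) of each cursor.
-- LAST_EXECUTION shows OPTIMAL when the last run fit in memory;
-- LAST_TEMPSEG_SIZE is non-null only if the last run spilled to temp.
select sql_id,
       operation_type,
       operation_id,
       last_memory_used,
       last_execution,
       last_tempseg_size
  from v$sql_workarea
 where sql_id in ('537qapda9hdy4', '2xxf42mtp53sc')
 order by sql_id, operation_id;
```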

Related Solutions

How to horizontally partition an oracle database table, and should I

To answer your second question first: yes, you should partition. Oracle's query optimizer has a feature called partition elimination (also known as partition pruning), which checks the query's predicates against the partitioning key and executes the SQL only on the partitions that can contain matching rows.

Partitioning also keeps all the data in one logical table. Conceptually, think of it as many tables of identical structure, with an implicit UNION ALL between them when you SELECT from the entire table, except that "under the hood" Oracle routes each row into the right "table" based on the criteria you specify. Any rows that don't match any of the specific criteria go into a catch-all partition (a "default" partition for list partitioning, or a MAXVALUE partition for range partitioning), provided you define one.

For what you want to do, a "range partition" might be a good approach (so you can add more tenants later), e.g.:

-- Column data types are assumed here for illustration.
create table transaction (id number, tenant_id number, a number, b number, c number, d number)
partition by range (tenant_id)
(
  partition p_tenant1 values less than (2) tablespace ts_tenant1,
  partition p_tenant2 values less than (3) tablespace ts_tenant2,
  partition p_tenant3 values less than (4) tablespace ts_tenant3,
  partition p_tenantd values less than (MAXVALUE) tablespace ts_default
);

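Then later, since p_tenantd already covers every value up to MAXVALUE, a plain ADD PARTITION would be rejected; split the new partition out of p_tenantd instead:

alter table transaction
split partition p_tenantd at (5)
into (partition p_tenant4 tablespace ts_tenant4,
      partition p_tenantd tablespace ts_default);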

This will create something that looks and behaves just like a normal table, but rows where tenant_id=1 will actually be stored in a partition in tablespace ts_tenant1, and queries that filter on tenant_id=1 will ignore all other partitions. Queries across the entire table can run in parallel on each partition. If a row with tenant_id=4 arrives before the new partition exists, it will live in ts_default (partition p_tenantd); the INSERT won't be rejected for lack of a matching partition, because p_tenantd catches it.
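Partition elimination can be observed in the execution plan. This is a minimal sketch, assuming the transaction table above exists. With a predicate on tenant_id one would expect a single-partition operation and matching Pstart/Pstop values, though the exact plan depends on your version and statistics:

```sql
-- Ask the optimizer for a plan without running the query.
explain plan for
select count(*)
  from transaction
 where tenant_id = 1;

-- The Pstart and Pstop columns show which partitions will be visited.
select * from table(dbms_xplan.display);
```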

FWIW, at my site we use partitioned tables in our 40 TB data warehouse. You don't need to worry about this approach scaling or performing, provided you choose your partitioning strategy well (e.g. you could partition on tenant_id and then subpartition on month), create the right indexes, and so on.

When is data moved during an Oracle partition split

It won't move the data.

Test case:

SQL> CREATE TABLESPACE myts1 DATAFILE '/u01/app/oracle/oradata/ORA112/myts1_01.dbf' size 50M;

Tablespace created.

SQL> CREATE TABLESPACE mytsmax DATAFILE '/u01/app/oracle/oradata/ORA112/mytsmax_01.dbf' size 50M;

Tablespace created.

SQL> CREATE TABLE mytable (id NUMBER, dt DATE)
PARTITION BY RANGE (dt)
  (PARTITION mytablep1 VALUES LESS THAN (TO_DATE('2013-01-01', 'YYYY-MM-DD')) TABLESPACE myts1,
   PARTITION mytablep2 VALUES LESS THAN (TO_DATE('2013-02-01', 'YYYY-MM-DD')) TABLESPACE myts1,
   PARTITION mytablep3 VALUES LESS THAN (TO_DATE('2013-03-01', 'YYYY-MM-DD')) TABLESPACE myts1,
   PARTITION mytablep4 VALUES LESS THAN (MAXVALUE) TABLESPACE myts1
  );  

Table created.

I inserted some data. The results:

SQL> select dt, count(*) 
  2  from mytable
  3  group by dt;

DT          COUNT(*)
--------- ----------
11-NOV-12     262272
11-JAN-13     262272
11-FEB-13     262272
11-MAR-13     262272

Now, the best way of working out whether a row has moved is to look at its rowid. The rowid of a given row is based on the physical location of the row (its datafile and block). Therefore, if the rowid changes, the row has moved! (See the Oracle documentation on rowids.)
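To make the link between a rowid and the row's physical location concrete, the DBMS_ROWID package can break a rowid into its components. This is a sketch against the mytable test table above; the column aliases are only illustrative:

```sql
-- Break the rowid of a few rows into its physical components.
select rowid,
       dbms_rowid.rowid_object(rowid)       as data_object_id,
       dbms_rowid.rowid_relative_fno(rowid) as relative_file_no,
       dbms_rowid.rowid_block_number(rowid) as block_no,
       dbms_rowid.rowid_row_number(rowid)   as row_no
  from mytable
 where rownum <= 5;
```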

So, create a table to hold the rowids of the existing rows in the table, and insert them:

SQL> create table myrowids ( r rowid );

Table created.

SQL> insert into myrowids ( select rowid from mytable );

1049088 rows created.

SQL> commit;

Perform the split:

SQL> ALTER TABLE mytable
SPLIT PARTITION mytablep4
AT ( TO_DATE('2013-04-01', 'YYYY-MM-DD') )
INTO ( PARTITION mytablep4 TABLESPACE myts1, PARTITION mytablep5 TABLESPACE mytsmax );   

Table altered.

SQL>

Check that the rows haven't moved (a NOT EXISTS would work just as well):

SQL> select count(*) from myrowids where r not in ( select rowid from mytable );

  COUNT(*)
----------
         0

SQL>

They haven't!
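To see where the rows ended up, the data dictionary and a per-partition count can be consulted. This is a sketch assuming the mytable test case above and that you are connected as the table's owner. Given the dates inserted, all rows of the old mytablep4 lie below the split point, so one would expect mytablep5 to be empty, which is consistent with no rows having moved:

```sql
-- Partition boundaries and tablespaces after the split.
select partition_name, tablespace_name, high_value
  from user_tab_partitions
 where table_name = 'MYTABLE'
 order by partition_position;

-- Row counts on either side of the split point.
select count(*) from mytable partition (mytablep4);
select count(*) from mytable partition (mytablep5);
```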
