How to horizontally partition an oracle database table, and should I

aggregateoraclepartitioning

We have a tenanted data warehouse that we are doing reporting on. Queries are beginning to take a long time, and we're looking at options to reduce this. There are two thoughts at the moment.

  1. Create tenant specific aggregate tables, and query from those.

  2. Horizontally partition the data based on tenant.

The first option means that for every new tenant that comes onboard, we'll need to create a new set of tables. This isn't that difficult, as a new tenant signing up comes with a few weeks notice, and a lack of reporting will be revealed early on if we forget.

Partitioning the data to me sounds like a better approach, since we're not duplicating the data. We don't have to rely on a process to transfer new data to the aggregate tables.

I'm wondering which of these options would be better, if anyone has had similar experiences before. Does partitioning the data actually help? Or would it be not so much different from leaving all the data in one 'space'?

And, with Oracle 10g, how does one horizontally partition data? If I had the following table:

TABLE Transaction(id, tenant_id, a, b, c, d)

We'll be migrating to Oracle 11g soon, so any differences in partitioning across versions would be appreciated.

(Note: I tried to use a partitioning tag, but not enough rep, if someone else could add a tag that'd be cool)

Best Answer

To answer your second question first: yes you should partition. Oracle's query optimizer has a feature called partition elimination, which will check the predicate for the partition and only execute the SQL on the appropriate partitions.

Partitioning also leaves all the data in one space. Conceptually, think of it as many tables of identical structure, with an implicit UNION ALL between them if you were to do a SELECT from the entire table. Except "under the hood" Oracle sorts the actual rows into the right "table" based on the criteria you specify. Any rows that come in that don't match any of the criteria, go into what's known as the "default" partition.

For what you want to do, a "range partition" might be a good approach (so you can add more tenants later), e.g.:

create table transaction (id, tenant_id, a, b, c, d)
partition by range(tenant_id)
partition p_tenant1 values less than (2) tablespace ts_tenant1
partition p_tenant2 values less than (3) tablespace ts_tenant2
partition p_tenant3 values less than (4) tablespace ts_tenant3
partition p_tenantd values less than (MAXVALUE) tablespace ts_default;

Then later

alter table transaction 
add partition p_tenant4 values less than (5) tablespace ts_tenant4;

This will create something that looks and behaves just like a normal table, but actually rows where tenant_id=1 will be in a partition in tablespace ts_tenant1, and queries will ignore all other partitions. Queries across the entire table can run in parallel on each partition. If tenant_id=4 in this scenario, the row will live in ts_default unless you add the new partition as shown, but the INSERT won't be rejected because there's no partition for it!

FWIW At my site we use partitioned tables in our 40Tb DW, you don't need to worry about this approach scaling or performing, if you choose your partitioning strategy well (e.g. you could partition on tenant_id then subpartition on month perhaps), create the right indexes, and so on.