How to horizontally partition an Oracle database table, and should I

aggregate oracle partitioning

We have a tenanted data warehouse that we are doing reporting on. Queries are beginning to take a long time, and we're looking at options to reduce this. There are two thoughts at the moment.

1. Create tenant-specific aggregate tables, and query from those.
2. Horizontally partition the data based on tenant.

The first option means that for every new tenant that comes onboard, we'll need to create a new set of tables. This isn't that difficult, as a new tenant signing up comes with a few weeks' notice, and a lack of reporting will be revealed early on if we forget.

Partitioning the data sounds like a better approach to me, since we're not duplicating the data and we don't have to rely on a process to transfer new data to the aggregate tables.

I'm wondering which of these options would be better, if anyone has had similar experiences before. Does partitioning the data actually help? Or would it be not so much different from leaving all the data in one 'space'?

And, with Oracle 10g, how does one horizontally partition data? If I had the following table:

TABLE Transaction(id, tenant_id, a, b, c, d)

We'll be migrating to Oracle 11g soon, so any differences in partitioning across versions would be appreciated.

(Note: I tried to use a partitioning tag, but not enough rep, if someone else could add a tag that'd be cool)

Best Answer

To answer your second question first: yes, you should partition. Oracle's query optimizer has a feature called partition elimination (also known as partition pruning): it checks the query's predicates against the partitioning key and executes the SQL only on the partitions that can contain matching rows.

Partitioning also leaves all the data in one space. Conceptually, think of it as many tables of identical structure, with an implicit UNION ALL between them if you were to do a SELECT from the entire table. Except "under the hood" Oracle sorts the actual rows into the right "table" based on the criteria you specify. Any rows that come in that don't match any of the specific criteria go into a catch-all partition, if you define one (a MAXVALUE partition for range partitioning, or a DEFAULT partition for list partitioning).

For what you want to do, a "range partition" might be a good approach (so you can add more tenants later), e.g.:

-- column datatypes are assumed for illustration; the question does not give them
create table transaction (
  id number, tenant_id number,
  a number, b number, c number, d number
)
partition by range (tenant_id) (
  partition p_tenant1 values less than (2) tablespace ts_tenant1,
  partition p_tenant2 values less than (3) tablespace ts_tenant2,
  partition p_tenant3 values less than (4) tablespace ts_tenant3,
  partition p_tenantd values less than (MAXVALUE) tablespace ts_default
);

Then later

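-- ADD PARTITION fails (ORA-14074) while a MAXVALUE partition exists, so split it instead
alter table transaction
split partition p_tenantd at (5)
into (partition p_tenant4 tablespace ts_tenant4, partition p_tenantd);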

This will create something that looks and behaves just like a normal table, but rows where tenant_id=1 will actually be stored in a partition in tablespace ts_tenant1, and queries that filter on tenant_id=1 will ignore all other partitions. Queries across the entire table can run in parallel on each partition. If a row arrives with tenant_id=4 before a dedicated partition exists for it, the row will live in the catch-all partition in ts_default; the INSERT won't be rejected for lack of a matching partition.
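One way to check that partition elimination is really happening is to look at the execution plan. The following is a minimal sketch, assuming the transaction table has been range-partitioned on tenant_id as described above; the query itself is only an illustration.

```sql
-- Ask the optimizer for the plan of a tenant-filtered query
-- (the query is illustrative; any predicate on tenant_id will do).
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM   transaction
WHERE  tenant_id = 1;

-- Display the plan; the Pstart and Pstop columns show which
-- partitions the statement will touch.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If pruning works, you would expect the plan to show a single-partition step (such as PARTITION RANGE SINGLE) with matching Pstart and Pstop values, rather than a scan over all partitions.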

FWIW, at my site we use partitioned tables in our 40 TB DW. You don't need to worry about this approach scaling or performing if you choose your partitioning strategy well (e.g. you could partition on tenant_id, then subpartition on month), create the right indexes, and so on.
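As a rough sketch of the "partition on tenant, subpartition on month" idea: 11g supports list-range composite partitioning, whereas in 10g the composite options are limited to range-hash and range-list. The txn_date column, the datatypes, and all partition names below are assumptions made for illustration; they are not part of the original table.

```sql
-- Oracle 11g sketch: one list partition per tenant, monthly range subpartitions.
-- txn_date, the datatypes and every partition name are hypothetical.
CREATE TABLE transaction (
  id        NUMBER,
  tenant_id NUMBER,
  txn_date  DATE
)
PARTITION BY LIST (tenant_id)
SUBPARTITION BY RANGE (txn_date)
SUBPARTITION TEMPLATE (
  SUBPARTITION sp_2010_01 VALUES LESS THAN (TO_DATE('2010-02-01', 'YYYY-MM-DD')),
  SUBPARTITION sp_2010_02 VALUES LESS THAN (TO_DATE('2010-03-01', 'YYYY-MM-DD')),
  SUBPARTITION sp_max     VALUES LESS THAN (MAXVALUE)
)
(
  PARTITION p_tenant1 VALUES (1),
  PARTITION p_tenant2 VALUES (2),
  PARTITION p_other   VALUES (DEFAULT)  -- catch-all for tenants without their own partition
);
```

With a layout like this, a query that filters on both tenant_id and a date range can be pruned down to a handful of subpartitions.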

Related Solutions

Oracle 11gR2 – Limits on Materialised View replication between databases

This sounds like a job for Change Data Capture (CDC), which allows you to (among other possibilities) ship your archivelogs from the OLTP database to the reporting one, mine them for the changes, then query the changes out, ignoring any you don't want (e.g., changes of type 'D' for DELETE), and, using whatever process you might devise, apply those changes to your reporting tables.

I have no idea how well CDC would do with a ruleset encompassing 4700 source tables from another database. I've never used it for more than about 50 tables myself.

FYI, there are licensing-related limits on CDC. The full feature set is only available on Enterprise Edition.

Sql-server – SQL Server 2008 – Partitioning and Clustered Indexes

A partitioned table is really more like a collection of individual tables stitched together. So in your example of clustering by IncidentKey and partitioning by IncidentDate, say that the partitioning function splits the table into two partitions so that 1/1/2010 is in partition 1 and 7/1/2010 is in partition 2. The data will be laid out on disk as:

Partition 1:
IncidentKey    Date
ABC123        1/1/2010
ABC123        1/1/2011
XYZ999        1/1/2010

Partition 2:
IncidentKey    Date
ABC123        7/1/2010
XYZ999        7/1/2010

At a low level there really are two distinct rowsets. It is the query processor that gives the illusion of a single table by creating plans that seek, scan and update all rowsets together, as one.

Any row in any non-clustered index will have the clustered index key to which it corresponds, say ABC123,7/1/2010. Since the clustered index key always contains the partitioning key column, the engine will always know in what partition (rowset) of the clustered index to search for this value (in this case, in partition 2).

Now whenever you're dealing with partitioning you must consider if your NC indexes will be aligned (NC index is partitioned exactly the same as the clustered index) or non-aligned (NC index is non-partitioned, or partitioned differently from clustered index). Non-aligned indexes are more flexible, but they have some drawbacks:

non-aligned indexes require large amounts of memory for certain query plans
non-aligned indexes prevent efficient partition switch operations

Using aligned indexes solves these issues, but brings its own set of problems, because this physical storage design option ripples into the data model:

aligned indexes mean unique constraints can no longer be created/enforced (except for the partitioning column)
all foreign keys referencing the partitioned table must include the partitioning key in the relation (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain the partitioning key column value. Think Orders->OrderDetails: if Orders has OrderID but is partitioned by OrderDate, then OrderDetails must contain not only OrderID, but also OrderDate, in order to properly declare the foreign key constraint.
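The Orders->OrderDetails case could look roughly like the following T-SQL sketch for SQL Server 2008. All object names, the boundary value, and the LineNumber column are hypothetical and chosen only to illustrate how the partitioning column ends up in the child table.

```sql
-- Hypothetical partition function and scheme: two partitions split at 2010-07-01.
CREATE PARTITION FUNCTION pf_OrderDate (date)
    AS RANGE RIGHT FOR VALUES ('2010-07-01');
CREATE PARTITION SCHEME ps_OrderDate
    AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

-- The aligned primary key must include the partitioning column OrderDate.
CREATE TABLE Orders (
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON ps_OrderDate (OrderDate);

-- The child table has to carry OrderDate as well so that the
-- foreign key can reference the full (OrderID, OrderDate) key.
CREATE TABLE OrderDetails (
    OrderID    int  NOT NULL,
    OrderDate  date NOT NULL,
    LineNumber int  NOT NULL,
    CONSTRAINT PK_OrderDetails
        PRIMARY KEY CLUSTERED (OrderID, OrderDate, LineNumber),
    CONSTRAINT FK_OrderDetails_Orders
        FOREIGN KEY (OrderID, OrderDate)
        REFERENCES Orders (OrderID, OrderDate)
) ON ps_OrderDate (OrderDate);
```

Note how OrderDate appears in OrderDetails purely because of the storage decision made for Orders, which is the ripple effect described above.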

I have found these effects are seldom called out at the beginning of a project that deploys partitioning, but they exist and have serious consequences.

If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch in of staging tables. Switch in operations require aligned indexes.

Oh, one more thing: all my argument about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.
