PostgreSQL – Maximum Partitioning Techniques

database-designpartitioningpostgresql

I have 400,000 records I need to partition. The records will increase 100,000 per year and those new records might need to have 1000 new partitions added.

Here are some of my concerns:

How many partitions are too many
Is having small partitions bad (could have less than 150 records per partition)
Large partitions will be 10,000 or more records
partitions are created by field, not a date
new records could be added daily to the larger partitions but the smaller partitions might be monthly
possibly need to separate partitioned data legally (some cases but not all)

Example:

Let's say 10,000 out of 100,000 will need to be broken into partitions of 100 records, so 1000 partitions will be created and 9 additional partitions will be created with 10,000 records each, for a total of 1009 partitions.

Questions:

is this approach going to work
is this the best approach, if not what are some other ideas

Notes:

Most records will fall into the larger partitions

Best Answer

You need one partition for that many records. Not 1000. Certainly not 1000/year. This is not a problem that requires partitioning. It looks to me like you've decided on the solution before fully stating and analysing the problem.

Reading between the lines, it sounds like you're implementing a mulit-tenant system and have already decided that partitioning is the way to do that. Right?

If so: wrong approach. Start with a single table. Partition if/when you need to for performance and maintenance reasons. With a DB of this scale it is very unlikely that you will ever need to, it's tiny.

How many partitions are too many

Because the constraint exclusion code isn't super smart, try to stick to low partition counts. I prefer tens or hundreds at most.

Is having small partitions bad (could have less than 150 records per partition)

Yes, it's very wasteful in terms of planning and execution time.

possibly need to separate partitioned data legally (some cases but not all)

What's the difference between a partition and a single table with a composite key? I've never seen a legal or regulatory code that goes down to the level of actually specifying database structure, other than maybe PCI, and not in this way.

Details please.

is this the best approach, if not what are some other ideas

Use one table, a composite key, and some composite indexes. If useful/necessary, use partial indexes for sub-ranges.

Related Solutions

SQL Server Partitioning – Partitioning and Clustered Indexes in SQL Server 2008

A partitioned table is really more like a collection of individual tables stitched together. So your in example of clustering by IncidentKey and partition by IncidentDate, say that the partitioning function splits the tables into two partitions so that 1/1/2010 is in partition 1 and 7/1/2010 is partition two. The data will be layed out on disk as:

Partition 1:
IncidentKey    Date
ABC123        1/1/2010
ABC123        1/1/2011
XYZ999        1/1/2010

Partition 2:
IncidentKey    Date
ABC123        7/1/2010
XYZ999        7/1/2010

At a low level there really are two, distinct rowsets. Is the query processor that gives the illusion of a single table by creating plans that seek, scan and update all rowsets together, as one.

Any row in any non-clustered index will have have the clustered index key to which it corresponds, say ABC123,7/1/2010. Since the clustered index key always contains the partitioning key column, the engine will always know in what partition (rowset) of the clustered index to search for this value (in this case, in partition 2).

Now whenever you're dealing with partitioning you must consider if your NC indexes will be aligned (NC index is partitioned exactly the same as the clustered index) or non-aligned (NC index is non-partitioned, or partitioned differently from clustered index). Non-aligned indexes are more flexible, but they have some drawbacks:

non-aligned indexes require large amounts of memory for certain query plans
non-aligned indexes prevent efficient partition switch operations

Using aligned indexes solves these issues, but brings its own set of problems, because this physical, storage design, option ripples into the data model:

aligned indexes mean unique constrains can no longer be created/enforced (except for the partitioning column)
all foreign keys referencing the partitioned table must include the partitioning key in the relation (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain partitioning key column value. Think Orders->OrderDetails, if Orders have OrderID but is partitioned by OrderDate, then OrderDetails must contain not only OrderID, but also OrderDate, in order to properly declare the foreign key constraint.

These effects I found seldom called out at the beginning of a project that deploys partitioning, but they exists and have serious consequences.

If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch in of staging tables. Switch in operations require aligned indexes.

Oh, one more thing: all my argument about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.

Sql-server – Partitioning in SQL Server 2008

For what you want to do, I would recommend the following (which is pretty much what you were thinking).

1> Create history tables for the historic data you have - keep the schemas as similar as possible. Split up by some logical grouping (such as year/month) based on how they are going to be queried (say you need to report with in month/year as well as all). Do not worry about the table size of the splits unless they are getting into the TB size range (your dbms should handle it) just make sure that they are appropriately indexed for the queries that need to be run. You should consider putting these onto a different disk to the active data if performance is an issue.

2> Create a routine to move data from the active table to the relevant historic table. Run this periodically. As a practice rebuild the indexes on the table that has had the data removed from it, and maybe update the table statistics. Easiest way to do this is to write a sql script.

3> Consider the reporting you want to do. If you want to only have to deal with 1 table when writing queries, create a view that joins the archived tables together. Create indexes on all the tables to suite the view. This way if you want all the data, select from the view. If you want data from a specific year/month, query that table. The view will look something like:

create view view_all_data as select "Jan12" as month,a.* from data_Jan12 a union select "Feb12" as month,b.* from data_Feb12 b ....

I am assuming here that the system is not a highly used transactional system and that you have windows of low usage to run the analysis queries. If you need to maintain high levels of performance, you may like to do the above in a separate database (separate hardware) and port across the new data that you get from backups.

Best Answer

Related Solutions

SQL Server Partitioning – Partitioning and Clustered Indexes in SQL Server 2008

Sql-server – Partitioning in SQL Server 2008

Related Question