SQL Server – Table partitioning and future values

Tags: columnstore, partitioning, sql-server, sql-server-2014

I have a clustered columnstore table with approximately 1.4 billion rows.

I figured a clustered columnstore would best suit my need for various aggregations across many different splits and columns.

There is currently just a single partition, but I would like to split the data into multiple partitions. I haven't used partitions before, so I am not quite sure what the best practice is.

The table contains monthly financial information for our customers and will grow by approximately 35 million new rows each month. Each row has a YearMonth_Id and a Customer_Id, as well as many columns containing financials.

I am thinking of partitioning by YearMonth_Id (maybe grouped by quarters?).

I'm no DBA, just a finance analyst. Since we don't have a qualified DBA to maintain this table regularly, it comes down to me to do it; hence I want as little maintenance on the table as possible.

My question therefore is as follows. When working with partitions and incremental values (YearMonth_Id), how should I structure the partitions on the table?

I read that you would usually leave an empty partition at the far right (if working with left range) that you can split each month (or quarter). However, I am thinking of creating, say, 12 empty partitions for future values, so that the next split is only needed in a year's time and the table does not depend on me adding a new partition every month.

Is this a big no-no or would it be an okay way to go?

Best Answer

Empty partitions have negligible effect on query performance, so having up to 12 empty partitions, one per month, should not be a problem.
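As a rough sketch of what pre-creating a year of empty monthly partitions could look like: the names pf_YearMonth and ps_YearMonth, the boundary values, and the choice of RANGE RIGHT on a single filegroup are all assumptions for illustration, as is the idea that YearMonth_Id is an int such as 201501.

```sql
-- Hypothetical partition function: YearMonth_Id is assumed to be an int (yyyymm).
-- RANGE RIGHT means each boundary value belongs to the partition on its right,
-- so the boundary 201501 starts the partition that holds January 2015.
CREATE PARTITION FUNCTION pf_YearMonth (int)
AS RANGE RIGHT FOR VALUES
(
    -- a few existing months (everything older lands in the first partition)
    201410, 201411, 201412,
    -- twelve future months, created empty ahead of time
    201501, 201502, 201503, 201504, 201505, 201506,
    201507, 201508, 201509, 201510, 201511, 201512
);

-- Hypothetical partition scheme: all partitions on one filegroup for simplicity.
CREATE PARTITION SCHEME ps_YearMonth
AS PARTITION pf_YearMonth ALL TO ([PRIMARY]);

-- List the boundaries that were created.
SELECT prv.boundary_id, prv.value
FROM sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv
    ON prv.function_id = pf.function_id
WHERE pf.name = N'pf_YearMonth'
ORDER BY prv.boundary_id;
```

The table itself would then have to be created or rebuilt on ps_YearMonth (YearMonth_Id); that step is left out here because it depends on the existing table definition.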

Having said that, you could simply automate the creation of partitions, the way I detail in this answer. That means you'd only ever have a single empty partition, assuaging any fears about performance impacts from too many empty partitions.
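One possible shape for such automation, not necessarily the approach in the linked answer, is a small script run on a schedule (for example from a SQL Server Agent job) that adds next month's boundary if it does not exist yet. The names pf_YearMonth and ps_YearMonth, the int yyyymm format of YearMonth_Id, and the use of the PRIMARY filegroup are assumptions. Note that, as far as I know, SQL Server only allows splitting an empty partition when the table has a columnstore index, so the split needs to run before any rows for that month are loaded.

```sql
-- Compute next month's key as an int in yyyymm form (assumed format of YearMonth_Id).
DECLARE @d date = DATEADD(MONTH, 1, GETDATE());
DECLARE @next int = YEAR(@d) * 100 + MONTH(@d);

-- Only split if the boundary is not already defined on the (hypothetical) function.
IF NOT EXISTS
(
    SELECT 1
    FROM sys.partition_functions AS pf
    JOIN sys.partition_range_values AS prv
        ON prv.function_id = pf.function_id
    WHERE pf.name = N'pf_YearMonth'
      AND CONVERT(int, prv.value) = @next
)
BEGIN
    -- Tell the scheme which filegroup the new partition should use.
    ALTER PARTITION SCHEME ps_YearMonth NEXT USED [PRIMARY];

    -- Add the new boundary; with RANGE RIGHT this creates the partition for @next.
    ALTER PARTITION FUNCTION pf_YearMonth() SPLIT RANGE (@next);
END;
```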

Table partitioning is not a panacea for increasing performance and can result in lower performance than a non-partitioned table. The primary use-case for partitioning is to enable the easy swapping out of old data, and swapping in of new data, using the ALTER TABLE ... SWITCH syntax. This answer shows the advantages of using ALTER TABLE ... SWITCH to ease data management.
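To illustrate the SWITCH syntax, here is a minimal sketch that moves one month out of the partitioned table. The table names dbo.Financials and dbo.Financials_Archive, the partition function pf_YearMonth, and the month 201410 are hypothetical. The target table is assumed to be empty, to have the same columns and a matching clustered columnstore index, and to sit on the same filegroup as the source partition.

```sql
-- Find the partition number that holds the month to be removed.
DECLARE @p int = $PARTITION.pf_YearMonth(201410);

-- Metadata-only operation: the rows are not physically copied.
ALTER TABLE dbo.Financials
    SWITCH PARTITION @p TO dbo.Financials_Archive;
```

After the switch, the old month lives in dbo.Financials_Archive and can be archived, queried separately, or truncated without touching the main table.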

To get a performance improvement from partitioning, queries against the partitioned table need to use the partitioning key, so that SQL Server can skip partitions that cannot contain matching rows (partition elimination). Queries that don't use the partitioning key in the WHERE clause or in a JOIN clause must inspect all partitions for rows that match the selection criteria.
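For example, assuming a table dbo.Financials partitioned on YearMonth_Id and a hypothetical financial column named Revenue, the difference would look roughly like this:

```sql
-- Filters on the partitioning key: only the partition(s) covering
-- March 2015 need to be read.
SELECT Customer_Id, SUM(Revenue) AS TotalRevenue
FROM dbo.Financials
WHERE YearMonth_Id = 201503
GROUP BY Customer_Id;

-- No predicate on the partitioning key: every partition has to be
-- inspected for rows belonging to this customer.
SELECT YearMonth_Id, SUM(Revenue) AS TotalRevenue
FROM dbo.Financials
WHERE Customer_Id = 42
GROUP BY YearMonth_Id;
```

The actual execution plan shows how many partitions were accessed, which is a quick way to confirm whether elimination took place.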