Sql-server – How many rows can a SQL Server 2012 table hold, and when should I partition it

partitioningsql serversql-server-2012

Say a table stores granular data about some event. It has the date of the event, a type dimension with around 30K types, and a category dimension with around 100 categories, plus some numeric facts.

On average, there are 15 million transactions per day. More than 5 billion per year, over 60G per decade. That's not big data, but it's a lot.

How many rows can a SQL Server 2012 table hold?

Of course, older data is used less often and is candidate for being partitioned in multiple tables on the same DB. But when should this partitioning start happening? 1 table per year? 5 years?

Additional information collected from comments:

Consider: I have enough storage to hold 30 billion records of that event. If each event record requires 1KB, I have 30TB in that table, and enough storage for that (and for log of it). Its PK is bigint.

What do you think of having one table with historical data and another table with latest data? Instead of a transactional event, the table has a catalog, clients, for example. Every day the OLTP's catalog is copied into DW. So I'd have a table holding the history data and another table with the latest records.

In the design I use, ETL feeds the history table, then I use row_number() to grab the latest record of each entity by its NK. It's very expensive to run, but this way I keep entities that existed in the past and aren't on the OLTP anymore.

Best Answer

How many records can a MSSQL2012 table hold?

As stated on the MSDN page for Maximum Capacity Specifications for SQL Server (for SQL Server 2012):

"Rows per table = Limited by available storage" (same for both 32-bit and 64-bit platforms)

But when should this partitioning start happening? 1 table per year? 5 years?

This all depends on the needs of the system. There is no inherent need to partition, ever really, solely based on the issue of performance. Partitioning is mainly intended as a means of more easily managing getting mass amounts of data into, or out of, a table, as quickly as possible and causing as little contention as possible. If the desire was purely to assist query performance, maybe start TESTING it around 1 billion rows, but even then, if you have a good data model and good indexing, you probably won't need to even bother with this. Also, filtered indexes and even filtered statistics probably would do well enough for many cases where people choose to implement table partitioning (if their intention is purely performance-related).

But if your need is to quickly remove a large block of rows, perhaps for aging out older data, then table partitioning will help as you can SWITCH the "old" data out. And on this level it is not a matter of rows but a matter of what amount of time you want to manage. If you want to switch out data monthly, then do monthly partitions. If you want to age-off data yearly, then try yearly partitions.


UPDATE

Not sure why I didn't mention this earlier, but you should take a look at Partitioned Views. That is when you have multiple tables of identical schema and a View that does a UNION ALL between them, and each table has a CHECK CONSTRAINT enforcing a particular range of data within that table (and so the Query Optimizer knows where to get the data from). By doing this, you can have two tables -- current and historical -- and then have queries that hit either one or the other (if the time-frame is known ahead of time, such as a query that only hits the most recent 90 days), or uses the View if the data could be in either one. Please see the following for more info:

I believe you can even do a combination where the "current" table is partitioned (so that you can quickly switch in the incoming data and switch out the data that is becoming "old"), a non-partitioned table for historical, and a Partitioned View to join the two of them. Then you just need a way to get the data from the newly switched-out partition into the "historical" table.

Also, with regards to performance, there are other features offered, depending on what edition you are using (some only come with Enterprise Edition). But you should look into ColumnStore Indexes, Data Compression, and maybe a few other things.