SQL Server – Multiple files per partition/filegroup

Tags: partitioning, sql-server, sql-server-2016

I am designing a partitioning-based solution for a high-performance, large SQL Server 2016 database. Some of the data will be coming in at hundreds of millions of records per day. During the day we will also be running reporting queries and queries that look for trends over multiple days and weeks.

My current solution will keep 70 days in daily partitions, using a dedicated filegroup per partition. After the data moves beyond the 70-day mark, it will go into weekly partitions for 42 weeks, which also use dedicated filegroups per partition, then 12 monthly partitions, and then 6 yearly ones, all set up in the same fashion.
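To make the daily tier concrete, here is a minimal sketch of the kind of DDL I have in mind; the function, scheme, and filegroup names (pfDaily, psDaily, FG_Day00, ...) and the dates are placeholders, and the real version would list all 70 daily boundaries:

```sql
-- Minimal sketch of the daily tier: a RANGE RIGHT partition function with one
-- boundary per day, mapped to a dedicated filegroup per partition.
-- N boundaries produce N+1 partitions, so the scheme needs one extra filegroup.
CREATE PARTITION FUNCTION pfDaily (datetime2(0))
AS RANGE RIGHT FOR VALUES
    ('2017-01-01', '2017-01-02', '2017-01-03' /* ... one boundary per day, 70 in total ... */);

CREATE PARTITION SCHEME psDaily
AS PARTITION pfDaily
TO (FG_Day00, FG_Day01, FG_Day02, FG_Day03 /* ... one filegroup per partition ... */);
```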

We need really high performance and massive scale-up capability (PB+ range). To minimize rework down the line, I am thinking about using multiple files per filegroup/partition for the daily and weekly filegroups/partitions: four per daily partition and two per weekly partition, to be exact.
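For illustration, this is roughly what one daily filegroup with four files would look like; the database name, logical and physical file names, paths, and sizes below are all placeholders:

```sql
-- Illustrative only: one daily filegroup backed by four data files, ideally
-- spread across separate volumes. All names, paths, and sizes are placeholders.
ALTER DATABASE MyBigDb ADD FILEGROUP FG_Day01;

ALTER DATABASE MyBigDb ADD FILE
    (NAME = FG_Day01_f1, FILENAME = 'E:\Data\FG_Day01_f1.ndf', SIZE = 64GB),
    (NAME = FG_Day01_f2, FILENAME = 'F:\Data\FG_Day01_f2.ndf', SIZE = 64GB),
    (NAME = FG_Day01_f3, FILENAME = 'G:\Data\FG_Day01_f3.ndf', SIZE = 64GB),
    (NAME = FG_Day01_f4, FILENAME = 'H:\Data\FG_Day01_f4.ndf', SIZE = 64GB)
TO FILEGROUP FG_Day01;
```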

This way we could potentially increase read/load throughput per partition, as well as increase the maximum capacity of the partitions (don't ask why, but we are worried we may actually need that level of capacity on certain days).

Has anyone ever done it this way and what were your results? Is there any reason not to do it this way other than management overhead?

All the weekly, monthly, and yearly partitions would be in the same database on the same server (application design issues, but multi-DB could be an option if properly motivated; multiple servers or instances are not desirable).

The partition breaks are currently being discussed and evaluated. I chose the values above based on information I received about query patterns. A different number of days is of course possible, but I kinda like the 10 weeks' worth of daily partitions.

We do have a seriously high-end datacenter, well, 2 actually. We are discussing the purchase of a converged solution specific to this platform and one other. I personally would like to see dedicated AFAs (All-Flash Arrays), but there are some bridges to cross before I can get my hands on those.

I am aware of Data Warehouse Fast Track solutions, but they will not work for us. For one, we will be doing mostly OLTP, so the benchmark figures will not be representative of what we will get. Second, they don't scale up enough (at the moment). Some of the elements from the reference architectures will be used, of course, but the 'turn-key' SKUs won't be an option. I am an ex-MS PFE, so those resources are the first I look at.

Best Answer

You can't control the partition <=> file relationship directly: partitions are only ever assigned to a filegroup, never to individual files. However, splitting your filegroup into multiple files will help with performance. The number I always use is to set the number of files per filegroup to the MAXDOP value. I got this from looking at how SQL Server splits tempdb into multiple files, and I noticed some improvement copying this behavior.
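If you want to follow that rule of thumb, you can read the instance-level MAXDOP setting like this (this is just the heuristic described above, not an official sizing recommendation):

```sql
-- Read the instance-level 'max degree of parallelism' setting; per the rule
-- of thumb above, use this value as the number of data files per filegroup.
SELECT value_in_use AS maxdop
FROM sys.configurations
WHERE name = 'max degree of parallelism';
```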

Regarding partitions, what you should do is split them across filegroups so that the ones that are no longer updated never move. We typically have a Hot filegroup in our DWH and a Cold one. The Cold one is read-only, and we only change that status on the day we move data from Hot to Cold. You can do this without having to create one filegroup per partition; that just makes it harder to manage.
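A rough sketch of that Hot/Cold flip, assuming hypothetical database and filegroup names; note that switching a filegroup to READ_ONLY requires exclusive access to the database, so this is typically done in a maintenance window:

```sql
-- Sketch of the Hot-to-Cold status change (names are placeholders).
-- Temporarily make the Cold filegroup writable for the move:
ALTER DATABASE MyDWH MODIFY FILEGROUP FG_Cold READ_WRITE;

-- ... move the aged partition's data onto FG_Cold here ...

-- Then lock it down again so it never needs to be touched afterwards:
ALTER DATABASE MyDWH MODIFY FILEGROUP FG_Cold READ_ONLY;
```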