First and foremost: those logical disks better be backed by at least 8 different physical disks. If you're asking about load balancing 8 logical disks created on the same physical storage (the same spindles) then you're wasting time.
The best (and simplest!) option is to create a single filegroup with 8 files (equal in size and pregrown), each on a spindle, and then place the table and the indexes in this filegroup. SQL Server will balance the data equally among the files.
Partitioning is a feature for ETL switch in and switch out. It should not be used for performance, as the best you can hope for is equal performance with the original table. For performance use a well designed clustered index, one that matches the typical load.
If your data is really known upfront and the index usage characteristics are very well understood then you may try to balance them explicitly on their own filegroups. But trying to wrestle manual control over this is more likely to cause harm than benefit. The simpler option of single file group with 8 files balances IO better than manual explicit control 99% of the times.
For what you want to do, I would recommend the following (which is pretty much what you were thinking).
1> Create history tables for the historic data you have - keep the schemas as similar as possible. Split up by some logical grouping (such as year/month) based on how they are going to be queried (say you need to report with in month/year as well as all). Do not worry about the table size of the splits unless they are getting into the TB size range (your dbms should handle it) just make sure that they are appropriately indexed for the queries that need to be run. You should consider putting these onto a different disk to the active data if performance is an issue.
2> Create a routine to move data from the active table to the relevant historic table. Run this periodically. As a practice rebuild the indexes on the table that has had the data removed from it, and maybe update the table statistics. Easiest way to do this is to write a sql script.
3> Consider the reporting you want to do. If you want to only have to deal with 1 table when writing queries, create a view that joins the archived tables together. Create indexes on all the tables to suite the view. This way if you want all the data, select from the view. If you want data from a specific year/month, query that table. The view will look something like:
create view view_all_data as
select "Jan12" as month,a.* from data_Jan12 a
union
select "Feb12" as month,b.* from data_Feb12 b
....
I am assuming here that the system is not a highly used transactional system and that you have windows of low usage to run the analysis queries. If you need to maintain high levels of performance, you may like to do the above in a separate database (separate hardware) and port across the new data that you get from backups.
Best Answer
The main advantage of not partitioning a (non-unique) index on a partitioned base object is that it works around a long-standing query optimizer limitation related to ordered data requests such as
MIN
,MAX
, orTOP (n)
queries.On a partitioned index, the optimizer cannot generally translate
MIN
,MAX
, orTOP (n)
to the same operation per partition, followed by a final global aggregate over the per-partition partial aggregates. The optimizer instead chooses an execution plan that scans all partitions of the index. The exception to this is the single case where the aggregate or top operation is specified over the partitioning column.I should mention that there are also very good reasons not to have any non-aligned indexes. Choosing to use a non-aligned index would have to be a very informed choice. I have done it myself (rarely) in the past, but in very specific circumstances where the benefits clearly outweighed the costs, or there was no other reasonable alternative.
Article by Itzik Ben-Gan explaining the issue.