This is a follow-up to an earlier question. I have a SQL Server 2008 R2 Standard server that holds a single database, which itself contains almost nothing except one large table.
The table is 100+ million rows (35 columns) and growing at around 250,000 rows per day. We need all the data to be "online", and most of the columns need to be searchable in some fashion. The vast majority of activity on the table is reading; apart from the new data being `INSERT`ed during the day, there's no need to change anything.
Users perform a range of queries on the table, from simple look-up-a-record requests to pulling tens of thousands of rows based on a range of criteria. We have only limited control over the queries that are run, and performance is starting to suffer, even with indexing.
A big part of the problem is disk I/O, which we're addressing by retrofitting an SSD-based array. As all database files will be on this new array, the consensus is that having multiple database files won't make any difference, but that splitting the table up into separate tables might be the way to go.
I'm now puzzling over what would be the best approach to this. Two ideas that I'm debating with myself:
- Split the table into "tiers"
  - A table containing the last week's data, which is the one being `INSERT`ed into each day
  - Next table containing from last week back to 3 months previous
  - Next table containing from 3 months to 6 months
  - Next table containing anything older than 6 months

  I'd then "shuffle" the data down the tiers overnight (the database is only accessed 8am-10pm, so I have a window overnight to process data).
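If you went this route, one way (my assumption, not something from the question) to keep queries simple would be a partitioned view over the tier tables, with a `CHECK` constraint on the date column of each tier so the optimizer can skip tiers that can't match a date predicate. All table and column names below are made up for illustration:

```sql
-- Each tier table has the same 35-column schema plus a CHECK constraint
-- on its date column, e.g. CHECK (DateCreated >= '20130101' AND DateCreated < '20130401').
CREATE VIEW dbo.AllData
AS
SELECT * FROM dbo.Data_LastWeek
UNION ALL
SELECT * FROM dbo.Data_WeekTo3Months
UNION ALL
SELECT * FROM dbo.Data_3To6Months
UNION ALL
SELECT * FROM dbo.Data_Over6Months;
```

Users would then query `dbo.AllData`, and a predicate on the date column lets SQL Server touch only the relevant tier(s).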
- Create tables for date ranges
  - Create a table per date range – say per quarter. I'd then have the data `INSERT`ing into the table 2Q2013, and then roll over to 3Q2013, 4Q2013, etc.

  I could use filegroups to make older tables "read only" if this would improve performance.
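For reference, marking a filegroup read-only is a one-liner, though it needs exclusive access to the database, so it would have to happen in the overnight window. The database and filegroup names here are placeholders:

```sql
-- Freeze last quarter's filegroup once no more data will ever be written to it.
ALTER DATABASE MyBigDb MODIFY FILEGROUP FG_2Q2013 READ_ONLY;

-- If it ever needs changes again:
ALTER DATABASE MyBigDb MODIFY FILEGROUP FG_2Q2013 READ_WRITE;
```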
Option 1 is the easiest for me to implement, but I'm not sure if it's a completely mad idea. Option 2 is more work to implement and maintain, but if it's "best practice" for this kind of problem then that's the way I'll go.
Any and all advice or alternative ideas would be gratefully received – I'm aware that these kinds of problems are best solved at design-time.
Best Answer
I personally would go with your first option. If you use a `DELETE dbo.p1 OUTPUT deleted.* INTO dbo.p2` pattern to move the data, there is not a lot that can fail. Moving 250K × 3 rows that way in the timeframe you have should not be a problem either, if you do it in batches of around 10,000 rows.
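A minimal sketch of that batched move, assuming hypothetical tier tables `dbo.p1` and `dbo.p2` with identical schemas (and no triggers or foreign keys on the target, which the `OUTPUT ... INTO` clause requires), and a made-up `DateCreated` column with a one-week cutoff:

```sql
-- Move rows older than one week from dbo.p1 down to dbo.p2, 10,000 at a time.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) dbo.p1
    OUTPUT deleted.*            -- the deleted rows land in the next tier
    INTO dbo.p2
    WHERE DateCreated < DATEADD(DAY, -7, SYSDATETIME());

    SET @rows = @@ROWCOUNT;     -- loop ends when nothing is left to move
END;
```

The small batch size keeps each transaction (and therefore log growth and lock duration) modest, and because delete and insert happen in one statement, a failure mid-run can't lose rows.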
The advantage I see over the more common calendar-based partitioning is that your "partitions" stay the same size. With the amount of data you are dealing with, a calendar-based partitioning approach would perform very well at the beginning of the month and potentially significantly worse at the end of the month.
A while back I wrote an article together with Kalen Delaney in sqlmag about the advantages of the "moving data from partition to partition as it ages" approach: http://sqlmag.com/database-administration/using-table-partitions-archive-old-data-oltp-environments
That article uses the Enterprise-only built-in partitioning feature, but you can implement the same idea using handmade multi-table partitioning too.
I would try to place the different tables (partitions) on separate drives. Because the data in the older partitions changes only during downtime, you could still mark those filegroups read-only during the day. Alternatively you can use `READ UNCOMMITTED` or `READ COMMITTED SNAPSHOT` isolation. The latter should speed things up on the "current" partition too, assuming you really never update data; but even with updates it might help. Either way, make sure you test the performance in your environment. (In any event, do not use `READ UNCOMMITTED` on the active table/partition, as your read queries might end up seeing half-written rows, depending on the data types you are using.)
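Read committed snapshot is a database-level switch; the database name below is a placeholder, and `WITH ROLLBACK IMMEDIATE` kicks out any open sessions, so this too belongs in the overnight window:

```sql
-- After this, readers under plain READ COMMITTED use row versions (kept in
-- tempdb) instead of shared locks, so SELECTs no longer block behind the
-- daily INSERTs into the current partition.
ALTER DATABASE MyBigDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
```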