I'll share my own experience with how I ended up defining the terms "large", "big", and "a lot":
- a lot is a database that takes around 400 GB for a full month of data (custom logging information from all our web apps)
- big is a table in this database that accounts for much of that space :-) .. about half of it (roughly 200 GB for a single table, from what I remember)
- large is a single day's worth of data (about 12-15 GB), which corresponds to one partition of that magic table
This is not a rule or a best practice, but when you feel that your current indexing strategy is getting you nowhere and nothing seems to make your queries faster, I believe it's time to consider partitioning.
I don't think it's possible. The notion seems to fly in the face of everything that we know about partitioning and clustered indexes, but I might be wrong.
One possibility is a partitioned view over two tables, one with a clustered index, one as a heap. I have included a demo script for this, but it does come with a number of limitations which might be critical:
- inserts seem to need to list every column explicitly
- the heap table still needs a primary key which includes the partitioning column (even if the primary key is not clustered)
- moving data from 'recent' to 'archive' involves moving rows from one table to the other (a sketch of that move follows the demo script below)
Overall your situation is probably complex enough that it needs a much more considered analysis.
Use tempdb;

-- 'Archive' table: clustered primary key on (partition_column, id)
Create Table TestOld (
    id int,
    partition_column char(1) Constraint chk_partition_old Check (partition_column = 'O'),
    other_column char(100),
    Constraint pTestOld Primary Key Clustered (partition_column, id));

-- 'Recent' table: stored as a heap (the primary key is nonclustered)
Create Table TestNew (
    id int,
    partition_column char(1) Constraint chk_partition_new Check (partition_column = 'N'),
    other_column char(100),
    Constraint pTestNew Primary Key Nonclustered (partition_column, id));
Go

-- Partitioned view over the two tables; the check constraints on
-- partition_column make the view updatable and let the optimizer
-- skip the table whose constraint cannot match the predicate
Create View Test As
Select id, partition_column, other_column
From TestOld
Union All
Select id, partition_column, other_column
From TestNew;
Go

-- Inserts through the view are routed to the matching base table
Insert Into Test (id, partition_column, other_column)
Values (1, 'O', ''),
       (2, 'N', '');

-- Filtering on partition_column means only TestNew needs to be read
Select *
From Test
Where partition_column = 'N';
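As a follow-up to the data-movement limitation mentioned above, here is a minimal sketch of what 'archiving' a row could look like with these demo tables, assuming a plain transactional delete-and-insert is acceptable (the id value is arbitrary):

-- Move row id = 2 from 'recent' (TestNew) to 'archive' (TestOld)
Begin Transaction;

Insert Into TestOld (id, partition_column, other_column)
Select id, 'O', other_column
From TestNew
Where id = 2;

Delete From TestNew
Where id = 2;

Commit Transaction;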
A second option, which may have more merit in your case, is a filtered index over the old part of your table. Include a large number of columns so that it covers the queries you run against the old data; this would speed up reads on the old part of the table without slowing down writes to the recent part.
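For illustration only, such an index might look something like the following; the table name, column names, and cut-off date (dbo.Orders, OrderDate, CustomerId, Status, TotalAmount, '20200101') are placeholders for whatever defines the 'old' part of your table, not anything taken from your schema:

Create Nonclustered Index ix_Orders_Archive
On dbo.Orders (OrderDate)
Include (CustomerId, Status, TotalAmount)  -- covering columns for the archive queries
Where OrderDate < '20200101';              -- filter: only the 'old' rows are indexed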
Best Answer
Let's compare the unpartitioned and partitioned layouts.
PARTITION SIZE
If you have the following:
- a table with 100,000,000 rows
- a BTREE index on the lookup column
- 1024 keys per tree-node page

What would the metrics look like?
Since LOG(100000000)/LOG(2) = 26.575424759099, a BTREE index with 1024 keys per tree-node page would have a tree height of only 3 (CEILING(LOG(100000000)/LOG(1024))). With only three node pages to visit, a binary search within each accessed tree node would examine about 30 keys in total (roughly 10 comparisons per node).
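If you want to reproduce the arithmetic, the expressions can be run as a plain SELECT (LOG() is the natural logarithm in both MySQL and SQL Server, so LOG(n)/LOG(1024) gives the base-1024 logarithm):

-- Unpartitioned case: 100,000,000 rows, 1024 keys per tree-node page
SELECT CEILING(LOG(100000000) / LOG(1024))                        AS tree_height,    -- 3 levels
       CEILING(LOG(100000000) / LOG(1024)) * (LOG(1024) / LOG(2)) AS keys_examined;  -- about 30 key comparisons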
NUMBER OF PARTITIONS
If you have the following:
- the same 100,000,000 rows
- the table split into 1024 partitions
- a BTREE index with 1024 keys per tree-node page in each partition

The numbers would be slightly different. Each partition should have about 97656 rows. What would the metrics become now?
Since LOG(97656)/LOG(2) = 16.575421065795, a BTREE index with 1024 keys per tree-node page would have a tree height of only 2 (CEILING(LOG(97656)/LOG(1024))). With only two node pages to visit, a binary search within each accessed tree node would examine about 20 keys in total (roughly 10 comparisons per node).
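The same sanity check for the partitioned layout, including where the 97656 figure comes from (assuming the 100,000,000 rows split evenly across 1024 partitions):

-- Partitioned case: 100,000,000 rows spread over 1024 partitions
SELECT FLOOR(100000000 / 1024)                                AS rows_per_partition, -- 97656
       CEILING(LOG(97656) / LOG(1024))                        AS tree_height,        -- 2 levels
       CEILING(LOG(97656) / LOG(1024)) * (LOG(1024) / LOG(2)) AS keys_examined;      -- about 20 key comparisons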
CONCLUSION
Spreading out the keys removes just one tree level but essentially creates 1024 indexes. The queries won't know the difference, and any search-time gain in favor of partitions would probably be nominal at best. However, make sure all the data is actively queried. Otherwise, you may be hitting just a few partitions, while the other partitions with rarely-accessed data just take up space and are never accessed frequently enough to justify the partitioning. You may also have more blatant performance factors to worry about (such as internal defragmentation in XFS, ext3 vs ext4, etc.). You also need to consider which storage engine you are using, because: