Sql-server – Clustering a heap by date in stead of unique composite key

performancequery-performancesql-server-2008-r2unique-constraint

I've been working on improving the performance of a large reporting database.
This database is 2TB in size, and one of the larger tables is a heap with three non-clustered indexes on it.

The non-clustered indexes each deal with a combination of Case_ID's with relevant data.
However the case id's are not at all unique, as each case entry is stored in the same heap with the Case_ID.
Inquiries at the business side of things have led me to the fact that a combination of Case_ID, Line_Number and Document_ID is unique.

However, nearly all of the queries are claused by dates (using a datetime column named DATE).
Usually by WHERE DATE = 'xxxx-xx-xx 00:00:00.000, or WHERE DATE > 'xxxx-xx-xx 00:00:00.000'. And the Line_Number and Document_ID values are in most cases not even included in the reports.

As such, I created a clustered index on the heap (on DEV), keyed on the datetime column.
Performance of the reports has increased by a decent amount, reducing query times from 5 minutes to 3 minutes for most of my test reports, and I haven't seen any negative impact on the reports not claused by date.

I am however concerned that I'm looking too far as I understand that non unique clustered indexes are rarely the solution. Is what I'm doing a valid approach? Or should I simply index on the unique composite key, and create a covering index on the datetime column?

Best Answer

Non-clustered indexes use the clustered index key value as a pointer back to the data. If you create a compound clustered index, you are making your non-clustered indexes that much larger.

When choosing a clustered index, especially on tables with hundreds of millions of rows, try to choose a single narrow column if possible. i.e. choose an integer over a varchar field.

The Datetime field is 8 bytes with is relatively narrow. If your data is inserted by date in ascending order (e.g. as time progresses) the records will be inserted at the tail end of the table. Pages and extents will be added as necessary.

If your data data is spread out all over the place however, you will end up with page splits as new records are shoved in between existing records. Make sure your incoming date data is accurate.

Be warned that SQL will add an internal 4 byte "uniquifier" to your clustered index if it is not unique.

So, go for the single column clustered index. Queries that use the datetime column will benefit greatly, and the other indexes will not suffer.