Sql-server – Optimizing table/indices for getting the latest row (filtering by one additional column)

database-designindexperformancesql serversql-server-2012

I have a SQL Server table (SQL Server 2012 SP3 Standard edition) that stores a bunch of configuration information (basically text blobs) for different organizations. The schema is something like this:

[ConfigurationID]       INT IDENTITY (1,1) NOT NULL,
[OrganizationID]        INT NOT NULL,
[TimestampUtc]          DATETIME NOT NULL,
[ConfigurationData]     NVARCHAR (MAX) NOT NULL,
[ChangedBy]             NVARCHAR (256) NOT NULL,
[Comment]               NVARCHAR (MAX) NOT NULL,
[ChangeType]            INT NOT NULL

The TimestampUtc will always be increasing (I won't ever be INSERTing "back-dated" entries into the table), and the rows won't ever be UPDATEd (I'm only INSERTing new rows). For some OrganizationIDs there will be lots of rows, for some very few, and a new row for any OrganizationID may be INSERTed at any time.

If needed, I can guarantee uniqueness of TimestampUtc (but it would be great to have a solution that didn't need that).

INSERTs are relatively rare (at most dozens of times per day, but typically much less than that), reads are very frequent (essentially on every web request to my application).

My goals are:

  • Getting the ConfigurationData with the latest TimestampUtc for a given OrganizationID should be extremely fast regardless of the size of the table
  • INSERT performance doesn't matter too much, but I'd like to avoid horrible index fragmentation if at all possible (so my first idea of a unique clustered index on OrganizationID ASC, TimestampUtc DESC is probably not a great idea).

Questions

I know I can denormalize and just store the latest ConfigurationData in one table and the historical log of previous values in another table, but is it possible to meet my goals with just one table? What's the best way to do it? (I.e. what's the best index structure? do I need to change anything about the table schema, etc.?)

Best Answer

Given the low rate of inserts, your proposed index will be completely fine, and perfect for the usage goal.

Given a fresh index with fill factor 100%, and sufficient history for each organisation to fill a page, there will be a page split on the first subsequent insert for each organisation. But then there won't be another page split for that organisation until the new page is filled up.
Even these splits and fragmentation can be mitigated by starting with a <100 fill factor, and regular reorganise.

Aaron's thought regarding using non-clustered was probably because that would allow you to cluster on the always ascending identity column, ensuring the only splits and fragmentation you will get is in a very compact separate index. But I suspect he was only mentioning that in the context of you wanting very much to avoid fragmentation, and not as something really necessary in this scenario.