Sql-server – Which columns (and order) to choose for a clustered index to maximize query performance

clustered-index sql-server t-sql

In our database we have a table that looks more or less like the example below. In the past we always created the clustered index on the ID column on the table.

CREATE TABLE [Measurement]
(
    [ID] INT NOT NULL PRIMARY KEY,
    [ParameterID] INT NOT NULL,
    [Measuretime] DATETIME NOT NULL,
    [Value] FLOAT NOT NULL,

    CONSTRAINT [FK_Measurement_Parameter]
        FOREIGN KEY ([ParameterID]) REFERENCES [Parameter]([ID])
)

CREATE INDEX [IX_Measurement_Measuretime_ParameterID]
    ON [Measurement] ([Measuretime]) INCLUDE ([ParameterID]);

CREATE INDEX [IX_Measurement_ParameterID_Measuretime]
    ON [Measurement] ([ParameterID]) INCLUDE ([Measuretime]);

Our data gets written in 1-5 second intervals with successive timestamps for all Parameters.

We decided that it would probably be better to create the clustered index on ParameterID and/or Measuretime, as most queries filter on those two columns.

Here are some examples of what most of our queries look like:

Example A

SELECT *
FROM Measurement
WHERE ParameterID = 1
    and Measuretime between '2015-01-24' and '2015-01-25'

Example B

SELECT ParameterID, cast(Measuretime as date), avg(value)
FROM Measurement
WHERE ParameterID = 1
    and Measuretime between '2015-01-01' and '2015-02-01'
GROUP BY ParameterID, cast(Measuretime as date)

Example C

SELECT DISTINCT
    ParameterID,
    FIRST_VALUE(cast(Measuretime as date))
          OVER (PARTITION BY cast(Measuretime as date), ParameterID
                ORDER BY Measuretime ) Measuredate,
    PERCENTILE_CONT(.25)
         WITHIN GROUP(ORDER BY Value)
         OVER (PARTITION BY cast(Measuretime as date), ParameterID) as [q1],
    ...
FROM Measurement
WHERE Measuretime between '2015-01-01' and '2015-02-01'
    -- and ParameterID = 1
ORDER BY Measuretime, ParameterID

Which of these three options for creating the index is best suited to such a scenario?

  • CREATE CLUSTERED INDEX [CIX_Measurement] ON [Measurement]([Measuretime],[ParameterID]) as this is also the order data gets written, and both columns are queried.
  • CREATE CLUSTERED INDEX [CIX_Measurement] ON [Measurement]([ParameterID],[Measuretime]) as nearly all our queries need to filter by ParameterID in one way or another and only afterwards Measuretime.
  • Cluster on only one of those two columns and create a normal INDEX for the other column.

Best Answer

Generally, you want to order the columns in the index based on cardinality, that is, the most unique column first, then the second most unique, and so on. So you need to determine whether ParameterID or Measuretime will have the fewest duplicate values. For example, if Measuretime has fewer duplicate values than ParameterID, use option 1. Then, for optimal performance, you need to edit your queries to reference the columns in the correct order.
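To make the cardinality comparison concrete, a query along these lines could be run against the existing table. This is only a sketch using the table and column names from the question; the column aliases are made up for illustration, and on a very large table the distinct counts may take a while to compute.

```sql
-- Compare how many distinct values each candidate key column holds.
-- The column with the higher distinct count is the "more unique" one.
SELECT
    COUNT_BIG(*)                  AS TotalRows,             -- illustrative alias
    COUNT(DISTINCT [ParameterID]) AS DistinctParameterIDs,  -- illustrative alias
    COUNT(DISTINCT [Measuretime]) AS DistinctMeasuretimes   -- illustrative alias
FROM [Measurement];
```

If DistinctMeasuretimes comes out far larger than DistinctParameterIDs, Measuretime is the higher-cardinality column in the sense used above.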

Something to consider, however, is the pattern of writes to this table. Is the data inserted in the order of Measuretime? Will a new ParameterID always be a higher increment of a previously added one? Are there lots of updates and deletes to this table?

The advantage of having a surrogate key, like your ID column above, is that it ensures ordered inserts into the table/clustered index. Among other benefits, this avoids page splits on inserts.
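One way to see whether inserts are actually causing page splits is to watch index fragmentation over time. The following is a minimal sketch, assuming the table lives in the dbo schema of the current database (the question does not state the schema):

```sql
-- Report fragmentation for every index on the Measurement table.
-- 'LIMITED' is the cheapest scan mode; it reads only the upper index levels.
SELECT
    i.name AS IndexName,
    ps.index_type_desc,
    ps.avg_fragmentation_in_percent,
    ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                          -- current database
         OBJECT_ID(N'dbo.Measurement'),    -- assumed schema: dbo
         NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Running this before and after a representative batch of inserts, once for each candidate clustering key on a test copy, gives a rough picture of how each key choice behaves under your write pattern.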

Also, if the combination of ParameterID and Measuretime will always be unique, you could consider a composite primary key. This has its downsides, but is a valid option.
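As a rough sketch of that option, you could first check that the pair really is unique in the existing data, and then define the key on both columns. The table name Measurement_CompositeKey and the constraint names below are hypothetical, chosen only for illustration; the column order in the key is one of the two candidates from the question, not a recommendation.

```sql
-- Step 1: look for duplicates of the candidate key in the existing data.
-- Any row returned here would violate a composite primary key.
SELECT [ParameterID], [Measuretime], COUNT(*) AS DuplicateCount
FROM [Measurement]
GROUP BY [ParameterID], [Measuretime]
HAVING COUNT(*) > 1;

-- Step 2: a variant of the table with a clustered composite primary key
-- (hypothetical table and constraint names).
CREATE TABLE [Measurement_CompositeKey]
(
    [ParameterID] INT NOT NULL,
    [Measuretime] DATETIME NOT NULL,
    [Value] FLOAT NOT NULL,

    CONSTRAINT [PK_Measurement_CompositeKey]
        PRIMARY KEY CLUSTERED ([ParameterID], [Measuretime]),

    CONSTRAINT [FK_Measurement_CompositeKey_Parameter]
        FOREIGN KEY ([ParameterID]) REFERENCES [Parameter]([ID])
);
```

With this layout the surrogate ID column is dropped entirely; if other tables reference Measurement by ID, you would need to keep it, for example as a nonclustered unique key.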

Here's a good explanation of using composite primary keys: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx