Sql-server – ny way to test which clustered index is better for specific queries

clustered-indexindexindex-tuningsql servert-sql

We've been querying and processing a growing table of events in our system, which was fine without specific indexes for now. However we've noticed a decrease in performance and are wondering what we can do to improve that.

I've created a few test tables to test different indexes:

SELECT TOP 1000000 * INTO IndexTest1
FROM Events

SELECT TOP 1000000 * INTO IndexTest2
FROM Events

SELECT TOP 1000000 * INTO IndexTest3
FROM Events

Next I added different clustered indexes on columns that are often used to query the events or process them. Most of the time we use the Timestamp or the EventType.

CREATE CLUSTERED INDEX TimestampEventType
ON IndexTest1 (Timestamp, EventType)

CREATE CLUSTERED INDEX Timestamp
ON IndexTest2 (Timestamp)

CREATE CLUSTERED INDEX EventType
ON IndexTest3 (EventType)

Next I tested two different types of queries. However I currently have no way of noticing which performs best.

SELECT * FROM IndexTest1
WHERE EventType = 'String'

SELECT * FROM IndexTest2
WHERE EventType = 'String'

SELECT * FROM IndexTest3
WHERE EventType = 'String'

SELECT * FROM IndexTest1
WHERE Timestamp >= '2018-03-14' AND Timestamp <= '2018-03-20' AND EventType = 'String'

SELECT * FROM IndexTest2
WHERE Timestamp >= '2018-03-14' AND Timestamp <= '2018-03-20' AND EventType = 'String'

SELECT * FROM IndexTest3
WHERE Timestamp >= '2018-03-14' AND Timestamp <= '2018-03-20' AND EventType = 'String'

What I expected:
I expected IndexTest1 to perform best for both queries, seeing as it is the only one that includes both EventType and Timestamp in its clustered index.

Results second query set:
When turning on the live query statistics, I get varying times for execution for all queries, ranging between 8 – 20 seconds.

When looking at the execution plans I do notice that IndexTest1 has a much lower Estimated I/O Cost and Estimated Operator Cost for the second set of queries than IndexTest2 & IndexTest3. The difference is relatively large, IndexTest1 has an I/O Cost of about 0.003 whereas IndexTest2 & IndexTest3 are around 25 and 40 respectively.

Question: Can I conclude that for the second query set IndexTest1 is the best index?

Results first query set: When looking at the execution plans for the first set of queries IndexTest3 actually has the best performance (although barely). All of them are around an I/O cost of 40, with IndexTest2 being the worst and IndexTest 3 being the best.

Questiong: Does that mean that the indexes I added have no real effect on the first query set? And if so, why would that be if EventType is included in the clustered index of IndexTest1?

I have very little experience with testing indexes and creating them. Is there any other way I can test which index would perform best for the above queries? Or is there something else I should try out?

Or does anyone have any reading material they could link me to for more information about indexes and testing them?

Best Answer

The Clustered Index should be defined with a narrow and ever increasing value (this is why so many of them use an auto-incrementing integer).

Having worked with large tables like this before, your best bet is to continue to use an auto-incrementing primary key as the definition for the clustered index. Then create non-clustered indexes that support your queries, with the understanding that other ad-hoc queries will probably perform terribly. You get the added benefit that the index on your large table should never need defragmentation.

Given that, creating a non-clustered index on Timestamp, EventType should give sufficient performance. You typically want the most specific column referenced first, and timestamps are good candidates for this, especially if you always (or almost always) include it in your query.

The advantage of non-clustered indexes is of course that they are typically much narrower than their base table and so require much less memory.

CREATE NONCLUSTERED INDEX TimestampEventType
ON IndexTest1 (Timestamp, EventType)

You can also benefit from casting to the proper datatype (I'm assuming DATETIME)

SELECT * FROM IndexTest1
WHERE Timestamp >= CAST('2018-03-14' AS DATETIME) AND Timestamp <= CAST('2018-03-20' AS DATETIME) AND EventType = 'String'

Finally, check that statistics are up to date for your query and that the execution plan is actually doing an index seek or range scan with a key lookup back to the main table.