Sql-server – Clustered Index Help, where did i go wrong

clustered-indexindex-tuningsql serversql-server-2008-r2

I was asked to review a very busy table and identify any areas of improvement.
I have very limited power to change the table just the indexes.

Table information

  • 240 columns
  • ~5 million rows
  • Read and updated by about 30 applications, which range from websites to polling applications.
  • Each row represents a contract and its state based on three flags (varchars of size 5, 8 and 8).
  • A row's life cycle goes from start to end through these three flags, and eventually is done.
  • A row typically is updated or changed 10 to 30 times throughout this lifecycle.
  • The primary key is a combination of an id column, a guid column and a company column.
  • Table has 40+ indexes, the majority of which are duplicates and unused. This is based on the sys.dm_db_index_usage_stats DMV, running twice a week for the past 7 weeks.

The current clustered index on this table has five columns:

  1. Company column (50 distinct values)
  2. Region column (21 distinct values)
  3. FlagA column (8 distinct values)
  4. FlagB column (24 distinct values)
  5. FlagC column (5 distinct values)

My understanding is a clustered index should adhere to the following properties. Source

  1. Unique
  2. Static
  3. Narrow
  4. Ever increasing.

The current clustered index is none of those.

  1. There is no unique Id.
  2. The three flags are updated constantly.
  3. There can be 5000 rows any given moment that have the same values for all 5 columns.

So my assumption was to correct this with a clustered index on the Id column – an integer that is NOT an identity but maintained through a counter table (read value, add 1, update counter table).

I created a clustered index on the Id, not using the primary key because I believed adding the guid and company column would not give me any benefit.

I then created a nonclustered index which holds the Company, Region and 3 flags.
In a test environment the stats were looking good, user_updates were lower etc. But the overall performance of applications against this table was terrible. The most common queries against this table are:

SELECT * 
FROM table 
WHERE ID = 1234;

And

SELECT * 
FROM table 
WHERE Company = 'company' 
AND Region = 'region' 
AND flagA= 'A'
AND flagB = 'B'
AND flagC = 'C';
  • What I have missed? Are there exceptions to the above clustered index rules?
  • Would the clustered index benefit from adding the company and region to the id?

It is my understanding if the clustered index is on non-static values it needs to constantly reorganize itself and the other nonclustered indexes, which at this point we still have 40 plus…to be later removed. Wouldn't I have seen a gain there?

Added information.

  • Both environments are SQL Server 2008 R2
  • Have the same OS, memory etc.
  • The only difference was the clustered index.
  • I wrote the SELECT * to be concise in the above example.
  • Test and production environment are "the same", as best they can be: Production was facing a heavier load, but both were running same applications, load balanced etc. Test was significantly slower.
  • We ran with the new clustered for a week and a half before we rolled back to the original setup. Initially, stats were updated and specific stored procs were recompiled as slow performance started to crop up.
  • I do agree that with the multiple column index at least the table was organized better than just on Id. I want to reattempt with Company, Region, and Id as the CI, but leave the three flags out of it.

What have I missed?

Best Answer

Are there exceptions to the above clustered index rules?

The general guidance for selecting a clustered index is good, but there are sometimes additional considerations to be taken into account. Occasionally, these extra factors can be more important than the general 'rules'.

Your scenario is somewhat "special" in that you have a very wide table, with a set of queries that request a (presumably) unpredictable set of columns, although the query predicates are normally the same.

The original clustered index arrangement was likely expensive for operations that change data, since changing any part of the clustering key means changing all nonclustered indexes as well. In addition, physically moving whole rows is costly in terms of log generation, especially where a page split occurs.

That said, once a clustered index has been modified and split a fair bit, it will contain quite a bit of free space, making future movements less intensive, just as if a sensible fill factor had been set in the first place, and maintained as part of normal index maintenance.

It seems likely your production system had fallen into a sort of equilibrium, where page splitting occurred at a steady, reasonable rate. Nonclustered index maintenance would still be relatively expensive, but it seems that was not a dominant factor.

The crucial advantages of the original indexing arrangement were:

  1. The clustered index (Company, Region, FlagA, FlagB, FlagC) matches the predicate of:

    SELECT {unpredictable column list}
    FROM table 
    WHERE Company = 'company' 
    AND Region = 'region' 
    AND flagA= 'A'
    AND flagB = 'B'
    AND flagC = 'C';
    

    ...while at the same time providing access to whichever columns are listed in the select list.

  2. Queries of the form:

    SELECT * 
    FROM table 
    WHERE ID = 1234;
    

    ...were adequately supported by the nonclustered primary key. This query always returns a single row, so only a single lookup via the clustered index is required after the row is located in the nonclustered index.

Changing to a clustered index on ID, and a nonclustered index on (Company, Region, FlagA, FlagB, FlagC) makes the second query a little more efficient (eliminating one lookup per query), but makes the first query much less efficient (replacing zero lookups with ~5000).

Moreover, it is very possible the optimizer might choose not to use the nonclustered index at all, estimating that a full scan of the table would be cheaper than the 5000 lookups.

You're probably better off leaving the two main indexes as they are for the time being, while you sort out the 40-odd nonclustered indexes, and analyse the workload for the minimal set of covering nonclustered indexes that would be required. Once that data is available, you will be in a better position to consider fundamental indexing changes.

You may also want to check the existing monitoring and maintenance routines for this table. In many scenarios, the 'rule-breaking' clustered index will be just fine, if the table is given a fill factor that just prevents significant page splitting before the next maintenance window comes around.