Sql-server – Clustered Index fragmentation vs Index with Included columns fragmentation

clustered-indexindexindex-tuningperformancesql server

I have a relative large table (for me) with 40 million rows that is expected to grow to 80 to 120 million rows in a period of two weeks (during a campaign).

Tip
--------------
Id          int (clustered index)
UserId      int
TipIndex    smallint
Value       binary(8)
LastChanged datetime2(3)

every user has between 1 to 400 tips that will be inserted at random during this period
UserId + TipIndex is unique
I never query Tips directly on the Identity Key (Id)
99% of the times I query on UserId
I need all columns
I often query on 1 UserId (per page view), sometimes on a batches of 10.000s for statistics
this is a high traffic site during this period and it should be able to handle 30.000 queries on UserId per minute
Id is at the moment my Clustered Index, because I read that it leads to the least fragmentation.

So I'm hosting on SQL Azure and Azure already recommended adding an Index with included columns.
I was always hesitant on using UserId,TipIndex as a Clustered Index, as Tips will be added at random. Meaning that I'm afraid of huge fragmentation problems etc.

My questions:

Doesn't an Index with included columns have the exact same problem?
Is a table with included columns not just the same as a "shadow table" with the same fragmentation problems?
Should I migrate to use UserId, TipIndex as a ClusteredIndex instead of Id?
How to prevent fragmentation?

I know in the end the answer is always "depends" or that I should measure it. But as I'm a solo developer and not with a lot of resources I am hoping on someone with more experience that has a gut feeling for this, so my first attempts have a higher chance of going in the right direction.

Best Answer

It seems you are too much concerned about fragmentation, As long as you keep updating statistics regularly, fragmentation shouldn't bother you much for performance. You may read more details about this on a video shared by Mr. Brent Ozar and also another page here. Let me try answering your question one by one:

Doesn't an Index with included columns have the exact same problem?

Index with columns in the include or be it a key column, doesn't differ much. Key columns are going to be part of B-tree whereas include columns are not however when you perform any insert/update/delete operation, this will have same expense as these columns need to be updated/inserted/deleted.

Is a table with included columns not just the same as a "shadow table" with the same fragmentation problems?

Not very sure what you meant by shadow table, if you meant by base table here then yes, you would have same problem as far as fragmentation is concerned.

Should I migrate to use UserId, TipIndex as a ClusteredIndex instead of Id?

As per your statement - "99% of the times I query on UserId", this is a good candidate for primary key clustered column. Since you are not going to use Id column very frequently, I don't see any problem in using composite primary clustered key in the form of UserId and TipIndex. In terms of size of index, it is as good as Id(int --> 4 byte) plus an additional column of tinyint type(1 byte).

Kindly understand that, clustered key is nothing but the order of data stored logically and doesn't have any physical existence unlike non-clustered key.

How to prevent fragmentation?

I would say updating statistics should be considered with priority than that of fragmentation. You may use maintenance script used by many DBAs across the world from Ola Hallengren. You can schedule it weekly or bi-weekly basis your requirement.

Hope this helps.

Related Solutions

Sql-server – Index fragmentation increase as more row being deleted

This isn't fragmentation.

Fragmentation is generated of course, but deletes will simply create "islands" of remaining pages, which is less evil then GUID/clustered key INSERT fragmentation.

If you're PK is an IDENTITY, then CreationDate should roughly track this so you're actually deleting chunks of contiguous rows anyway.

Do you have an index on CreationDate
Do you have delete cascades?
Is the TOP 1000 in a single transaction?

For point 3, doing a loop inside a transaction is pointless: is this it?

At some point, a statistics update may be needed if you delete enough rows but I don't think it's that.

Other options:

why not use TRUNCATE TABLE, wrapped in a stored procedure with EXECUTE AS OWNER
use SYNONYMs for poor man's partitioning

Sql-server – add “Included Columns” to an index without affecting performance

Why would you do anything through the SSMS GUI???

Try doing it as a pro, by a script. Your best option is to create the index online with drop existing, changing the definition to match the new desired schema in the process:

create index ... on ... include ... with (online = on, drop_existing = on);

This, obviously, is subject to the usual restrictions for ONLINE operations, and will cause the normal overhead caused by an ONLINE index create/rebuild operation. But the good news is that the operation will be online, will not block existing queries and the old index will be available for queries while the new one is being built. There will be a query drain stage at the end when all queries will block while the indexes are being swapped, but that should be a very fast operation (as long as there are not long running user transactions blocking the swap itself).

Best Answer

Related Solutions

Sql-server – Index fragmentation increase as more row being deleted

Sql-server – add “Included Columns” to an index without affecting performance

Related Question