I find myself in a situation where I have inherited the db and people complain about database performance. I identified about a dozend of tables not having a clustered index or a not carefully planned clustered index, some of them with lots of data and frequently used. I am actually struggeling to figure out the best candidates for implementation of a CI for the heaps. After a lot of reading I am still not sure what to do. I know it depends. So I would like to give one table as a sample and hope to get some suggestions.
The table has 122 Columns. Yes, I know…
It holds about 800 000 rows.
Structure (excerpt):
CREATE TABLE [dbo].[tO](
[ID] [uniqueidentifier] NOT NULL,
[MaID] [uniqueidentifier] NULL,
[OrID] [uniqueidentifier] NOT NULL,
[UsID] [uniqueidentifier] NOT NULL,
[CrUsID] [uniqueidentifier] NOT NULL,
[SuID] [uniqueidentifier] NULL,
[SOID] [uniqueidentifier] NULL,
[DeA1ID] [uniqueidentifier] NULL,
[DeA2ID] [uniqueidentifier] NULL,
[PayID] [uniqueidentifier] NULL,
[MAdID] [uniqueidentifier] NULL,
[ShID] [uniqueidentifier] NULL,
[SessID] [varchar](32) NULL,
[OrUID] [uniqueidentifier] NULL,
[PurOfUID] [uniqueidentifier] NULL,
[AddiUID] [uniqueidentifier] NULL,
[SupUID] [uniqueidentifier] NULL,
[PayTID] [uniqueidentifier] NULL,
[OPSID] [uniqueidentifier] NULL,
[StoRSID] [uniqueidentifier] NULL,
[PayMeID] [uniqueidentifier] NULL,
...
[OrderDate] [datetime] NULL,
[CreateDate] [datetime] NOT NULL,
[UpdateDate] [datetime] NULL,
[RowVersion] [timestamp] NOT NULL,
[Deleted] bit NOT NULL,
...
[CTR] int itentity(1,1) NOT NULL
CONSTRAINT [PK_Order] PRIMARY KEY NONCLUSTERED <= uses column [ID]
On this table exist 29 nonclustered indexes. I guess some of them have been created just in dispair…
I know that the design of this and other tables is not very well. It is way to wide and should be splitted. This is on my list.
The queries almost always filter on th column [OrID]. This column is not unique (highest no of rows with one single OrID is > 310 000).
All queries filter on DELETED = 0.
Also they are filtered on different numeric columns.
Joins are performed on the different uid columns.
The table holds FKs, also on the uid columns and has either parent tables, that are joined on one of the uid's and child tables that are joined using the PK [id].
After trying some of the traced user queries on this table I see that they usually filter one row and this is not performing very bad at all. Looking at the execution plans I see that one of the non clustered indexes is usually seeked, then a RID Lookup is performed. This costs about 30% – 70% of the query.
So I guess it could perform better. Also, having a clustered index in place, I assume several nonclustered indexes could be removed.
EDIT 10/16/15 additional information:
sp_BlitzIndex says: "22651751 forwarded fetches, 0 deletes against heap:dbo.tO (0) Self Loathing Indexes: Heaps with forwarded records or deletes"
End of Edit
Now my problem with this (and other samples) is that we have those ugly uniqueidentifier all over the place. I read lots of compareable posts but still have no clue how to handle that.
-
Should I just stop thinking and add a clustered index to the Primary Key [ID] and so follow the general suggestion that having a clustered index is basically better in 99.9% of all cases than having none, even if it does not exactly match the CI criteria (not narrow, not ever-increasing)?
-
Should I better use a column that is a little less unique but ever increasing like [CreateDate] ?
-
Should I add a new IDENTITY column and put the CI on this one?
The situation on other tables is compareable to that one.
EDIT Okt-17 2016: As requested in the comments, I added the (obfuscated) indexes that actually are in place. (I know some of them could easily be removed without any work for example [p] is 100% covered by [s]… but this is not the core of the heap/CI question).
CREATE UNIQUE NONCLUSTERED INDEX [a]
ON [dbo].[tO]([Deleted] ASC, [Status] ASC, [ID] ASC)
INCLUDE([NumericA], [CreateDate], [char3], [Bool1], [Bool2], [varchar1], [OrderDate], [INT1], [SuID], [varcharT], [UsID], [MaID], [varcharS], [TinyINTd], [PurOfUID], [AddiUID]) WITH (FILLFACTOR = 90);
GO
CREATE NONCLUSTERED INDEX [b]
ON [dbo].[tO]([ID] ASC, [UsID] ASC, [SuID] ASC, [Deleted] ASC, [Status] ASC, [OrID] ASC, [TINYINTm] ASC, [varchar1] ASC, [varcharT] ASC, [OrderDate] ASC, [CreateDate] ASC, [PAN_decimal] ASC, [char3] ASC, [INT1] ASC, [NumericA] ASC)
INCLUDE([Bool1], [Bool2]);
GO
CREATE NONCLUSTERED INDEX [c]
ON [dbo].[tO]([ID] ASC, [Status] ASC, [Deleted] ASC, [UsID] ASC, [SuID] ASC, [OrID] ASC, [TINYINTm] ASC, [varchar1] ASC, [varcharT] ASC, [OrderDate] ASC, [CreateDate] ASC)
INCLUDE([INT1], [char3], [PAN_decimal], [NumericA], [Bool1], [Bool2]);
GO
CREATE NONCLUSTERED INDEX [d]
ON [dbo].[tO]([MaID] ASC);
GO
CREATE NONCLUSTERED INDEX [e]
ON [dbo].[tO]([UsID] ASC);
GO
CREATE NONCLUSTERED INDEX [f]
ON [dbo].[tO]([SuID] ASC);
GO
CREATE NONCLUSTERED INDEX [g]
ON [dbo].[tO]([AddiUID] ASC, [OrID] ASC, [INT_CNR] ASC, [Deleted] ASC)
INCLUDE([ID]);
GO
CREATE NONCLUSTERED INDEX [h]
ON [dbo].[tO]([varchar1] ASC);
GO
CREATE NONCLUSTERED INDEX [i]
ON [dbo].[tO]([OrID] ASC, [INT_CNR] ASC, [Deleted] ASC)
INCLUDE([ID], [MaID], [md_Date], [ouID], [Status], [AddiUID], [PurOfUID], [varcharS], [UsID]);
GO
CREATE NONCLUSTERED INDEX [j]
ON [dbo].[tO]([OrderDate] ASC);
GO
CREATE NONCLUSTERED INDEX [k]
ON [dbo].[tO]([SCC_int] ASC);
GO
CREATE NONCLUSTERED INDEX [l]
ON [dbo].[tO]([PurOfUID] ASC);
GO
CREATE NONCLUSTERED INDEX [m]
ON [dbo].[tO]([OrID] ASC, [INT_CNR] ASC, [UsID] ASC, [Deleted] ASC)
INCLUDE([ID]);
GO
CREATE NONCLUSTERED INDEX [n]
ON [dbo].[tO]([Status] ASC)
INCLUDE([ID], [UsID], [SuID], [varchar1], [INT1], [char3], [OrderDate], [CreateDate], [Bool1]);
GO
CREATE NONCLUSTERED INDEX [o]
ON [dbo].[tO]([INT_CNR] ASC, [Status] ASC)
INCLUDE([ID], [OrID], [UsID], [SuID], [SessionID], [varchar1], [char3], [TinyINTd], [OrderDate], [CreateDate], [RowVersion], [Deleted]);
GO
CREATE NONCLUSTERED INDEX [p]
ON [dbo].[tO]([Deleted] ASC, [INT_CNR] ASC)
INCLUDE([ID]);
GO
CREATE NONCLUSTERED INDEX [q]
ON [dbo].[tO]([OrID] ASC, [Deleted] ASC)
INCLUDE([ID], [SuID], [varchar1], [ouID]);
GO
CREATE NONCLUSTERED INDEX [r]
ON [dbo].[tO]([OrID] ASC, [Status] ASC, [Deleted] ASC, [INT_CNR] ASC)
INCLUDE([ID], [UsID], [varchar1], [varcharT], [varcharS], [PurOfUID], [AddiUID]);
GO
CREATE NONCLUSTERED INDEX [s]
ON [dbo].[tO]([Deleted] ASC, [INT_CNR] ASC, [Status] ASC, [OrderDate] ASC)
INCLUDE([ID], [cfct_decimal]);
GO
CREATE NONCLUSTERED INDEX [t]
ON [dbo].[tO]([UsID] ASC, [INT_CNR] ASC, [Deleted] ASC, [Status] ASC, [OrderDate] ASC)
INCLUDE([NumericA], [cfct_decimal]);
GO
CREATE NONCLUSTERED INDEX [u]
ON [dbo].[tO]([SOID] ASC, [Status] ASC, [Deleted] ASC, [INT_CNR] ASC)
INCLUDE([ID], [MaID], [varchar1]);
GO
CREATE NONCLUSTERED INDEX [v]
ON [dbo].[tO]([Status] ASC, [Deleted] ASC, [ISD_Bool] ASC, [OrderDate] ASC)
INCLUDE([ID], [MaID], [varchar1]);
GO
CREATE NONCLUSTERED INDEX [w]
ON [dbo].[tO]([OrID] ASC, [INT_CNR] ASC, [Status] ASC, [Deleted] ASC, [ISD_Bool] ASC)
INCLUDE([ID], [MaID], [varchar1], [CreateDate], [OrderDate], [SuID], [UGRID], [ouID], [UsID], [DeA1ID], [md_Date], [MAdID], [PurOfUID]);
And here are very few and not prioritized sample queries (or excepers of those) that use this table:
...Select tO.ID From tO WHERE tO.[OrID] = @OrID AND [tO].[INT_CNR ] = 0 AND tO.[UsID] = @UserID AND tO.[Deleted] = 0 ...
... ID, [Deleted], [RowVersion] FROM [tO] LEFT JOIN O_details ON [tO].[ID] = [O_details].[OID] ...
WHERE tO.INT_CNR = 0 AND tO.RowVersion > @rv ...
... INNER JOIN [tO] ON [tO].ID = xyz.OID ...
...From O_Details d LEFT JOIN [tO] ON d.OID = [tO].ID
WHERE [tO].[Status] = 4
AND d.AID is not null
AND [tO].[ID] IN(...)...
... UPDATE [tO] SET [SomeVarcharCol]='...' WHERE [ID]=@id...
...UPDATE [tO] SET [NumericA]=@a, [PAN_decimal]=@b, [anydecimal]=@c, [UpdateDate]=@UpdateDate WHERE [ID]=@ID AND [RowVersion]=@RowVersion ...
END of Edit
Best Answer
Original Post
Honestly, in your situation I'd just go ahead and cluster the [ID]. It is worrying that there appears to be no default on it, but at the same time, maybe if you're lucky, it's being populated with some form of
NEWSEQUENTIALID
call. If not, it isn't the end of the world, < 1MM rows shouldn't be a show stopper if the indexes / stats are being maintained.Doing so will allow you to start focusing on reworking some filtered / include indexes and when you're done with that, maybe you'll have 4 sore spots instead of 12.
Any advice beyond that will probably result in "tip of the iceberg" classification for your problem - as you seem to be well aware, the question is really about abysmal design, rather than engine performance.
Update
I couldn't figure out a way to answer your comment clearly, so here's way too much information instead.
As it has been stated, if you have a thin, accessible, incremental columns, like an
IDENTITY
column, put a clustered index on that - it's a no-brainer.However:
If no such column exists and you're looking at heaps with 10+ indexes on them ( and -still- ending up doing RID lookups!! ), just cluster the
ID
for now, even if it's aUNIQUEIDENTIFIER
. You need, more than anything else, to clean up your storage utilization, and that's going to start with clustering -something-, if only to pull that forward fetch data back in line with rest of the row. Even if you immediately drop the clustered index afterwards, you'd at least have organized your heap for now, dramatically reducing forward pointers ( if not completely eliminating ) and ultimately reducing contention on your ( needlessly overworked ) I/O subsystem. There is literally no downside to doing this, so do it. Then you can go about figuring out why there 29 available indexes and queries still need to go RID matching.Ignoring the index sprawl and focusing on just clustering your heaps, using a
FILLFACTOR
around 70 should be fine for dealing withUNIQUEIDENTIFIER
values. This is because b-trees, the actual data structure SQL Server uses for indexes, tend towards around 69% node utilization underINSERT
only operation of random values. This number is a mathematical truth a couple of pretty bright fellas, Lehman and Yao, proved during their research into the subject. ThrowingDELETE
s in sufficient quantities into the mix can lower the node utilization target, but as a general rule, aFILLFACTOR
of 70 is a good place to start tuning non-sequential / random clustered indexes, since you're effectively telling SQL Server to not even bother trying to fill 30% of each page, since you won't be using that part anyway - because you won't be, mathematically.In regards to potential page splits using a non-optimal clustering key, while it is a valid concern, if you happen to have access to a Sql Server 2012+ installation, you can follow along with this demonstration, starting with setting up an extended event session to track page splits:
Once that's up and running, you can set up and seed a few tables to get an idea of the impact different kinds of clustering keys under differing fill factors can have on index density and the number of page splits inserts into them can cause. For the example, I'm using a good clustering example with
INTEGER
andIDENTITY
underFILL_FACTOR 100
, a terrible clustering example withUNIQUEIDENTIFIER
and non-sequentialNEWID()
underFILL_FACTOR 100
, and then three more non-sequentialUNIQUEIDENTIFIER
clustering keys underFILL_FACTOR
s of 75, 70 and 65, each seeded with 2.5MM records.For index density after the initial seeding, the following query is used:
And here are the density results ( notice the ~69% in the last column for the
UNIQUEIDENTIFIER
s ):As you might expect, the non-sequential indexes are very, very fragmented after 2.5MM inserts and no maintenance. Moving on to check the page splits that occurred by checking the extended event we set up earlier, we can use this query:
The results?
13 page splits for the good cluster, over 12500 for each of the non-sequential. This is bad in all non-sequential cases, but as originally mentioned, maintaining the indexes is an important part of making this work, so let's clean them up.
And now:
Okay, with fresh indexes, the page splits are now gone, fragmentation is largely the same for each of the tables and they're conforming to their
FILL_FACTOR
s. The non-sequential tables will put more pressure on I/O than the sequential one for size and density difference reasons, but will still perform acceptably. Now we can simulate some loads to these tables and see if page split related performance issues are likely to overcome the benefits of clustering the tables in the first place:And the results:
The only table affected by page splits after another 50000 rows were inserted was the non-sequential
UNIQUEIDENTIFIER
underFILL_FACTOR 100
. This absolutely could potentially cause performance issues. However, as shown, any of the other testedFILLFACTOR
levels caused no additional page splits. I actually needed to insert another 0.5MM into the tables before I started getting page splits on thedbo.PageSplitNewIDFillFactor75
table during my testing, and that one was over the mathematical threshold anyway.So, to reiterate, obviously, if a thin, sequential clustering key can be selected, it is the best choice. However, as I originally said, clustering a non-sequential key is a completely viable course of action, given no better option, if done responsibly. Your stated performance problems are not those of page splitting issues, they are forward fetch issues on heaps, which can be solved by adding a clustered index. So add one. If your heap has a non-clustered primary key that is a
UNIQUEIDENTIFIER
and that is being used throughout your software suite, cluster andFILL_FACTOR 70
that sucka, measure and adjust from there - the more time you spend putting out fires and explaining performance metrics, the less time you have to adjust the model to a more suitable form.Don't forget to drop your extended event session!