I'm reconsidering my choice of Primary Key on a table, and would like some input.
Some context: We are building a REST API, where clients access "Assets" identified by a GUID. So, the table for Asset in our database resembles this:
CREATE TABLE [dbo].[Asset](
[Id] [uniqueidentifier] NOT NULL,
[TypeId] [int] NOT NULL,
[Date] [datetime] NOT NULL,
[Title] [varchar](1000) NULL,
[Description] [varchar](max) NULL,
[Created] [datetime] NOT NULL,
[Modified] [datetime] NOT NULL,
[PublisherFields] [xml] NULL,
[StatusId] [int] NOT NULL,
CONSTRAINT [PK_Asset] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
We are experiencing a few problems with this structure, notably that the clustered index on the table becomes fragmented very quickly, and we have now reached the point where we are no longer able to rebuild it (because of Azure limitations on query time limits).
Further research suggests that using a uniqueidentifier as a clustered index is not always a good choice, because of the way that newid() doesn't generate IDs in sequence (unless you use newsequentialid(), but Azure doesn't allow this either).
The simple solution seems to be to introduce a dummy column and add the clustered index to that. Design guidelines say that good clustered indexes should be sequential and non-changing, so the simplest answer to that seems to be an Identity column. This would make my table into this:
CREATE TABLE [dbo].[Asset](
[Id] [bigint] IDENTITY(1,1), <---- NEW
[AssetId] [uniqueidentifier] NOT NULL, <---- Renamed
[TypeId] [int] NOT NULL,
[Date] [datetime] NOT NULL,
[Title] [varchar](1000) NULL,
[Description] [varchar](max) NULL,
[Created] [datetime] NOT NULL,
[Modified] [datetime] NOT NULL,
[PublisherFields] [xml] NULL,
[StatusId] [int] NOT NULL,
CONSTRAINT [PK_Asset] PRIMARY KEY NONCLUSTERED <--- Now NonClustered
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
The clustered index would also be added to the ID
column. Does this seems a sensibible alternative? I might question the sense in having an IDENTITY
column on the table that is not a primary key, but my reasoning is that an IDENTITY
column is used to obtain the auto-incrementing nature, which suits the clustered index. It might also be confusing…when someone references an "AssetID", which column are they talking about? The [AssetId
] column, or the Asset table's [ID
] column?
Suggestions and alternatives welcomed.
Best Answer
There are a few things to consider here:
Also, please keep in mind:
UNIQUE
, a hidden "uniqueifier" field is added to rows that would otherwise be duplicates.NEWSEQUENTIALID()
is sequential per each restart of the SQL Server service. It is possible that the starting value after a restart is less than the previous lowest value.Hence:
UNIQUEIDENTIFIER
instead of anINT
as the FKed tables would then have a larger FK field in them.Created
field, plus add in the size of another field to make it unique, or the uniqueifier for any duplicate rows.INCLUDE
other columns. Meaning, if you have the Clustered Index on the INT PK and a Nonclustered index on the UNIQUEIDENTIFIER, then JOINing to another table on that INT PK field while specifying the GUID value in a WHERE clause (assuming no other fields from this table are in the query) won't have to go back to the table since the Nonclustered Index will have both of the requiered fields in it. Does theCreated
field give the same benefit? Likely not.IF no other tables ever JOIN to this table:
Created
field as the non-unique Clustered Index and theId
field as the Nonclustered PK.AssetId
, as the Clustered PK, and theId
UNIQUEIDENTIFIER field as a Nonclustered Index. Since you likely already have code referencing the UNIQUEIDENTIFIER field asId
, I wouldn't change that name.