Sql-server – Choice of clustered key when almost all columns would be part of a unique index anyway

clustered-indexindexsql-server-2008

I am creating the following table

CREATE TABLE dbo.COMM_SHELF_BLOCK_ACCESS
    (
    KeyId uniqueidentifier NOT NULL ROWGUIDCOL CONSTRAINT DF_COMM_SHELF_BLOCK_ACCESS_KeyId default NEWSEQUENTIALID(),
    EmployeeGuid uniqueidentifier NOT NULL,
    PracticeVid smallint NOT NULL,
    ShelfGuid uniqueidentifier NOT NULL,
    CONSTRAINT PK_COMM_SHELF_BLOCK_ACCESS PRIMARY KEY CLUSTERED (KeyId)
    )  ON [PRIMARY]
GO

CREATE UNIQUE NONCLUSTERED INDEX UX_COMM_SHELF_BLOCK_ACCESS 
    ON dbo.COMM_SHELF_BLOCK_ACCESS (ShelfGuid, EmployeeGuid, PracticeVid)

GO

I never use KeyId in my program, it is only for merge replication. The most common query my application will be using this table for will be

--This will be passed in parameters to the SP
declare @practiceVid smallint = 0
declare @employeeGuid uniqueidentifier = 'C413A410-E13E-4647-9D68-2A38FBC45906'

--actual query
select * from COMM_SHELF
where COMM_SHELF.SHELF_GUID not in 
    (
        select ShelfGuid 
        from COMM_SHELF_BLOCK_ACCESS 
        where EmployeeGuid = @employeeGuid 
            and PracticeVid = @practiceVid
    )

Is the index structure of my new table appropriate? The big thing on my mind is I am duplicating almost my entire table in that unique index. The other thought I had doing was to structure the table like the following

CREATE TABLE dbo.COMM_SHELF_BLOCK_ACCESS
    (
    KeyId uniqueidentifier NOT NULL ROWGUIDCOL CONSTRAINT DF_COMM_SHELF_BLOCK_ACCESS_KeyId default NEWSEQUENTIALID(),
    EmployeeGuid uniqueidentifier NOT NULL,
    PracticeVid smallint NOT NULL,
    ShelfGuid uniqueidentifier NOT NULL,
    CONSTRAINT PK_COMM_SHELF_BLOCK_ACCESS PRIMARY KEY NONCLUSTERED (KeyId),
    CONSTRAINT CK_COMM_SHELF_BLOCK_ACCESS UNIQUE CLUSTERED (ShelfGuid, EmployeeGuid, PracticeVid)
    )  ON [PRIMARY]
GO

making those 3 columns I am interested in the clustered key. However this violates the "Keep the clustered key narrow" and the "Keep the clustered key sequential" guidelines I have read everywhere.

Which of these approaches is the correct way to implement this? If neither are correct please tell me what the correct way to approach this is.

Best Answer

The big reasons for the "Keep the clustered key sequential" advice is all around insert/update/delete operations and fragmentation. As you make changes to the values (insert new ones out of order, updating an old value that causes it to move, deleting a value) SQL has to move data around on the pages and do page-splits to keep the data in order (the whole purpose of a clustered index). This is going to take time and create fragmentation which will slow everything down over time.

If you are doing very few updates (including inserts/deletes) or rebuild your table every time the updates are done (truncate & then batch load) then a clustered index on values that aren't naturally in order can work for you. Otherwise stick with your sequential value.

As far as your index goes, based on just that one query, I would prefer this:

CREATE INDEX ix1 ON COMM_SHELF_BLOCK_ACCESS(EmployeeGuid, PracticeVid) INCLUDE (ShelfGuid)

You don't want the ShelfGuid up front because then it has to do an index scan to find the EmployeeGuid that it wants. If you aren't actually going to be searching on ShelfGuid and just want to return the value then take it out of the tree level of the index entirely and just put it in the leaf level (by INCLUDING it).

If you have the room for the index (which I'm guessing you do) then don't worry about it being almost the same size as the table. It's still a index that sorts your data in a specific order. I'm guessing your goal here is to speed up your query not to conserve space. If you are tight on space and can't afford more then you have a different issue and I would consider redoing your table with something other than GUIDs :).

Best Answer

Related Solutions

Sql-server – Why is Clustered Index on Primary Key compulsory

Sql-server – Convert clustered primary key to non-clustered and use another column for the clustered index

Related Question