Sql-server – Design of small tables: clustered index or heap

database-designindexsql server

I'm trying to design a database schema to hold an entity that has a couple of one-to-many associated collections.

The challenge here is that the related objects have few properties, and so the tables are "small" (at least in terms of amount of columns). This makes it difficult to make a decision on two fronts:

Do I have a primary key or not
Do I create a clustered index, or make it a heap + have a covering non-clustered index

This is my situation (note – actual entity is replaced with a generic example):

-----------------------             --------------------------
|         User        |             |      UserSituation     |
-----------------------         =>  --------------------------
| + UserId [NOT NULL] |             | + UserId [NOT NULL]    |
-----------------------             | + StartDate [NOT NULL] |
                                    | + EndDate [NULL]       |
                                    | + IsActive [NOT NULL]  |
                                    --------------------------

                                    --------------------------
                                    |      UserCategory      | 
                                =>  --------------------------
                                    | + UserId [NOT NULL]    |
                                    | + StartDate [NOT NULL] |
                                    | + EndDate [NULL]       |
                                    | + Category [NOT NULL]  |
                                    --------------------------

Again, I've generalised the example, please don't try to give me advice to re-design the database structure by trying to derive any business meaning behind the fields.

To design the tables, I've reasoned about the following, which is valid for both tables:

A primary key cannot exist because it needs to cover all columns and EndDate is nullable. Still, if that was not the case, would it make sense to have a primary key which is so wide?
A covering non-clustered index can use the RID to lookup table data. This will make leaf nodes smaller, because the alternative is having "UserId + Uniquefier" in all entries. How much does that matter?
A clustered index on UserId -> this means a Uniquefier will be created, so I can't seem to decide whether that would be good or bad.
In any case I have a UNIQUE NONCLUSTERED index over all columns.

I've read Kimberly Tripp's articles on the relevant topics but also stumbled across Marcus Winand's blog post about clustered indexes. He has a rather opposing vision about clustered index than the majority of people, but he seems to make sense all the same.

Other notes:

The data in the tables will not be updated, only deleted/inserted because my app treats the collection entries as immutable.
I'm using SQL Server, and need to support 2008 R2 and up.
The sub-tables will not really be queried, mostly joined to the main entity.
Having read a substantial amount of information on indexes and table design, both here and on blogs, I can't decide on this particular scenario.

Best Answer

As for subtables, I have a rule of always adding a primary key. In this case I would go with a new autoincrement column as a PK, as the index will be pretty narrow.

Secondly, I would consider propagating information like IsActive from UserSituation to User with a trigger (I assume user can only have one situation at the given moment). This may eliminate the need for one JOIN in most cases. Not sure it will work with UserCategory (it will not if it's a true 1:N relation).

As for clustered index on UserId, it sounds like a good idea on table User and a not-so-good idea on subtables. But I have to admit it is not my field of expertise.

Related Solutions

Sql-server – SQL Server Primary key / clustered index design decision

You are correct to separate "clustered index" from "primary key":

A clustered index is the organisation of data on disk is better if
- narrow
- numeric
- increasing (strictly monotonic)
The primary key identifies a row

Note: GUIDs make poor clustering keys

In this case, with the surrogate column, the table has 2 candidate keys:

ProductHistoryID
ProductNo + CreatedDateTime

Assumed convention states that the ProductHistoryID becomes the PK, but you can leave the PK on (ProductNo, CreatedDateTime): it will just be non-clustered. Which leads to indexes:

clustered index should be on ProductHistoryID
unique non-clustered index on (ProductNo, CreatedDateTime)

Example

CREATE TABLE Product (
    ProductHistoryID int NOT NULL IDENTITY (1,1) NOT NULL,
    ProductNo ...
    CreatedDateTime ...

then you a choice of

    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductHistoryID)
    CONSTRAINT UQ_Product UNIQUE NONCLUSTERED (ProductHistoryID)

    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime)
    CONSTRAINT PK_Product UNIQUE CLUSTERED (ProductHistoryID)

Also, the pattern you have is a "type 2 Slowly Changing Dimension"

Sql-server – Will a primary key be added as a clustered index

Yes, SQL Server will create the primary key as clustered by default, but you don't have to accept the defaults.

ALTER TABLE dbo.foo 
  ADD CONSTRAINT pk PRIMARY KEY (bar); -- clustered

ALTER TABLE dbo.foo 
  ADD CONSTRAINT pk PRIMARY KEY CLUSTERED (bar); -- clustered

ALTER TABLE dbo.foo 
  ADD CONSTRAINT pk PRIMARY KEY NONCLUSTERED (bar); -- non-clustered

And yes, you will see some I/O activity here, so if it is a busy system, best to save this for quieter hours or a maintenance period.

Best Answer

Related Solutions

Sql-server – SQL Server Primary key / clustered index design decision

Sql-server – Will a primary key be added as a clustered index

Related Question