Sql-server – Design of small tables: clustered index or heap

database-designindexsql server

I'm trying to design a database schema to hold an entity that has a couple of one-to-many associated collections.

The challenge here is that the related objects have few properties, and so the tables are "small" (at least in terms of amount of columns). This makes it difficult to make a decision on two fronts:

  1. Do I have a primary key or not
  2. Do I create a clustered index, or make it a heap + have a covering non-clustered index

This is my situation (note – actual entity is replaced with a generic example):

-----------------------             --------------------------
|         User        |             |      UserSituation     |
-----------------------         =>  --------------------------
| + UserId [NOT NULL] |             | + UserId [NOT NULL]    |
-----------------------             | + StartDate [NOT NULL] |
                                    | + EndDate [NULL]       |
                                    | + IsActive [NOT NULL]  |
                                    --------------------------

                                    --------------------------
                                    |      UserCategory      | 
                                =>  --------------------------
                                    | + UserId [NOT NULL]    |
                                    | + StartDate [NOT NULL] |
                                    | + EndDate [NULL]       |
                                    | + Category [NOT NULL]  |
                                    --------------------------

Again, I've generalised the example, please don't try to give me advice to re-design the database structure by trying to derive any business meaning behind the fields.

To design the tables, I've reasoned about the following, which is valid for both tables:

  • A primary key cannot exist because it needs to cover all columns and EndDate is nullable. Still, if that was not the case, would it make sense to have a primary key which is so wide?
  • A covering non-clustered index can use the RID to lookup table data. This will make leaf nodes smaller, because the alternative is having "UserId + Uniquefier" in all entries. How much does that matter?
  • A clustered index on UserId -> this means a Uniquefier will be created, so I can't seem to decide whether that would be good or bad.
  • In any case I have a UNIQUE NONCLUSTERED index over all columns.

I've read Kimberly Tripp's articles on the relevant topics but also stumbled across Marcus Winand's blog post about clustered indexes. He has a rather opposing vision about clustered index than the majority of people, but he seems to make sense all the same.

Other notes:

  • The data in the tables will not be updated, only deleted/inserted because my app treats the collection entries as immutable.
  • I'm using SQL Server, and need to support 2008 R2 and up.
  • The sub-tables will not really be queried, mostly joined to the main entity.
  • Having read a substantial amount of information on indexes and table design, both here and on blogs, I can't decide on this particular scenario.

Best Answer

As for subtables, I have a rule of always adding a primary key. In this case I would go with a new autoincrement column as a PK, as the index will be pretty narrow.

Secondly, I would consider propagating information like IsActive from UserSituation to User with a trigger (I assume user can only have one situation at the given moment). This may eliminate the need for one JOIN in most cases. Not sure it will work with UserCategory (it will not if it's a true 1:N relation).

As for clustered index on UserId, it sounds like a good idea on table User and a not-so-good idea on subtables. But I have to admit it is not my field of expertise.