Technical Challenges of Composite Keys in SQL Server

database-design sql-server

I'm sure this question has been asked before, but I'd rather put a twist on it to keep it not opinion based, and as informative as possible. (Also please forgive any mistakes I make in annotation, I'm still fairly new to this and want to get better.)

Say I'm creating a database (in SQL Server, an RDBMS I'm not familiar with) with a set of users, and for the sake of simplicity, I'm just keeping track of some APIs that these users use. Say I have APIs Alpha, Beta, Gamma, and Delta. Each of these has keyed access. I need to associate the keys with the users. My gut instinct would be to split the data as follows, in schema A:

,-------,         ,----------,
|*Users*|         |*APIkeys* |
|-------| 1       |----------|
| id^   |----\    | key^     |
| fname |    |    | service^ | 
| lname |  <Has>  | queries  |
| email |    |    | ...      |
| ...   |    \--<=| userid   |
| ...   |       n | ...      |
|_______|         |__________|   ( where ^ indicates primary key )

In this schema, the APIkeys table has a composite primary key (key, service), which makes sense from a high-level perspective. I'm keeping track of APIs and Users. Each key is unique within its service, which seems reasonable to turn into a composite key.
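As a rough sketch of what I mean by schema A, here it is in SQLite through Python's `sqlite3` module, standing in for SQL Server (whose DDL syntax differs in places). The column types and sample values are my own guesses, and I've renamed `key` to `api_key` because KEY is a reserved word in T-SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Users (
    id    INTEGER PRIMARY KEY,
    fname TEXT NOT NULL,
    lname TEXT NOT NULL,
    email TEXT NOT NULL
);
CREATE TABLE APIkeys (
    api_key TEXT    NOT NULL,
    service TEXT    NOT NULL,
    queries INTEGER NOT NULL DEFAULT 0,
    userid  INTEGER NOT NULL REFERENCES Users (id),
    PRIMARY KEY (api_key, service)      -- composite primary key
);
""")

conn.execute("INSERT INTO Users VALUES (1, 'Ada', 'Lovelace', 'ada@example.com')")
insert_key = "INSERT INTO APIkeys (api_key, service, userid) VALUES (?, ?, ?)"
conn.execute(insert_key, ("abc123", "Alpha", 1))
conn.execute(insert_key, ("abc123", "Beta", 1))   # same key text, different service: allowed

try:
    conn.execute(insert_key, ("abc123", "Alpha", 1))  # repeats an existing (api_key, service) pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

key_count = conn.execute("SELECT COUNT(*) FROM APIkeys").fetchone()[0]
```

The point is that uniqueness is enforced on the pair, not on either column alone.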

There is an alternative, schema B, in which instead of one APIkeys table I have four separate key tables, one each for Alpha, Beta, Gamma, and Delta. This leads to a lot more tables, but avoids the composite key.

However, after thinking some more about the situation, I also have an alternate relational schema, Schema C:

,-------,        ,----------,        ,----------,
|*Users*|        |*UtoArel* |        |*APIkeys* |
|-------| 1    n |----------|      1 |----------|
| id^   |------<=| userID^  | n  /---| keyID^   |
| fname |        | keyID^   |=>-/    | service  |
| lname |        |__________|        | queries  |
| email |                            | queries  |
| ...   |                            | ...      |
|_______|                            |__________|    ( where ^ indicates primary key )

So now my questions are these (also, please keep in mind I'm using SQL Server, but I'd also like to know "in general" as well):

  1. What are the advantages of schema A over B? A over C?

  2. Are there any pitfalls of using composite keys? (It would be great if you can tie it in with this scenario too.)

  3. Are there any other schemas that I missed that would be better?

And of course, any other advice would be welcome, I'm happy to learn.

Best Answer

The question conflates composite keys with schema design. The two are separate.

All keys serve the same purpose. Their value(s) uniquely identify a single instance of the type to which they refer. This is true whether the key is single or composite, natural or surrogate.

It can be that multiple keys are identified for one table. Collectively these are known as candidate keys. Some of these likely will be natural keys, identified during logical analysis phases of the project. Some candidate keys will be surrogate (or composites of surrogates, or surrogates and natural keys) likely created during physical modelling for performance or technical reasons.

Having identified the entities and their candidate keys, and normalised them, we can consider how the entities relate to each other. Here you can do yourself a big favour by avoiding wishy-washy names like 'has'. What is the real-world business term for the relationship between a single instance of User and a single instance of API? Complete these sentences:

Each User <relationship> [one | one or more] API.  
Each API <relationship> [one | one or more] User.

If the relationship is "administers" then likely it's many-to-many and schema C is appropriate. If the relationship is "logs in using" I would hope each user has their own key, so schema A may be appropriate. It may be that in the system we're writing we want to track both administration and credentials, so parts of both A and C are needed. My point is that it is the business rules and the cardinality that determine table layout. Key composition plays no part.
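A minimal sketch of the many-to-many case (schema C), assuming an "administers" relationship. It uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server; the column types and sample rows are illustrative assumptions, not part of the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE Users   (id    INTEGER PRIMARY KEY, email   TEXT NOT NULL);
CREATE TABLE APIkeys (keyID INTEGER PRIMARY KEY, service TEXT NOT NULL);
-- Relationship table: one row per (user, key) pairing.
CREATE TABLE UtoArel (
    userID INTEGER NOT NULL REFERENCES Users (id),
    keyID  INTEGER NOT NULL REFERENCES APIkeys (keyID),
    PRIMARY KEY (userID, keyID)
);
INSERT INTO Users   VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
INSERT INTO APIkeys VALUES (10, 'Alpha'), (11, 'Beta');
INSERT INTO UtoArel VALUES (1, 10), (1, 11), (2, 10);
""")

# Each user may administer several keys, and each key may have several administrators.
keys_per_user = dict(conn.execute(
    "SELECT userID, COUNT(*) FROM UtoArel GROUP BY userID"))
users_per_key = dict(conn.execute(
    "SELECT keyID, COUNT(*) FROM UtoArel GROUP BY keyID"))
```

Note that the relationship table ends up with a composite key of its own, even though both entity tables use single-column keys. The cardinality drove the layout; the shape of the keys followed from it.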

Now we can come to the physical design of tables. Here's where composite keys matter at last. In a relational database tables are linked by foreign keys: the column(s) that make up a key of one table are included in the other table*. It follows that if we choose to use a candidate key that is a composite of many long columns there will be an overhead compared to using another candidate which is, say, a single integer. More disk space will be used, more working memory will be required for queries, and join conditions will be more complicated and more error-prone. Conversely, if we insist on using only integer surrogate keys there must be continuous conversion between these internal representations, incoming search terms, and outgoing results, all of which adds load to the system. The balance between these factors is as much art as it is science.
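To make the trade-off concrete, here is a small sketch using SQLite through Python's `sqlite3` module as a stand-in for SQL Server. The table names `LogByNatural` and `LogBySurrogate`, the `key_id` and `called_at` columns, and the sample data are all hypothetical, invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE APIkeys (
    key_id  INTEGER PRIMARY KEY,        -- surrogate candidate key
    api_key TEXT NOT NULL,
    service TEXT NOT NULL,
    UNIQUE (api_key, service)           -- natural composite candidate key
);
-- Child referencing the composite key: both columns are repeated in every row.
CREATE TABLE LogByNatural (
    api_key   TEXT NOT NULL,
    service   TEXT NOT NULL,
    called_at TEXT NOT NULL,
    FOREIGN KEY (api_key, service) REFERENCES APIkeys (api_key, service)
);
-- Child referencing the surrogate key: one small integer per row.
CREATE TABLE LogBySurrogate (
    key_id    INTEGER NOT NULL REFERENCES APIkeys (key_id),
    called_at TEXT NOT NULL
);
INSERT INTO APIkeys        VALUES (1, 'abc123', 'Alpha');
INSERT INTO LogByNatural   VALUES ('abc123', 'Alpha', '2024-01-01');
INSERT INTO LogBySurrogate VALUES (1, '2024-01-01');
""")

# Composite key: the search terms are already in the child table, so no join is needed.
natural_hits = conn.execute(
    "SELECT COUNT(*) FROM LogByNatural WHERE api_key = ? AND service = ?",
    ("abc123", "Alpha"),
).fetchone()[0]

# Surrogate key: the incoming search terms must be translated through the parent table.
surrogate_hits = conn.execute(
    """SELECT COUNT(*) FROM LogBySurrogate AS l
       JOIN APIkeys AS k ON k.key_id = l.key_id
       WHERE k.api_key = ? AND k.service = ?""",
    ("abc123", "Alpha"),
).fetchone()[0]
```

Both queries find the same row. The first child table pays in wider rows; the second pays with a join whenever the question is asked in terms of the natural key.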

In SQL Server specifically, each table either has a clustered index or it does not. There are several considerations which strongly militate in favour of having nearly all tables clustered. That decided, there are then good reasons to have the clustering key unique and as small as possible, i.e. an integer surrogate key. This naturally leads to the same key being used as the foreign key in all relationships. Unfortunately, this becomes reflex after a while and every table gets an integer primary key with a clustered index, usually an IDENTITY column for good measure.

There is no requirement that every foreign key referencing a table must reference the same candidate key. However, every candidate key thus referenced must have a unique constraint. The indexes behind these constraints have write overhead, so it is usual for the same parent candidate key to be used as the foreign key in every child table.
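The uniqueness requirement can be seen in a small experiment. This sketch uses SQLite through Python's `sqlite3` module rather than SQL Server, so the details differ: SQLite only reports the problem when the child table is modified, whereas SQL Server, as far as I know, rejects the foreign key definition itself. The helper `child_insert_works` and the `Usage` table are hypothetical names made up for the example:

```python
import sqlite3

def child_insert_works(parent_ddl: str) -> bool:
    """Create the parent from parent_ddl, then try to insert a child row
    whose foreign key references the parent's (api_key, service) columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.execute(parent_ddl)
    conn.execute("INSERT INTO APIkeys VALUES (1, 'abc123', 'Alpha')")
    conn.execute("""
        CREATE TABLE Usage (
            api_key TEXT NOT NULL,
            service TEXT NOT NULL,
            FOREIGN KEY (api_key, service) REFERENCES APIkeys (api_key, service)
        )""")
    try:
        conn.execute("INSERT INTO Usage VALUES ('abc123', 'Alpha')")
        return True
    except sqlite3.Error:
        # SQLite refuses to use a foreign key whose target columns are not unique.
        return False
    finally:
        conn.close()

# Parent whose natural candidate key is backed by a unique constraint.
with_unique = child_insert_works("""
    CREATE TABLE APIkeys (
        key_id  INTEGER PRIMARY KEY,
        api_key TEXT NOT NULL,
        service TEXT NOT NULL,
        UNIQUE (api_key, service)
    )""")

# Same parent without the unique constraint on (api_key, service).
without_unique = child_insert_works("""
    CREATE TABLE APIkeys (
        key_id  INTEGER PRIMARY KEY,
        api_key TEXT NOT NULL,
        service TEXT NOT NULL
    )""")
```

With the constraint in place the child row is accepted; without it the foreign key cannot be used at all.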

In conclusion, I would say there is no advantage of schema A over B or C. Each either implements the requirements of the task at hand and is correct or it does not and it is wrong. If it implements them efficiently for the DBMS in use then so much the better.

* Colloquially called "parent" and "child"; the proper terms are "referenced" and "referencing".