Sql-server – Do I need a separate Id column for this “mapping” table

database-designreferential-integritysql server

I have a table of Producers and a table of Products, both of which are of the form:

Id – int, Primary key
Name – nvarchar

A Producer can carry multiple Products, so I was going to create a table called ProducerDetails that would have:

ProducerId – int, Foreign key to Producers.Id
ProductId – int, Foreign key to Products.Id

Then I started to question myself, so I thought I'd ask the experts. Would it be better database design to have an additional Id (int, Primary key) column in my ProducerDetails table? Or is that unnecessary?

I'm using SQL-Server 2008 R2 if that make any difference at all.

EDIT – The relationship between these tables would be many-to-many I believe, sorry I didn't make that clear. A producer can carry multiple types of products, and the same product could be produced by multiple different producers.

_{I apologize if this question is overly simple, referential integrity / database design is not my strongsuit (although I'm trying to improve that).}

Best Answer

If you have a one-to-many relationship between Producers and Products (in other words, a product can only belong to one producer), than it would make sense to just put a foreign key reference directly in your Products table:

One-To-Many

create table Producer ( id int identity(1, 1) not null primary key clustered, Name varchar(100) not null ) go create table Product ( id int identity(1, 1) not null, Name varchar(100) not null, ProducerId int not null foreign key references Producer(id) ) go

But if there's a chance that this will be a many-to-many relationship, then your best bet would be to use a Join table.

Many-To-Many

create table Producer
(
    id int identity(1, 1) not null primary key clustered,
    Name varchar(100) not null
)
go

create table Product
(
    id int identity(1, 1) not null primary key clustered,
    Name varchar(100) not null
)
go

create table ProductProducer
(
    ProductId int not null foreign key references Product(id),
    ProducerId int not null foreign key references Producer(id)
)
go

-- adding the primary key also ensures uniqueness
alter table ProductProducer
add constraint PK_ProductProducer 
primary key (ProductId, ProducerId)
go

If you decide to go with the Join table, you wouldn't need to have an additional key, as the combination of ProductId/ProducerId would ultimately be unique. You could use them as a composite key, so you wouldn't need that additional Id field in ProductProducer.

Related Solutions

Sql-server – Indexing – Uniqueidentifier Foreign Key or Intermediary mapping table

Ok, I am making a lot of assumptions (INT instead of VARCHAR(50) being one of them) with this answer, so feel free to correct me if needed. The problem with option B is that it introduces a new join to relate Users to Alerts without any real added benefit. If joining on the UserID, it is best to index the UserID, so you can utilize seeks for your joins.

For Option A, UserID will be the clustering key (index key for the clustered index) on the Users table. UserID will be a nonclustered index key on Alerts table. This will cost 16 bytes per Alert.

For Option B, UserID will be the clustering key on the Users table. UserId will probably be the clustering key in UserMap too, to make joining more efficient. UserKey (assuming this is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert. And 20 bytes per UserMap.

Looking at the big picture, one relationship, for Option A, costs 16 bytes of storage, and involves 1 join operation. Whereas, one relationship, for Option B, costs 24 bytes of storage, and involves 2 join operations.

Furthermore, there are a possibility of 340,282,366,920,938,000,000,000,000,000,000,000,000 uniqueidentifiers and only 4,294,967,296 INTs. Implementing a uniqueidentifier to INT map for a this type of relationship could cause unexpected results when you start reusing INTs.

The only reason for creating this type map table, is if you plan on creating a Many to Many relationship between Users and Alerts.

Taking all of this into consideration, I would recommend Option A.

I hope this helps,

Matt

Sql-server – SQL Server Primary key / clustered index design decision

You are correct to separate "clustered index" from "primary key":

A clustered index is the organisation of data on disk is better if
- narrow
- numeric
- increasing (strictly monotonic)
The primary key identifies a row

Note: GUIDs make poor clustering keys

In this case, with the surrogate column, the table has 2 candidate keys:

ProductHistoryID
ProductNo + CreatedDateTime

Assumed convention states that the ProductHistoryID becomes the PK, but you can leave the PK on (ProductNo, CreatedDateTime): it will just be non-clustered. Which leads to indexes:

clustered index should be on ProductHistoryID
unique non-clustered index on (ProductNo, CreatedDateTime)

Example

CREATE TABLE Product (
    ProductHistoryID int NOT NULL IDENTITY (1,1) NOT NULL,
    ProductNo ...
    CreatedDateTime ...

then you a choice of

    CONSTRAINT PK_Product PRIMARY KEY CLUSTERED (ProductHistoryID)
    CONSTRAINT UQ_Product UNIQUE NONCLUSTERED (ProductHistoryID)

    CONSTRAINT PK_Product PRIMARY KEY NONCLUSTERED (ProductNo, CreatedDateTime)
    CONSTRAINT PK_Product UNIQUE CLUSTERED (ProductHistoryID)

Also, the pattern you have is a "type 2 Slowly Changing Dimension"

Best Answer

Related Solutions

Sql-server – Indexing – Uniqueidentifier Foreign Key or Intermediary mapping table

Sql-server – SQL Server Primary key / clustered index design decision

Related Question