Foreign key as part of composite primary key or not

database-designforeign keyprimary-key

I'm designing a relational database. This is what I'm trying to design:

I will have production orders and products.

A production order will have one product, but a product could be on zero or more production orders (In fact, there will be always at least one production order for each product. It has no sense to have a product that it's not going to be produce on database).

In other words, with a production I will produce a product, but I will have a lot of production orders to produce the same product.

We are going to distinguish each production order by its name (alphanumeric) and it must be unique.

I have two questions:

Is the production order's name its primary key? I think I can use a integer as primary key.
Does product's key must be part of production order's primary key? I don't think so but I'm not sure.

By the way, I'm going to use SQL Server.

Best Answer

It depends on the specific scenario.

For example, if you have a many-to-many relationship, then it is common to have a join table between the two tables. The join table will contain a foreign key to each of the primary keys and most of the time, the combination of the two foreign keys will become the primary key:

CREATE TABLE users 
(
    id INT NOT NULL PRIMARY KEY
);
CREATE TABLE roles 
(
    id INT NOT NULL PRIMARY KEY
);
CREATE TABLE users_roles 
(
    user_id INT NOT NULL REFERENCES users(id),
    role_id INT NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);

For the most part, I prefer natural keys rather than surrogate keys - especially if I will never use the surrogate key to ever query the table.

With that being said, to answer your question directly:

Firstly, can a string be a primary key? Though the name may be its natural key, you have to take into consideration the size of the key if you made it the primary key. Not just in your main table, but also in the tables that refer to it. Remember, that in order to create a foreign key you must copy the key value into any dependant tables - this can quickly bloat your database and slow down queries depending on the size and type of data. For instance, short fixed-length text columns can perform better than variable-length text columns and can be indexed easier than longer text values. A good example of this that I use each day is a currency table. My primary key is a 3 character currency code - it is short, of a fixed length, and I am always going to query the table using it even if the table had a surrogate key defined on it.

CREATE TABLE DimCurrency
(
    Code CHAR(3) NOT NULL PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
);
CREATE TABLE FactSales
(
    CurrencyCode CHAR(3) NOT NULL REFERENCES DimCurrency(Code),
    Date DATE NOT NULL,
    SalesAmmount DECIMAL(18,2) NOT NULL
);

Secondly, does the product id need to be part of the productions primary key? Well, this depends on a few factors. If you made the name part of the primary key, and the name was short then potentially, yes it could. But, does any other table have a reference to the productions table? If so, then by adding the product id into the tables primary key you would also need to add it into all of the foreign keys that refer to it. Personally, I don't like this. I am happy to do it if I can guarantee that no table will ever need a foreign key to it (such as a fact table in a dimensional model) but otherwise I think it is wasteful and makes querying your tables over complicated.

You should also consider that if you define further non-clustered indexes on the production table when you have a composite key, the row locator for the non-clustered index is a copy of the primary key itself. This can cause all of your non-clustered indexes to bloat out of control very quickly without your realising it. For reference: Clustered and Nonclustered Indexes Described

I think given your particular scenario, it is perfectly reasonable to add a foreign key in your production table which refers to the id of the product that is produced. As Gulrez Khan stated in his answer, you may want to add a non clustered index on the foreign key - it is not required, but can enhance performance in some queries.

CREATE TABLE production
(
    id INT NOT NULL PRIMARY KEY,
    product_id INT NOT NULL REFERENCES product(id),
    name VARCHAR(50) NOT NULL UNQIUE
);

CREATE INDEX ix_production_product_ncl ON production(product_id);

Remember that all indexes need to be built, and rebuilt when they are modified. The more fields you cram into them, the longer it takes to perform actions against them. Especially clustered indexes, which are generally defined upon the primary key. These indexes physically sort your data at the storage level. If you insert a value that happens to not be at the end of the existing chain of values, you could cause performance problems due to page splits. For me, this is the biggest advantage of a surrogate key over a natural key - the ability to ensure that new data is always appended and not injected into the middle of an existing index.

Finally, if in doubt - try it out! There is no substitution for a good testing strategy. Try querying the data against both structures, look at the query plans, do some capacity planning and choose the one that is right for you.

Related Solutions

File name as primary key

The PRIMARY KEY, or any index for that matter, would be accessed much faster if the length of the PRIMARY KEY was smaller. It is easier to put a 4-byte integer as a unique identified for a fullpath image name than the fullpath filename (of various and ridiculous lengths).

Think of the Clustered Index, where the PRIMARY KEY would reside. Row data will occupy a Clustered Index. In MySQL, the Clustered Key would be coupled with other columns in a nonunique index. Wouldn't a smaller datatype (4 bytes) just make more sense? Otherwise, indexes can blow up at a rate of O(n log n).

To create a unique number for each image, you need a table that resembles something like this:

CREATE TABLE images
(
    image_id INT NOT NULL AUTO_INCREMENT,
    image_name VARCHAR(255),
    image_folder VARCHAR(255),
    PRIMARY KEY (image_id),
    UNIQUE KEY name_folder (image_name,image_folder),
    KEY folder (image_folder)
);

This design

allows for multiple files with the same image name, each located in a different folder
denies having two files with the same image name in the same folder

From here, just INSERT and retrieve the created image_id as follows:

INSERT IGNORE INTO images (image_name,image_folder)
VALUES ('someimage.jpg','/some/linux/folder');
SELECT image_id INTO @imageid FROM images
WHERE image_name='someimage.jpg'
AND image_folder='/some/linux/folder';

Doing it this way may lose some image_id numbers along the way. You may want to do this:

SELECT COUNT(1) INTO @image_is_there FROM images
WHERE image_name='someimage.jpg'
AND image_folder='/some/linux/folder';

IF @image_is_there IS 0, then

INSERT INTO images (image_name,image_folder)
VALUES ('someimage.jpg','/some/linux/folder');

Sql-server – Impact of miss-ordered fields in a composite, clustered primary key

The order of columns in a primary key does not affect the insert performance, since the combination of the values in the two columns of the primary key are meant to be unique.

However the order of index impacts the performance of SELECT queries. In the new order you will have a greater and increasing range of values (dates) across which the index is clustered (dates) as compared to clustering around 100 stores.

If most of your queries are focused on a single store with the new ordering your queries will be slower.

Best Answer

Related Solutions

File name as primary key

Sql-server – Impact of miss-ordered fields in a composite, clustered primary key

Related Question