Primary key: (a) set it on a single column for two distinct data points or (b) set it on a combination of two columns, each for a separate data point

database-designnormalizationprimary-key

I am designing a database for an e-commerce company. Each product will be identified by a unique ID (SKU). We want to format the SKU as LLLL000. LLLL is the parent product, for example Iphone X.
000 is the variation product, for example color: Gold, storage: 256, status: Open Box.

Following the example, the SKU for Iphone X, color: Gold, storage: 256, status: Open Box, would be something like: LKGH177
A different Iphone X would have: LKGH178.

My questions are:

  • Should I consider the entire SKU as a single-column primary key that contains two distinct data points, and so violating 1NF?

  • Should I consider the SKU split in two columns and set a two-column composite primary key?

Database is not my main field so I hope it is clear.

Best Answer

I suggest doing both. Create an attribute for SKU and two separate attributes for the product and sub-type. SKU would be a key and the product sub-types would be a composite key. Example:

CREATE TABLE product
(sku CHAR(7) NOT NULL,
skusub1 CHAR(4) NOT NULL,
skusub2 CHAR(3) NOT NULL,
... 
UNIQUE (sku),
UNIQUE (skusub1,skusub2),
CHECK (skusub||skusub2=sku));

You can give the sub1 and sub2 attributes more meaningful names.

This is not actually a violation of 1NF. SKU is an important value in its own right and as long as you have the separate components in the database as well then there's no reason ever to split SKU into two parts. SKU is nearly always treated as a single value in my experience because although it may have encoded elements within it the encoding scheme is subject to variation. You may at some point have SKUs that don't follow the same scheme.

(note: in the above example I haven't identified a primary key because that's not relevant for this example)