When to break down attributes in tables into sub-tables for normalisation

database-design

I am currently developing a data model. This model contains an entity "Asset", which has many numerical attributes: volume, quantity, price, etc.

I also have string attributes that are part of the Asset table. Say one of those is Asset Type, which can be a currency forward, stock, fund, index, and so on. This is representative of the general cardinality of these string attributes: they have at most 10-15 distinct values. The Asset table itself, though, has tens of thousands of rows.

Should I create a separate table for my Asset Type attribute and reference it with a foreign key in my Asset table, or is it simpler to leave the Asset table un-normalised and store these attributes directly, as strings, in the Asset table?

Best Answer

Let's say you create the table and give it two columns: AssetTypeId and AssetTypeName. Both will be unique by definition, so both are candidate keys. This table is normalized.

Since both columns are candidate keys, the Asset table remains normalized whichever of them you reference.
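To make that concrete, here is a minimal sketch of the lookup-table design in SQLite. The table and column names (Asset, AssetType, AssetTypeId, AssetTypeName) mirror the example above; everything else, including the sample asset types and the extra Asset columns, is illustrative.

```python
import sqlite3

# In-memory database; schema names follow the answer's example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AssetType (
    AssetTypeId   INTEGER PRIMARY KEY,   -- candidate key 1 (surrogate)
    AssetTypeName TEXT NOT NULL UNIQUE   -- candidate key 2 (natural)
);

CREATE TABLE Asset (
    AssetId     INTEGER PRIMARY KEY,
    AssetTypeId INTEGER NOT NULL REFERENCES AssetType(AssetTypeId),
    Volume      REAL,
    Quantity    REAL,
    Price       REAL
);
""")

conn.executemany("INSERT INTO AssetType (AssetTypeName) VALUES (?)",
                 [("currency forward",), ("stock",), ("fund",), ("index",)])
conn.execute("INSERT INTO Asset (AssetTypeId, Volume, Quantity, Price) "
             "VALUES (2, 1000.0, 5, 9.5)")

# Because AssetTypeName is UNIQUE, duplicates are rejected:
# both columns really are candidate keys.
try:
    conn.execute("INSERT INTO AssetType (AssetTypeName) VALUES ('stock')")
except sqlite3.IntegrityError:
    print("duplicate AssetTypeName rejected")
```

The `UNIQUE` constraint on AssetTypeName is what makes either column a valid key to reference from Asset.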

One consideration may be on-disk data size. With only about a dozen asset types, a single byte suffices for AssetTypeId, whereas AssetTypeName may require 20 bytes or more. With "tens of thousands" of assets the saving would be small (100 KB or so). With many such columns it can add up, however, so it's worth thinking about. If you are using columnar storage or data compression, this saving is likely already happening under the covers within the DBMS.
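The rough figure above can be reproduced with back-of-envelope arithmetic; the row count and string length here are assumptions chosen to match the answer's "tens of thousands" and "20 or more" bytes.

```python
# Back-of-envelope estimate of the space saved by storing a 1-byte id
# instead of the type name in every Asset row. All figures illustrative.
n_assets = 10_000    # "tens of thousands" of assets
name_bytes = 20      # assumed average AssetTypeName length
id_bytes = 1         # one byte covers 10-15 distinct types

saving_bytes = n_assets * (name_bytes - id_bytes)
print(f"~{saving_bytes / 1024:.0f} KiB saved")  # same order as "100kb or so"
```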

The flip side to having a separate table is that if you need to return the asset type name to the client, an additional join will be required. You know your workload best; you must determine where the balance of benefits lies for your application. Neither way is "wrong".
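That extra join looks like this in a quick SQLite sketch; the schema and sample rows are illustrative, reusing the naming from the example above.

```python
import sqlite3

# Minimal normalised schema, as in the answer's example (names illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AssetType (AssetTypeId INTEGER PRIMARY KEY,
                        AssetTypeName TEXT NOT NULL UNIQUE);
CREATE TABLE Asset (AssetId INTEGER PRIMARY KEY,
                    AssetTypeId INTEGER NOT NULL,
                    Price REAL);
INSERT INTO AssetType VALUES (1, 'stock'), (2, 'fund');
INSERT INTO Asset VALUES (10, 1, 99.5), (11, 2, 12.0);
""")

# With a separate AssetType table, returning the name requires a join;
# had the name been stored inline, a single-table SELECT would do.
rows = conn.execute("""
    SELECT a.AssetId, t.AssetTypeName, a.Price
    FROM Asset a
    JOIN AssetType t ON t.AssetTypeId = a.AssetTypeId
    ORDER BY a.AssetId
""").fetchall()
print(rows)
```

For a lookup table this small the join is cheap, which is part of why neither design is "wrong"; the choice is about workload and maintenance, not correctness.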