When to Add Another Lookup Table for the Same Field

database-design

Example

Table Recipe uses a lookup table for Recipe Category;
Table Ingredient uses a lookup table for Ingredient Category;
They are in a Many-to-Many relationship;
A Recipe Category will not be used as a Ingredient Category or vice-versa;

Question

Should it be separate lookup tables, or can I use one table with an extra column to identify if it's an Ingredient or Recipe Category? Does this affect optimization or query speed?

Roughly 100 Records if a single table; separated tables it's 5 and 95. The table will appear more than two times in the same query.

I tested the ideas in the related Q & A Relational tables with same content, custom solution. It's easier to handle data on the frontend side and housekeeping, but data integrity could be an issue if the database grows.

Clarifications

A recipe can be classified by more than one recipe category
An ingredient can be classified by more than one ingredient category
There are 5 recipe categories in the business domain
There are 95 ingredient categories in the business domain
Three examples of concrete recipe categories are Omivorous, Vegetarian, Vegan.
Three examples of concrete ingredient categories are Cereal, Meat, Fish.

I've been testing around with both solutions but with so few records I don't see any improvement or loss of performance. Also when building the front end, it's easier to work with one model and a column than 2 models but it's a minor issue.

Related Q & A on Stack Overflow: One lookup table or many lookup tables?

Best Answer

Although RecipeCategory and IngredientCategory have very similar names and attributes, they are in fact two different entity types, because each of them (a) carries a specific business domain meaning, (b) has distinct kinds of relationships and (c) entails a particular set of rules.

In this regard, if the intention is to implement a relational database (RDB), it is quite helpful to perform an analysis of the business domain of interest (in order to construct a conceptual model) in terms of entity types (i.e., types or prototypes of entity occurrences), their attributes and interrelationships before thinking in terms of tables, columns and constraints (points that correspond to the logical model). Proceeding in this fashion, it is much easier to capture the meaning of the business domain with accuracy and then reflect it in an actual RDB structure.

Business domain rules

Recipe and RecipeCategory

Let us start working with two entity types: Recipe and RecipeCategory. In the scenario in question, it appears reasonable to affirm that:

A Recipe is classified by zero-one-or-many RecipeCategories
A RecipeCategory classifies zero-one-or-many Recipes

Such situation indicates, yes, that Recipe and RecipeCategory are involved in a many-to-many (M:N) relationship, which implies the existence of an associative entity type, that I am going to call RecipeCategorization.

Ingredient and IngredientCategory

Then, let us deal with Ingredient and IngredientCategory. In this case, we can affirm that:

An Ingredient is grouped by zero-one-or-many IngredientCategories
An IngredientCategory groups zero-one-or-many Ingredients

This means that Recipe and RecipeCategory are connected in another M:N relationship, which entails the existence of another associative entity type, that I denominated IngredientCategorization.

RecipeCategory and IngredientCategory

As discussed above, one can observe that the concrete occurrences of RecipeCategory are meant to be (directly) associated with the specific instances of Recipe, and not with the occurrences of Ingredient. In the same manner, the concrete instances of IngredientCategory are meant to be (directly) connected with the specific occurrences of Ingredient, and not with the instances of Recipe. Therefore, RecipeCategory and IngredientCategory are distinct entity types, and demand their own respective individual considerations.

Recipe and Ingredient

Finally, we can assume that:

A Recipe includes one-to-many Ingredients
An Ingredient is included in zero-one-or-many Recipes

Thus, there is another M:N relationship, this time between Recipe and Ingredient, which reveals the existence of other associative entity type, that I am going to entitle RecipeListing.

Illustrative IDEF1X model

Then, from the aforementioned analysis and consequent formulations, I created the IDEF1X^† model shown in Figure 1:

Figure 1 - Recipes Simplified IDEF1X Model

As demonstrated, each entity type is depicted in its corresponding individual box, and is displayed directly related to (i) its own attributes, contained in the respective box, and to (ii) the entity types that apply, by way of the relationship lines.

Of course, there are other indirect relationships that should be derived via the direct connections exposed here.

Logical and physical elements

Once we have analyzed and defined the pertinent types of the things of significance, it is time to determine how to manage them by means of mathematical relations (declared and visualized as tables, if created on a certain SQL database management system), which are composed of domains (portrayed as columns) and tuples (pictured as rows).

As relations are abstract resources, Dr. E. F. Codd —the originator of the relational paradigm— envisioned the utility of representing them in tabular form, so that, e.g., the users and implementers of a RDB can approach them in a more familiar way. In this respect, even though a relational table has a concrete shape, it is still a logical element of a given database, and its components, e.g., columns, rows and constraints are logical as well.

In this regard, it is very important and of vast pragmatical value to distinguish logical from physical elements. For instance, in file systems, a physical record can be made up of zero, one or more fields. In the case of a RDB, the logical elements can be served by one or more physical units (at a lower level of abstraction, then), e.g., indexes, records, pages, extents, etc.

Thus, in accordance with the points detailed above, a table —being a logical level component— does not have fields (which may well be part of the underlying concrete scaffoldings supporting a table declaration, but work at the physical level).

Expository logical SQL-DDL structure

That being said, and based on the IDEF1X model previously presented, both RecipeCategory and IngredientCategory (and the rest of the identified entity types too) require an individual base table that stands for each of them, as exemplified in the following DDL structure:

-- You have to determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- Also, you should make accurate tests to define the
-- most convenient physical implementation settings; e.g.,
-- a good INDEXing strategy based on query tendencies.

-- As one would expect, you are free to make use of 
-- your preferred (or required) naming conventions. 

CREATE TABLE RecipeCategory ( -- Plays a ‘look-up’ role.
    RecipeCategoryCode CHAR(2)  NOT NULL,  -- This column can retain the values: ‘O’ for ‘Omnivorous’; ‘VT’ for ‘Vegetarian’; ‘VG’ for ‘Vegan’; etc.
    Name               CHAR(30) NOT NULL,
    Description        CHAR(60) NOT NULL,
    Etcetera           CHAR(30) NOT NULL,
    CreatedDateTime    DATETIME NOT NULL,
    CONSTRAINT RecipeCategory_PK  PRIMARY KEY (RecipeCategoryCode),
    CONSTRAINT RecipeCategory_AK1 UNIQUE      (Name),       -- ALTERNATE KEY.
    CONSTRAINT RecipeCategory_AK2 UNIQUE      (Description) -- ALTERNATE KEY.      
);

CREATE TABLE Recipe (
    RecipeNumber    INT      NOT NULL,
    Name            CHAR(30) NOT NULL,
    Description     CHAR(60) NOT NULL,
    Etcetera        CHAR(30) NOT NULL,
    CreatedDateTime DATETIME NOT NULL,      
    CONSTRAINT Recipe_PK  PRIMARY KEY (RecipeNumber),
    CONSTRAINT Recipe_AK1 UNIQUE      (Name),       -- ALTERNATE KEY.
    CONSTRAINT Recipe_AK2 UNIQUE      (Description) -- ALTERNATE KEY.      
);

CREATE TABLE RecipeCategorization ( -- Represents an associative entity type.
    RecipeNumber       INT      NOT NULL,
    RecipeCategoryCode CHAR(2)  NOT NULL, -- Contains meaningful and readable values.
    Etcetera           CHAR(30) NOT NULL,
    ClassifiedDateTime DATETIME NOT NULL,      
    CONSTRAINT RecipeCategorization_PK                   PRIMARY KEY (RecipeNumber, RecipeCategoryCode), -- Composite PK.
    CONSTRAINT RecipeCategorization_to_Recipe_FK         FOREIGN KEY (RecipeNumber)
        REFERENCES Recipe (RecipeNumber),
    CONSTRAINT RecipeCategorization_to_RecipeCategory_FK FOREIGN KEY (RecipeCategoryCode)
        REFERENCES RecipeCategory (RecipeCategoryCode)       
);    

CREATE TABLE IngredientCategory ( -- Plays a ‘look-up’ role.
    IngredientCategoryNumber INT      NOT NULL,
    Name                     CHAR(30) NOT NULL,
    Description              CHAR(60) NOT NULL,
    Etcetera                 CHAR(30) NOT NULL,
    CreatedDateTime          DATETIME NOT NULL,
    CONSTRAINT IngredientCategory_PK  PRIMARY KEY (IngredientCategoryNumber),
    CONSTRAINT IngredientCategory_AK1 UNIQUE      (Name),       -- ALTERNATE KEY.
    CONSTRAINT IngredientCategory_AK2 UNIQUE      (Description) -- ALTERNATE KEY.      
);

CREATE TABLE Ingredient (
    IngredientNumber INT      NOT NULL,
    Name             CHAR(30) NOT NULL,
    Description      CHAR(60) NOT NULL,
    Etcetera         CHAR(30) NOT NULL,
    CreatedDateTime  DATETIME NOT NULL,
    CONSTRAINT Ingredient_PK  PRIMARY KEY (IngredientNumber),
    CONSTRAINT Ingredient_AK1 UNIQUE      (Name),       -- ALTERNATE KEY.
    CONSTRAINT Ingredient_AK2 UNIQUE      (Description) -- ALTERNATE KEY.      
);

CREATE TABLE IngredientCategorization ( -- Stands for an ssociative entity type.
    IngredientNumber         INT      NOT NULL,
    IngredientCategoryNumber INT      NOT NULL,
    Etcetera                 CHAR(30) NOT NULL,
    GroupedDateTime          DATETIME NOT NULL,      
    CONSTRAINT IngredientCategorization_PK                       PRIMARY KEY (IngredientNumber, IngredientCategoryNumber), -- Composite PK.
    CONSTRAINT IngredientCategorization_to_Ingredient_FK         FOREIGN KEY (IngredientNumber)
        REFERENCES Ingredient (IngredientNumber),
    CONSTRAINT IngredientCategorization_to_IngredientCategory_FK FOREIGN KEY (IngredientCategoryNumber)
        REFERENCES IngredientCategory (IngredientCategoryNumber)        
);

CREATE TABLE IngredientListing ( -- Denotes an associative entity type
    RecipeNumber     INT      NOT NULL,
    IngredientNumber INT      NOT NULL,
    Etcetera         CHAR(30) NOT NULL,
    IncludedDateTime DATETIME NOT NULL,      
    CONSTRAINT IngredientListing_PK               PRIMARY KEY (RecipeNumber, IngredientNumber), -- Composite PK.
    CONSTRAINT IngredientListing_to_Recipe_FK     FOREIGN KEY (RecipeNumber)
        REFERENCES Recipe (RecipeNumber),
    CONSTRAINT IngredientListing_to_Ingredient_FK FOREIGN KEY (IngredientNumber)
        REFERENCES Ingredient (IngredientNumber)     
);

--
--

With such structure, you prevent ambiguities and all their logical and pragmatic repercussions. You avoid mixing up

the representation of multiple entity types in a single “shared” table, and
the meaning and intention of each of their attributes (in “shared” columns),

which permits restricting much more easily their corresponding domains of values and the subsequent references through FOREIGN KEY (FK) constraints. The queries and result sets become much more clear because each aspect is approached separately.

This structure, hence, aids by itself in reflecting the business context under consideration with precision, remaining consistent with the characteristics delimited in the conceptual analysis, and guaranteeing that the data (every assertion in the form of a row) complies with the business rules.

Practical considerations regarding the RecipeCategory table

Seing that the RecipeCategory table is supposed to (1) fulfill a look-up role and (2) would hold only a few rows, I consider that it would be very advantageous to declare it with a PRIMARY KEY (PK) constraint on a column that keeps values that are meaningful and, at the same time, physically light and narrow, i.e., of type CHAR(2) or maybe CHAR(3). So it might comprise, e.g., the rows that follow:

+-——————————————————-+-——————————-+-—————————————————————————-+-————-+-————-+
| RecipeCategoryCode | Name       | Description               | Etc… | Cre… |
+-——————————————————-+-——————————-+-—————————————————————————-+-————-+-————-+
| O                  | Omnivorous | Category of recipes that… | …    | …    |
+--------------------+------------+---------------------------+------+------+
| VT                 | Vegetarian | Category of recipes that… | …    | …    |
+--------------------+------------+---------------------------+------+------+
| VG                 | Vegan      | Category of recipes that… | …    | …    |
+--------------------+------------+---------------------------+------+------+
| F                  | Foo        | Category of recipes that… | …    | …    |
+--------------------+------------+---------------------------+------+------+
| B                  | Bar        | Category of recipes that… | …    | …    |
+--------------------+------------+---------------------------+------+------+

In this manner, when such PK column “migrates” to the RecipeCategorization table as a column with a FK constraint, it will have stable values that maintain their meaning and intention, making the result sets much more readable than, say, an INT value, which definitely can assist in the interpretation of the information obtained through, e.g., derived tables (those fetched back by virtue of a SELECT statement or a VIEW definition).

All this remains in agreement with the spirit of the relational model^†† published in 1970 by Dr. Codd, where he included the following relevant note:

Naturally, as with any data put into and retrieved from a computer system, the user will normally make far more effective use of the data if he is aware of its meaning.

Relational databases and application programs

The extract of your question cited below has to do, among other things, with the “link” between (a) the database under discussion and (b) the application programs (apps) that will work along with it:

I tested the ideas in the related Q & A Relational tables with same content, custom solution. It's easier to handle data on the frontend side and housekeeping, but data integrity could be an issue if the database grows.

It is opportune to point out that the data, in its essence and of its nature, is a highly valuable organizational asset; as a result, it must be administered as such. This fundamental resource tends to outlive apps, app development platforms and programming paradigms.

With that, a RDB ought to be an independent (self-protective, self-describing, etc.) software component that is capable of being shared by multiple apps, and it must not be “coupled” —using object oriented programming parlance— with the code of any of these apps.

Consequently, you should (a) manage the data by dint of the instruments provided by the relational theory, the modeling platforms and the SQL sytem of choice, and (b) shape and implement the relevant processes and the graphical user interface with the pertinent app development tools. In this way, all the software components will work harmonically; they will be independent but, at the same time, well interconnected.

Endnote and reference

_{^† Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) some of the early relational model works authored by Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.}

_{^†† Codd, E. F. (June 1970). A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Volume 13 Issue 6 (pp. 377-387). New York, NY, USA.}