Example
- Table Recipe uses a lookup table for Recipe Category;
- Table Ingredient uses a lookup table for Ingredient Category;
- They are in a Many-to-Many relationship;
- A Recipe Category will not be used as a Ingredient Category or vice-versa;
Question
Should it be separate lookup tables, or can I use one table with an extra column to identify if it's an Ingredient or Recipe Category? Does this affect optimization or query speed?
Roughly 100 Records if a single table; separated tables it's 5 and 95. The table will appear more than two times in the same query.
I tested the ideas in the related Q & A Relational tables with same content, custom solution. It's easier to handle data on the frontend side and housekeeping, but data integrity could be an issue if the database grows.
Clarifications
- A recipe can be classified by more than one recipe category
- An ingredient can be classified by more than one ingredient category
- There are 5 recipe categories in the business domain
- There are 95 ingredient categories in the business domain
- Three examples of concrete recipe categories are Omivorous, Vegetarian, Vegan.
- Three examples of concrete ingredient categories are Cereal, Meat, Fish.
I've been testing around with both solutions but with so few records I don't see any improvement or loss of performance. Also when building the front end, it's easier to work with one model and a column than 2 models but it's a minor issue.
Related Q & A on Stack Overflow: One lookup table or many lookup tables?
Best Answer
Although
RecipeCategory
andIngredientCategory
have very similar names and attributes, they are in fact two different entity types, because each of them (a) carries a specific business domain meaning, (b) has distinct kinds of relationships and (c) entails a particular set of rules.In this regard, if the intention is to implement a relational database (RDB), it is quite helpful to perform an analysis of the business domain of interest (in order to construct a conceptual model) in terms of entity types (i.e., types or prototypes of entity occurrences), their attributes and interrelationships before thinking in terms of tables, columns and constraints (points that correspond to the logical model). Proceeding in this fashion, it is much easier to capture the meaning of the business domain with accuracy and then reflect it in an actual RDB structure.
Business domain rules
Recipe and RecipeCategory
Let us start working with two entity types:
Recipe
andRecipeCategory
. In the scenario in question, it appears reasonable to affirm that:Such situation indicates, yes, that
Recipe
andRecipeCategory
are involved in a many-to-many (M:N) relationship, which implies the existence of an associative entity type, that I am going to callRecipeCategorization
.Ingredient and IngredientCategory
Then, let us deal with
Ingredient
andIngredientCategory
. In this case, we can affirm that:This means that
Recipe
andRecipeCategory
are connected in another M:N relationship, which entails the existence of another associative entity type, that I denominatedIngredientCategorization
.RecipeCategory and IngredientCategory
As discussed above, one can observe that the concrete occurrences of
RecipeCategory
are meant to be (directly) associated with the specific instances ofRecipe
, and not with the occurrences ofIngredient
. In the same manner, the concrete instances ofIngredientCategory
are meant to be (directly) connected with the specific occurrences ofIngredient
, and not with the instances ofRecipe
. Therefore,RecipeCategory
andIngredientCategory
are distinct entity types, and demand their own respective individual considerations.Recipe and Ingredient
Finally, we can assume that:
Thus, there is another M:N relationship, this time between
Recipe
andIngredient
, which reveals the existence of other associative entity type, that I am going to entitleRecipeListing
.Illustrative IDEF1X model
Then, from the aforementioned analysis and consequent formulations, I created the IDEF1X† model shown in Figure 1:
As demonstrated, each entity type is depicted in its corresponding individual box, and is displayed directly related to (i) its own attributes, contained in the respective box, and to (ii) the entity types that apply, by way of the relationship lines.
Of course, there are other indirect relationships that should be derived via the direct connections exposed here.
Logical and physical elements
Once we have analyzed and defined the pertinent types of the things of significance, it is time to determine how to manage them by means of mathematical relations (declared and visualized as tables, if created on a certain SQL database management system), which are composed of domains (portrayed as columns) and tuples (pictured as rows).
As relations are abstract resources, Dr. E. F. Codd —the originator of the relational paradigm— envisioned the utility of representing them in tabular form, so that, e.g., the users and implementers of a RDB can approach them in a more familiar way. In this respect, even though a relational table has a concrete shape, it is still a logical element of a given database, and its components, e.g., columns, rows and constraints are logical as well.
In this regard, it is very important and of vast pragmatical value to distinguish logical from physical elements. For instance, in file systems, a physical record can be made up of zero, one or more fields. In the case of a RDB, the logical elements can be served by one or more physical units (at a lower level of abstraction, then), e.g., indexes, records, pages, extents, etc.
Thus, in accordance with the points detailed above, a table —being a logical level component— does not have fields (which may well be part of the underlying concrete scaffoldings supporting a table declaration, but work at the physical level).
Expository logical SQL-DDL structure
That being said, and based on the IDEF1X model previously presented, both
RecipeCategory
andIngredientCategory
(and the rest of the identified entity types too) require an individual base table that stands for each of them, as exemplified in the following DDL structure:With such structure, you prevent ambiguities and all their logical and pragmatic repercussions. You avoid mixing up
which permits restricting much more easily their corresponding domains of values and the subsequent references through FOREIGN KEY (FK) constraints. The queries and result sets become much more clear because each aspect is approached separately.
This structure, hence, aids by itself in reflecting the business context under consideration with precision, remaining consistent with the characteristics delimited in the conceptual analysis, and guaranteeing that the data (every assertion in the form of a row) complies with the business rules.
Practical considerations regarding the RecipeCategory table
Seing that the
RecipeCategory
table is supposed to (1) fulfill a look-up role and (2) would hold only a few rows, I consider that it would be very advantageous to declare it with aPRIMARY KEY
(PK) constraint on a column that keeps values that are meaningful and, at the same time, physically light and narrow, i.e., of typeCHAR(2)
or maybeCHAR(3)
. So it might comprise, e.g., the rows that follow:In this manner, when such PK column “migrates” to the
RecipeCategorization
table as a column with a FK constraint, it will have stable values that maintain their meaning and intention, making the result sets much more readable than, say, anINT
value, which definitely can assist in the interpretation of the information obtained through, e.g., derived tables (those fetched back by virtue of a SELECT statement or a VIEW definition).All this remains in agreement with the spirit of the relational model†† published in 1970 by Dr. Codd, where he included the following relevant note:
Relational databases and application programs
The extract of your question cited below has to do, among other things, with the “link” between (a) the database under discussion and (b) the application programs (apps) that will work along with it:
It is opportune to point out that the data, in its essence and of its nature, is a highly valuable organizational asset; as a result, it must be administered as such. This fundamental resource tends to outlive apps, app development platforms and programming paradigms.
With that, a RDB ought to be an independent (self-protective, self-describing, etc.) software component that is capable of being shared by multiple apps, and it must not be “coupled” —using object oriented programming parlance— with the code of any of these apps.
Consequently, you should (a) manage the data by dint of the instruments provided by the relational theory, the modeling platforms and the SQL sytem of choice, and (b) shape and implement the relevant processes and the graphical user interface with the pertinent app development tools. In this way, all the software components will work harmonically; they will be independent but, at the same time, well interconnected.
Endnote and reference
† Integration Definition for Information Modeling (IDEF1X) is a highly recommendable data modeling technique that was established as a standard in december 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on (a) some of the early relational model works authored by Dr. E. F. Codd; on (b) the Entity-Relationship view, developed by Dr. P. P. Chen; and also on (c) the Logical Database Design Technique, created by Robert G. Brown.
†† Codd, E. F. (June 1970). A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Volume 13 Issue 6 (pp. 377-387). New York, NY, USA.