Database Design – Store Multiple Related Items in One Field or Separately?

database-design

I'm developing a web app, something like an online dictionary.

One expression might have several different meanings, for example, the word "flight" has two different meanings: a plane trip, and the act of running away. I could use 2 entries (records) to represent them respectively.

One expression might also have several similar definitions(explanations), for example, the expression "be all about" is defined as "be focused on or interested in (a particular thing)" on lexico and is explained as "used to say what the most important aspect of something is
" on oxfordlearners

How do I represent this expression?

I can imagine two approaches, one is put those similar definitions in one entry (record), one field, something like {(definition_A, source), (definition_B, source), …}, which seems ugly. possibly inefficient when indexing. The advantage is that multiple definitions are linked to one entry.

Another way is to use 2 entries, the question is, how do I link those two entries to one meaning.

Could someone give a hint?


The difference between my question and One field with multiple values? is every driver has a license could be used to identify them.

In the example of "square", I don't have an existing attribute to identify one meaning in "expression table" if I put its definition in another table.

Inside database, it can be done, in reality, it sometimes might not be plausible.

Even if I implement that way, bringing bad user experience by force them fill in a boring long form, I cannot import existing data, e.g. the two items above.

In other words, this method seems to need plenty of human laboring.

Best Answer

So the question LowlyDBA is likely to provide you some insight on how to implement a many-to-many relationship. But I think there's a little more to your specific situation that should be expanded upon.

First how to model this. Whenever you see " has zero/one or more ", this indicates a one-to-zero-to-one relationship. The doesn't necessarily need to be another entity as in the other question.

So you would have one table with your terms:

CREATE TABLE Term
(
  Term  VARCHAR(50)  NOT NULL  /* 50 is probably reasonable, but whatever works best for you */
 ,CONSTRAINT PK_Term PRIMARY KEY (Term)
 ,CONSTRAINT CK_Term_Length_IsValid CHECK (LENGTH(Term) >= 1) /* Or some other reasonable length */
 /* You can/should define additional constraints to keep invalid characters out of this column */
)
;

Next we need a table to store the definitions for each Term:

CREATE TABLE TermDefinition
(
  Term        VARCHAR(50)   NOT NULL
 ,Definition  VARCHAR(250)  NOT NULL /* Or some other reasonable maximum length */
 ,CONSTRAINT FK_TermDefinition_Definition_Of_Term FOREIGN KEY (Term) REFERENCES Term (Term)
 ,CONSTRAINT PK_TermDefinition PRIMARY KEY (Term, Definition)
 ,CONSTRAINT CK_TermDefinition_Definition_Length_IsValid CHECK (LENGTH(Definition) >= 1)
)
;

So you might ask why not replace the primary key with an integer row pointer like TermId and save some space?

  1. In order for TermId to be a valid key, it has to replace an existing key which becomes an alternate key UNIQUE (Term). This creates another copy of your data via an index. Depending on the average number of definitions per term you might not really save a significant amount of space to begin with.
  2. Any search for a definition now has to hit the index for Term (again, a copy of Term) to select the TermId before querying TermDefinition, versus just querying TermDefinition directly. So extra code, more index tuning/maintenance.
  3. We should avoid creating meaningless objects when possible.

You might also ask why you would need a Term table in the first place, if all lookups would be performed against TermDefinition?

  1. We may need a unqiue/distinct list of Terms in the future for other reasons, such as relating Terms to each other.
  2. You may want to make it hard for people to add new Terms but not new TermDefinitions.

*Some may take umbrage with a VARCHAR primary key column, but the data has to stored somehow, the data has to be read somehow, and if that's a list of variable length characters, it is what it is. We either create a primary key with varying characters forming the B-Tree, or an alternate key with varying characters forming another B-Tree.