Good idea to include summary of frequently accessed data from another foreign-keyed table

foreign keynormalizationrelational-theory

Sorry for poor title, I'll try to explain it in more detail here.

I have an application that often gets a row from table A, then looks up a column from that in a foreign key mapping table (table B), and finally looks up several rows with the corresponding foreign key in table C.

My question is: would it be better for table A to have a column containing data in the particular column we often want from table C? This would be anti-normalization, but would save on a lot of queries.

As an example, let's say we have a table "french words" (table A), with columns like:

id, word, pronunciation, frequency

We also have a table (table B) that maps the "french words" id to any number of "german words" id (table C)

And then in "german words" (table C) we also have id, word, pronunciation, frequency.

When we look up a French word, we often want to get the highest frequency German word that it's associated with via table B. Would it be better to just include a highest_frequency_german_word column in the French words table?

Best Answer

Is there some particular reason you're worried about how much work your computer has to do to join three tables?

The purpose of normalizing your schema is to make it much harder to corrupt your data when you insert, update and delete it. Normalizing your schema means that reading your data will be more work (generally), but your code for maintaining your data will be much simpler and therefore much less prone to data corruption errors, as well as being cheaper to build and maintain, of course.

If your data is pretty static, i.e. you write it once and then read it a lot, then the benefits of normalization are less important.

If your performance or resource utilization is very critical, and you can't get the performance you need from normalized data, then you can consider denormalizing your schema. However, whenever you use denormalization you should do it with full knowledge of all of the potential traps you've set for yourself with respect to data corruption.

Normalize by default and denormalize as necessary, and with caution.