Sql-server – rule of thumb for including the value of a foreign key in a table to avoid large joins

performancesql server

Example:

We have a table with 500 rows in table "Cars". There is also a table "Colors" which has roughly 20 rows. Table "Cars" has a foreign key reference to table "Colors". To get the color of the car we would need to inner join on table "Colors":

select car.Name,
color.Name as Color
from dbo.Cars as car
inner join dbo.Colors as color on color.ID = car.ColorID

The result would look something like this:

+-------------+--------+
|    Name     | Color  |
+-------------+--------+
| Rolls Royce | Red    |
+-------------+--------+

But lets say you have 10,000,000 rows in Cars and 2,000,000 rows in Colors; the inner join would cause quite a performance hit. It would probably be better to include the column "Color" and "ColorID" in table "Cars" to avoid this inner join. So you query would look like this:

select Name,
Color
from dbo.Cars

Is there a rule of thumb to rather include the foreign key value inside your table once your table reaches an x-amount of rows? Or would it be better to remove the foreign key completely and just have a column called "Color" in table "Cars" thus causing table "Colors" to be redundant?

Best Answer

Don't denormilize, keep 2 tables (cars and colors) whatever tables size. I think that you don't need to select 10 millions with one query but if you realy need it, you can do it by batch. I don't know your problematic but by default, never denormilize.

If you have a problem with big tables you can expose it but i am pretty sure that denormalization will be not the solutiuon.