Stroring a type of an entry

database-designnormalization

There is a table estate with numeric fields districtru and districthe (for a name of a district in Russian and Hebrew), which refer to primary keys of the table districts. The districts table has the following fields:

  • id INT – primary ID
  • str VARCHAR – the name of the district in either Russian or Hebrew.

The question: Is it worth to add ENUM('ru','he') ("Russian" or "Hebrew") field to the districts table. It seems not to be strictly necessary as it is possible to conclude if a string in this table is a Russian or Hebrew name (or maybe both) by consulting the table estate (which of its keys districtru or districthe refer to this string in districts table).

Best Answer

If this is a language translation issue and the districts are the same whether they are listed in Hebrew or Russian, you might actually have more success with a slightly different table structure.

Right now, based on your description, you have:

ESTATE
+---------+------------+------------+
| Columns | DistrictRU | DistrictHE |
+---------+------------+------------+
|       1 | 1          | NULL       |
|       2 | NULL       | 2          |
|       3 | 3          | 4          |
+---------+------------+------------+


DISTRICTS
+----+-----+
| ID | STR |
+----+-----+
|  1 | R1  |
|  2 | H2  |
|  3 | R3  |
|  4 | H3  |
+----+-----+

Instead of doing this, I'd propose something slightly different:

LANGUAGE
+----+---------+
| ID |   STR   |
+----+---------+
|  1 | Russian |
|  2 | Hebrew  |
+----+---------+

DISTRICTS
+----+------------+-----+
| ID | LanguageID | STR |
+----+------------+-----+
|  1 |          1 | R1  |
|  2 |          2 | H2  |
|  3 |          1 | R3  |
|  4 |          2 | H3  |
+----+------------+-----+

ESTATE
+---------+------------+
| Columns | DistrictID |
+---------+------------+
|       1 |          1 |
|       2 |          2 |
|       3 |          3 |
+---------+------------+

There are a few advantages here. First of all, if you need more languages than Russian and Hebrew (say, English or possibly Amharic), you don't need to modify the table structure. Secondly, you now have three normalized tables rather than one normalized and one with repeating structures. And thirdly, you don't have to worry about an enumeration field on your DISTRICTS table that might be out of sync with a value in ESTATE.

Tying these together with a query assumes that you know the language you want to display results in (presuming that you don't need to display both names at once). Displaying all of the names at once becomes slightly more difficult, but if you're pulling this data through a business layer and displaying to end users through an application, the easy version is to use two queries: one to get the estate(s) and one to get the full listing of districts by estate. Then, the web server could piece it together.