Storing a type of an entry

database-design normalization

There is a table estate with numeric fields districtru and districthe (for a name of a district in Russian and Hebrew), which refer to primary keys of the table districts. The districts table has the following fields:

id INT – primary ID
str VARCHAR – the name of the district in either Russian or Hebrew.

The question: Is it worth adding an ENUM('ru','he') ("Russian" or "Hebrew") field to the districts table? It does not seem strictly necessary, because you can tell whether a string in this table is a Russian or a Hebrew name (or maybe both) by consulting the estate table and checking which of its keys, districtru or districthe, refers to that string in the districts table.

Best Answer

If this is a language translation issue and the districts are the same whether they are listed in Hebrew or Russian, you might actually have more success with a slightly different table structure.

Right now, based on your description, you have:

ESTATE
+---------+------------+------------+
| Columns | DistrictRU | DistrictHE |
+---------+------------+------------+
|       1 | 1          | NULL       |
|       2 | NULL       | 2          |
|       3 | 3          | 4          |
+---------+------------+------------+


DISTRICTS
+----+-----+
| ID | STR |
+----+-----+
|  1 | R1  |
|  2 | H2  |
|  3 | R3  |
|  4 | H3  |
+----+-----+

Instead of doing this, I'd propose something slightly different:

LANGUAGE
+----+---------+
| ID |   STR   |
+----+---------+
|  1 | Russian |
|  2 | Hebrew  |
+----+---------+

DISTRICTS
+----+------------+-----+
| ID | LanguageID | STR |
+----+------------+-----+
|  1 |          1 | R1  |
|  2 |          2 | H2  |
|  3 |          1 | R3  |
|  4 |          2 | H3  |
+----+------------+-----+

ESTATE
+---------+------------+
| Columns | DistrictID |
+---------+------------+
|       1 |          1 |
|       2 |          2 |
|       3 |          3 |
+---------+------------+

There are a few advantages here. First of all, if you need more languages than Russian and Hebrew (say, English or possibly Amharic), you don't need to modify the table structure. Secondly, you now have three normalized tables rather than one normalized and one with repeating structures. And thirdly, you don't have to worry about an enumeration field on your DISTRICTS table that might be out of sync with a value in ESTATE.

Tying these together with a query assumes that you know the language you want to display results in (presuming that you don't need to display both names at once). Displaying all of the names at once becomes slightly more difficult, but if you're pulling this data through a business layer and displaying to end users through an application, the easy version is to use two queries: one to get the estate(s) and one to get the full listing of districts by estate. Then, the web server could piece it together.
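As a rough sketch of the layout proposed above, the following Python script builds the three tables in an in-memory SQLite database and looks up district names for one known language. SQLite, the lowercase table and column names (language, districts, estate, language_id, district_id), and the helper estates_in_language are assumptions made for illustration; the original tables are presumably MySQL, where the DDL would differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (
    id  INTEGER PRIMARY KEY,
    str TEXT NOT NULL UNIQUE
);
CREATE TABLE districts (
    id          INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL REFERENCES language(id),
    str         TEXT NOT NULL
);
CREATE TABLE estate (
    id          INTEGER PRIMARY KEY,
    district_id INTEGER NOT NULL REFERENCES districts(id)
);
-- Sample rows copied from the tables in the answer.
INSERT INTO language VALUES (1, 'Russian'), (2, 'Hebrew');
INSERT INTO districts VALUES (1, 1, 'R1'), (2, 2, 'H2'), (3, 1, 'R3'), (4, 2, 'H3');
INSERT INTO estate VALUES (1, 1), (2, 2), (3, 3);
""")


def estates_in_language(conn, language_name):
    """Return (estate id, district name) pairs whose district name is stored in the given language."""
    return conn.execute(
        """
        SELECT e.id, d.str
        FROM estate AS e
        JOIN districts AS d ON d.id = e.district_id
        JOIN language  AS l ON l.id = d.language_id
        WHERE l.str = ?
        ORDER BY e.id
        """,
        (language_name,),
    ).fetchall()


russian = estates_in_language(conn, "Russian")
hebrew = estates_in_language(conn, "Hebrew")

# Supporting another language is a plain INSERT; no ALTER TABLE is needed.
conn.execute("INSERT INTO language VALUES (3, 'English')")
```

The language name is passed as a bound parameter, so the same query serves any language that has a row in the language table.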

Related Solutions

SQL Server – Best Practices for Handling Language

If I understand your requirements correctly....

I would just use the natural key, LanguageCode-CultureCode ("en-US," for example). It's small enough. (I'm using the entire "en-US" as the primary key to differentiate it from "en-GB," for example.)

CREATE TABLE [dbo].[Language](
    [Language] [char](2) NOT NULL,
    [Culture] [char](2) NOT NULL,
    [LanguageCode]  AS (([Language]+'-')+[Culture]) PERSISTED NOT NULL,
 CONSTRAINT [PK_Language] PRIMARY KEY CLUSTERED 
(
    [Language] ASC,
    [Culture] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
 CONSTRAINT [LanguageCode] UNIQUE NONCLUSTERED 
(
    [LanguageCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[Language_Text](
    [LanguageID] [varchar](5) NOT NULL,
    [LanguageCode] [varchar](5) NOT NULL,
    [LanguageName] [nvarchar](20) NULL,
 CONSTRAINT [PK_Language_Text] PRIMARY KEY CLUSTERED 
(
    [LanguageID] ASC,
    [LanguageCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Language_Text]  WITH CHECK ADD  CONSTRAINT [FK_Language_Text_Language] FOREIGN KEY([LanguageCode])
REFERENCES [dbo].[Language] ([LanguageCode])
GO

ALTER TABLE [dbo].[Language_Text] CHECK CONSTRAINT [FK_Language_Text_Language]
GO

This should allow you to get all articles in en or sv, and you can also query for en-US or sv-SE. Presumably, although it wasn't in your example, you could also query for en-CA, fr-CA, en, fr, or CA.

Edit--I'm sorry, you're right, no search by Culture in my old code. Revamped above, sorry. Here's a sample of content:

Language:

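+----------+---------+--------------+
| Language | Culture | LanguageCode |
+----------+---------+--------------+
| en       | US      | en-US        |
| sv       | SE      | sv-SE        |
+----------+---------+--------------+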

Language_Text:

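+------------+--------------+--------------+
| LanguageID | LanguageCode | LanguageName |
+------------+--------------+--------------+
| en-US      | en-US        | English      |
| en-US      | sv-SE        | Engelska     |
| sv-SE      | en-US        | Swedish      |
| sv-SE      | sv-SE        | Svenska      |
+------------+--------------+--------------+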

Searching by Culture (Canada):

SELECT t.LanguageID, t.LanguageCode, t.LanguageName
FROM test.dbo.Language_Text AS t
INNER JOIN test.dbo.Language AS l ON t.LanguageID = l.LanguageCode
WHERE l.Culture = 'CA'

Searching by Language (French):

SELECT t.LanguageID, t.LanguageCode, t.LanguageName
FROM test.dbo.Language_Text AS t
INNER JOIN test.dbo.Language AS l ON t.LanguageID = l.LanguageCode
WHERE l.Language = 'fr'

Searching by LanguageCode (Swedish):

SELECT LanguageName
FROM [test].[dbo].[Language_Text]
WHERE LanguageID = 'sv-SE'
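To try the lookups against the sample content, here is a minimal sketch in Python using an in-memory SQLite database. It is not the SQL Server schema above: the computed LanguageCode column is replaced by an ordinary stored column, the index options are dropped, and the helper names_where is hypothetical. Only the four sample rows are loaded, so the searches use 'US' and 'sv' rather than 'CA' and 'fr'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Language (
    Language     TEXT NOT NULL,
    Culture      TEXT NOT NULL,
    LanguageCode TEXT NOT NULL UNIQUE,  -- stored here; computed in the original DDL
    PRIMARY KEY (Language, Culture)
);
CREATE TABLE Language_Text (
    LanguageID   TEXT NOT NULL,
    LanguageCode TEXT NOT NULL REFERENCES Language(LanguageCode),
    LanguageName TEXT,
    PRIMARY KEY (LanguageID, LanguageCode)
);
INSERT INTO Language VALUES ('en', 'US', 'en-US'), ('sv', 'SE', 'sv-SE');
INSERT INTO Language_Text VALUES
    ('en-US', 'en-US', 'English'),
    ('en-US', 'sv-SE', 'Engelska'),
    ('sv-SE', 'en-US', 'Swedish'),
    ('sv-SE', 'sv-SE', 'Svenska');
""")


def names_where(conn, column, value):
    """Return Language_Text rows whose LanguageID matches a Language row filtered by column."""
    if column not in ("Language", "Culture"):
        # The column name is interpolated into the SQL, so only allow known names.
        raise ValueError("column must be 'Language' or 'Culture'")
    sql = f"""
        SELECT t.LanguageID, t.LanguageCode, t.LanguageName
        FROM Language_Text AS t
        JOIN Language AS l ON t.LanguageID = l.LanguageCode
        WHERE l.{column} = ?
        ORDER BY t.LanguageID, t.LanguageCode
    """
    return conn.execute(sql, (value,)).fetchall()


by_culture = names_where(conn, "Culture", "US")
by_language = names_where(conn, "Language", "sv")
by_code = conn.execute(
    "SELECT LanguageName FROM Language_Text WHERE LanguageID = ? ORDER BY LanguageCode",
    ("sv-SE",),
).fetchall()
```

Each search returns the names of the matching language as written in every available language, which is why a single LanguageID yields more than one row.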

Database Design – Best Logical Structure for Different Entity Categories

I recommend reading about the dangers of the Entity-Attribute-Value pattern in this presentation by Bill Karwin.

One of the solutions is storing NoSQL-like data in a serialized BLOB, if you do not need to read and write individual properties. That is, storing a key (product) as the primary key, and the variable property-value pairs all together in a single blob. This is a non-normalized structure, leaving the responsibility of the constraints to the application layer, but it works great for certain patterns (the same that usually work well with NoSQL solutions). Even if you need indexing, you can create inverted indexes on a separate table as needed.
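A minimal sketch of that idea, assuming JSON as the serialization format and SQLite as the store: each product row carries all of its properties in one blob, and a separate table acts as an inverted index for the few properties that need to be searchable. The table names, the INDEXED set, and the three helper functions are all hypothetical and chosen only for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id         INTEGER PRIMARY KEY,
    properties BLOB NOT NULL           -- serialized property/value pairs
);
CREATE TABLE product_property_index (  -- inverted index: (name, value) -> product
    name       TEXT NOT NULL,
    value      TEXT NOT NULL,
    product_id INTEGER NOT NULL REFERENCES product(id),
    PRIMARY KEY (name, value, product_id)
);
""")

# Only these properties are searchable; the application decides which ones.
INDEXED = {"color"}


def save_product(conn, product_id, properties):
    """Write the whole property set as one blob and rebuild its index rows."""
    blob = json.dumps(properties, sort_keys=True).encode("utf-8")
    with conn:  # one transaction, so blob and index stay consistent
        conn.execute(
            "INSERT OR REPLACE INTO product (id, properties) VALUES (?, ?)",
            (product_id, blob),
        )
        conn.execute(
            "DELETE FROM product_property_index WHERE product_id = ?",
            (product_id,),
        )
        conn.executemany(
            "INSERT INTO product_property_index (name, value, product_id) VALUES (?, ?, ?)",
            [(k, str(v), product_id) for k, v in properties.items() if k in INDEXED],
        )


def load_product(conn, product_id):
    """Read the blob back and deserialize it; returns None for an unknown id."""
    row = conn.execute(
        "SELECT properties FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))


def find_products(conn, name, value):
    """Look up product ids through the inverted index only."""
    rows = conn.execute(
        "SELECT product_id FROM product_property_index "
        "WHERE name = ? AND value = ? ORDER BY product_id",
        (name, str(value)),
    )
    return [r[0] for r in rows]


save_product(conn, 1, {"color": "red", "weight_kg": 1.5})
save_product(conn, 2, {"color": "blue", "pages": 320})
save_product(conn, 3, {"color": "red"})
```

Properties outside INDEXED can still be read by loading the blob, but they cannot be searched without scanning every row, which is the trade-off this structure accepts.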

This and other solutions are covered in this presentation from the latest MySQL Conference, including when you should look at alternative database engines.
