SQL Server – Best Practices for Handling Language

database-designsql server

I am a coder, not a database admin, so my question may be a little obvious to some. I just cannot find the correct search term to find the answer.

I'm trying to create a database structure for both currency and language management.
I'm thinking of following structure:

Language

LanguageId int not null primary key
iso2 char(2) not null

For example:

1, EN

2, SV

Then I need to store the name of the language as well, which should be translatable into other languages. For this, I have a table:

Language_Text

LanguageId int not null
LanguageCulture char(5) not null
LanguageName varchar(20) not null
Primary Key(LanguageId, LanguageCulture)

For example:

1, en-US, English

1, sv-SE, Engelska

2, en-US, Swedish

2, sv-SE, Svenska

The issue comes when setting up keys and relationships. In Language there is only 1 key, while in Language_Texts there is an aggregate key. I would like a relation between the two tables, but having problems with this due to difference in keys.

I want to be able to get all various texts for a specific language, as well as query to get a specific culture if need be.

What is the best practice to use in this case?

Thanks in advance for any help in this matter!

Best regards

Peter Kindberg

Best Answer

If I understand your requirements correctly....

I would just use the natural key, LanguageCode-CultureCode ("en-US," for example). It's small enough. (I'm using the entire "en-US" as the primary key to differentiate it from "en-GB," for example.)

CREATE TABLE [dbo].[Language](
    [Language] [char](2) NOT NULL,
    [Culture] [char](2) NOT NULL,
    [LanguageCode]  AS (([Language]+'-')+[Culture]) PERSISTED NOT NULL,
 CONSTRAINT [PK_Language] PRIMARY KEY CLUSTERED 
(
    [Language] ASC,
    [Culture] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
 CONSTRAINT [LanguageCode] UNIQUE NONCLUSTERED 
(
    [LanguageCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[Language_Text](
    [LanguageID] [varchar](5) NOT NULL,
    [LanguageCode] [varchar](5) NOT NULL,
    [LanguageName] [nvarchar](20) NULL,
 CONSTRAINT [PK_Language_Text] PRIMARY KEY CLUSTERED 
(
    [LanguageID] ASC,
    [LanguageCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[Language_Text]  WITH CHECK ADD  CONSTRAINT [FK_Language_Text_Language] FOREIGN KEY([LanguageCode])
REFERENCES [dbo].[Language] ([LanguageCode])
GO

ALTER TABLE [dbo].[Language_Text] CHECK CONSTRAINT [FK_Language_Text_Language]
GO

This should allow you to get all articles in en or sv, and you can also query for en-US or sv-SE. Presumably, although it wasn't in your example, you could also query for en-CA, fr-CA, en, fr, or CA.

Edit--I'm sorry, you're right, no search by Culture in my old code. Revamped above, sorry. Here's a sample of content:

Language:

Language-Culture-LanguageCode
en  US  en-US
sv  SE  sv-SE

Language_Text:

LanguageID-LanguageCode-LanguageName
en-US   en-US   English
en-US   sv-SE   Engelska
sv-SE   en-US   Swedish
sv-SE   sv-SE   Svenska

Searching by Culture (Canada):

SELECT        test.dbo.Language_Text.LanguageID, test.dbo.Language_Text.LanguageCode, test.dbo.Language_Text.LanguageName
FROM            test.dbo.Language_Text INNER JOIN
                         test.dbo.Language ON test.dbo.Language_Text.LanguageID = test.dbo.Language.LanguageCode
WHERE        (test.dbo.Language.Culture = 'CA')

Searching by Language (French):

SELECT        test.dbo.Language_Text.LanguageID, test.dbo.Language_Text.LanguageCode, test.dbo.Language_Text.LanguageName
FROM            test.dbo.Language_Text INNER JOIN
                         test.dbo.Language ON test.dbo.Language_Text.LanguageID = test.dbo.Language.LanguageCode
WHERE        (test.dbo.Language.Language = 'fr')

Searching by LanguageCode (Swedish):

SELECT LanguageName
  FROM [test].[dbo].[Language_Text]
  where (LanguageID = 'sv-SE')

Related Solutions

Database Design – Implementing a Multilingual User Interface

It is possible to store strings with different collations in the same column using SQL_VARIANT:

CREATE TABLE dbo.Localized
(
    text_id     INTEGER NOT NULL,
    lang_id     INTEGER NOT NULL,
    text_body   SQL_VARIANT NOT NULL,

    CONSTRAINT [PK dbo.Localized text_id, lang_id]
        PRIMARY KEY CLUSTERED (text_id, lang_id),
)
GO
INSERT dbo.Localized
    (text_id, lang_id, text_body)
VALUES
    (1001, 2057, N'Database problems' COLLATE Latin1_General_CI_AS);
GO
INSERT dbo.Localized
    (text_id, lang_id, text_body)
VALUES
    (1001, 1025, N'قاعدة بيانات المشاكل' COLLATE Arabic_CI_AS)

This design has several drawbacks (including being limited to 8000 bytes), not least in the search area: SQL_VARIANT cannot be full-text indexed, and some string comparison features (e.g. LIKE) cannot be used directly either. On the other hand, it is possible to create a regular index on SQL_VARIANT and perform the more basic comparisons (e.g. <, =, >) in a collation-aware fashion:

CREATE UNIQUE INDEX uq1 ON dbo.Localized (text_body)
GO
-- One row
SELECT
    l.*
FROM dbo.Localized AS l 
WHERE
    l.text_body = CONVERT(SQL_VARIANT, N'Database problems' COLLATE Latin1_General_CI_AS)

-- No rows (and no collation error!)
SELECT
    l.*
FROM dbo.Localized AS l
WHERE
    l.text_body = CONVERT(SQL_VARIANT, N'Database problems' COLLATE Arabic_CI_AS)

-- One row, index seek, manual version of "LIKE 'D%'"
SELECT
    l.*
FROM dbo.Localized AS l 
WHERE
    l.text_body >= CONVERT(SQL_VARIANT, N'D' COLLATE Latin1_General_CI_AS)
    AND l.text_body < CONVERT(SQL_VARIANT, N'E' COLLATE Latin1_General_CI_AS)

We can also write the usual sort of procedures:

CREATE PROCEDURE dbo.GetLocalizedString
    @text_id    INTEGER,
    @lang_id    INTEGER,
    @text_body  SQL_VARIANT OUTPUT
AS
BEGIN
    SELECT
        @text_body = l.text_body
    FROM dbo.Localized AS l
    WHERE
        l.text_id = @text_id
        AND l.lang_id = @lang_id
END
GO
DECLARE @text SQL_VARIANT

EXECUTE dbo.GetLocalizedString
    @text_id = 1001,
    @lang_id = 1025,
    @text_body = @text OUTPUT

SELECT @text

Of course, full-text indexing is also problematic in the "single table for all translations" design, since full-text indexing (all but) requires a language id setting per column. The multiple table design described by Joop Eggen could be full-text indexed (though it would naturally require one index per table).

The other main option is to have one column per locale in the base table:

CREATE TABLE dbo.Example
(
    text_id     INTEGER NOT NULL,
    text_2057   NVARCHAR(MAX) COLLATE Latin1_General_CI_AS NULL,
    text_1025   NVARCHAR(MAX) COLLATE Arabic_CI_AS NULL,

    CONSTRAINT [PK dbo.Example text_id]
        PRIMARY KEY CLUSTERED (text_id)
)

This arrangement does have a certain simplicity to it, and works well with full-text indexing, though it does require a new column to be added with each new language, and many developers find this sort of structure inelegant and unsatisfactory to work with.

Each of the alternatives has advantages and disadvantages, and will require indirection at some level or another, so it might depend on where the developers concerned feel happiest locating that indirection. I imagine most people will prefer the multiple-table design for most purposes.

Mysql – Appropriate multi-language schema, or overkill

We've done a lot of this, and (administrative) users were allowed to fix the translations live. (You still might want a caching layer, but I'm totally down with driving this with a real database and not resource files - it gives you a ton of power to query and find things which need to be translated, etc). I think your schema is probably fine, so I'll just pass on some stuff we learned in the hope that it's useful.

One thing you have left out is phrases with insertion points. In the example below, the order is reversed and the language is still English, but this could very easily be two different languages - pretend this is just two languages who normally put things in a different order.

Hello, <username> you have <x> points!

You've got <x> points to spend, <username>!

In our pre-.NET, we had a routine which did the insertion so the phrases would look like this:

Hello, {0:username} you have {1:points} points!

You've got {1:points} points to spend, {0:username}!

This would obviously simply be used in your code as <%= String.Format(phrase, username, points); %> or similar

Which helped the translator a bit. But .NET String.FOrmat doesn't support commenting within the format string, unfortunately.

As you say, you would not want to handle that in your php with locale awareness or meta phrases.

So what we had was a master phrase table:

phraseid, english, supplemental info

and a localized table:

phraseid, localeid, translation

You've also assumed with INNER JOINS that the localized versions exist - we tended to leave them out until they were translated, so that query of yours would end up returning nothing at first (not even the default)

If a translation didn't exist, ours defaulted to English, then fellback to code-provided (in case the database didn't have the ID, and it was also clear from the code what phrase identifier "TXT_LNG_WRNNG_INV_LOW" was actually trying to get) - so the equivalent of this query is what we used:

SELECT COALESCE(localized.translation, phrase.english, @fallback)
FROM DUAL
LEFT JOIN phrase
    ON phrase.phraseid = @phraseid
LEFT JOIN localized
    ON localized.phraseid = phrase.phraseid
    AND localized.localeid = @localeid

Obviously, you might get all the things at one time using your page system.

We tended not to link things to the page because they were re-used a lot between pages (and not just in page-fragments or controls), but that's certainly fine.

In the case of our Windows-native apps, we used reflection and a mapping file from control to translation tag so that translation did not require re-compiles (in pre-.NET apps we had to tag the controls using the Tag or other special properties). This is probably a little more problematic in PHP or ASP.NET MVC, but possible in ASP.NET where there is a full-blown server-side page model.

For testing, you can obviously query to find missing translations very easily. To find places which need to be tagged, translate the entire phrase dictionary using pig-latin or Klingon or something like replace every non-space character with ? - the English should stand out and let you know that some naked plaintext has crept into your HTML.