Sql-server – Trouble in Multi-Language DB

database-designsql server

I'm working on a DB design for medicines in many countries with different languages. I created a table for the different languages I want and link it to another table of chemicals. But I find that only one table will contain all medicines for all countries with all languages and the same applies for other tables linked to the language table. Ths means that the software in any country will contain unneccessary data about other countries and unneccessary overhead on the server.

How can I solve this design problem?

See my question to understand what I mean.

Best Answer

Previously I assumed that all users would connect to a central database. Here it sounds like you want to have a copy of the database local to each team.

The decision to have one or many instances of the database might affect which data is stored in each instance, but it should not affect the logical model that relates languages, chemicals, and the names of chemicals.

SIngle-instance solution

It should be clear that a central database would have to store all languages, all chemicals, and all names so that each team could use it. For example, the database would store a code for every chemical that is studied by each team, a code for every language spoken by each team, the name of every chemical in every language.

I assume that the database has a table like this to store the names of chemicals in each language:

CREATE TABLE ChemicalNames (
  ChemicalCode VARCHAR(12) NOT NULL,
  LanguageCode CHAR(2) NOT NULL,
  ChemicalName NVARCHAR(50) NOT NULL,
  CONSTRAINT PK_ChemicalNames PRIMARY KEY (ChemicalCode, LanguageCode)
);

INSERT INTO ChemicalNames (
  ChemicalCode,
  LanguageCode,
  ChemicalName
)
VALUES
  ('7732-18-5', 'en', 'Water'),
  ('7732-18-5', 'fr', 'Eau'),
  ('7732-18-5', 'es', 'Agua'),
  ('50-00-0', 'en', 'Formaldehyde'),
  ('50-00-0', 'fr', 'Méthanal'),
  ('50-00-0', 'es', 'Formaldehído'),
  ('8007-40-7', 'en', 'Mustard oil'),
  ('8007-40-7', 'fr', 'Huile de moutarde'),
  ('8007-40-7', 'es', 'Aceite de mostaza');

I assume that the user interface of the client application will display the names of chemicals. For a Spanish-speaker, the application would load the names of chemicals in this way:

SELECT ChemicalCode, ChemicalName
FROM ChemicalNames
WHERE LanguageCode = 'es';

For a French speaker, the application would load the names using the same query with a different language code:

SELECT ChemicalCode, ChemicalName
FROM ChemicalNames
WHERE LanguageCode = 'fr';

Multiple-instance solution

If you have multiple instances, you can simply treat each of them as if they were a central database. Each instance would have all the data needed by any team, even though only one team would use the instance.

The benefit of this approach is that you don't need to worry about which teams need which data - just give all the data to each of them.

They queries to extract data from the database would be the same. The American team would use the American database to get the English names. The French and Spanish names might not be useful to them, but their presence should do no harm.

You haven't made it clear what data you would consider to be 'unnecessary' in a multiple-instance solution. Do you want to limit the languages available to each user? Are you sure that you will never have a French-speaker working with the American team? Do you want to limit the chemicals available? Does each team work with a strict set of chemicals and never cooperates with other teams?

Can you demonstrate that the share-all approach would be impractical for the amount of data you have?

If you really have so much data, you could provide each team with an empty instance of the database and load only the data that each team requires. Load only the languages spoken by the team, load only the chemicals they study, and load only the names of the chemicals in the spoken languages.

Related Solutions

Mysql – Appropriate multi-language schema, or overkill

We've done a lot of this, and (administrative) users were allowed to fix the translations live. (You still might want a caching layer, but I'm totally down with driving this with a real database and not resource files - it gives you a ton of power to query and find things which need to be translated, etc). I think your schema is probably fine, so I'll just pass on some stuff we learned in the hope that it's useful.

One thing you have left out is phrases with insertion points. In the example below, the order is reversed and the language is still English, but this could very easily be two different languages - pretend this is just two languages who normally put things in a different order.

Hello, <username> you have <x> points!

You've got <x> points to spend, <username>!

In our pre-.NET, we had a routine which did the insertion so the phrases would look like this:

Hello, {0:username} you have {1:points} points!

You've got {1:points} points to spend, {0:username}!

This would obviously simply be used in your code as <%= String.Format(phrase, username, points); %> or similar

Which helped the translator a bit. But .NET String.FOrmat doesn't support commenting within the format string, unfortunately.

As you say, you would not want to handle that in your php with locale awareness or meta phrases.

So what we had was a master phrase table:

phraseid, english, supplemental info

and a localized table:

phraseid, localeid, translation

You've also assumed with INNER JOINS that the localized versions exist - we tended to leave them out until they were translated, so that query of yours would end up returning nothing at first (not even the default)

If a translation didn't exist, ours defaulted to English, then fellback to code-provided (in case the database didn't have the ID, and it was also clear from the code what phrase identifier "TXT_LNG_WRNNG_INV_LOW" was actually trying to get) - so the equivalent of this query is what we used:

SELECT COALESCE(localized.translation, phrase.english, @fallback)
FROM DUAL
LEFT JOIN phrase
    ON phrase.phraseid = @phraseid
LEFT JOIN localized
    ON localized.phraseid = phrase.phraseid
    AND localized.localeid = @localeid

Obviously, you might get all the things at one time using your page system.

We tended not to link things to the page because they were re-used a lot between pages (and not just in page-fragments or controls), but that's certainly fine.

In the case of our Windows-native apps, we used reflection and a mapping file from control to translation tag so that translation did not require re-compiles (in pre-.NET apps we had to tag the controls using the Tag or other special properties). This is probably a little more problematic in PHP or ASP.NET MVC, but possible in ASP.NET where there is a full-blown server-side page model.

For testing, you can obviously query to find missing translations very easily. To find places which need to be tagged, translate the entire phrase dictionary using pig-latin or Klingon or something like replace every non-space character with ? - the English should stand out and let you know that some naked plaintext has crept into your HTML.

Sql-server – Help with a multi-language medical database

No, you should not split this out into different databases. This will cause more issues and complexity further down the line, for reporting cross country, possibly requiring cross database ownership chaining, etc.

You can achieve your requirements by a process of normalization. Many people say they have normalised a database by looking at the required attributes and splitting them into separate entities by "feel". However, proper normalization is a much more formal, structured approach and is applicable for good database design. There is a very good Wikipedia article at http://en.m.wikipedia.org/wiki/Database_normalization and Itzik Ben-Gans book T-SQL Querying also goes into lots of detail with good examples.

Also recommend that you validate your normalization with a full ERD (Entity Relationship Diagram) Your normalization and ERD are also very useful design documents for your projects documentation library.