Sql-server – Trouble in Multi-Language DB

database-designsql server

I'm working on a DB design for medicines in many countries with different languages. I created a table for the different languages I want and link it to another table of chemicals. But I find that only one table will contain all medicines for all countries with all languages and the same applies for other tables linked to the language table. Ths means that the software in any country will contain unneccessary data about other countries and unneccessary overhead on the server.

How can I solve this design problem?

See my question to understand what I mean.

Best Answer

Previously I assumed that all users would connect to a central database. Here it sounds like you want to have a copy of the database local to each team.

The decision to have one or many instances of the database might affect which data is stored in each instance, but it should not affect the logical model that relates languages, chemicals, and the names of chemicals.

SIngle-instance solution

It should be clear that a central database would have to store all languages, all chemicals, and all names so that each team could use it. For example, the database would store a code for every chemical that is studied by each team, a code for every language spoken by each team, the name of every chemical in every language.

I assume that the database has a table like this to store the names of chemicals in each language:

CREATE TABLE ChemicalNames (
  ChemicalCode VARCHAR(12) NOT NULL,
  LanguageCode CHAR(2) NOT NULL,
  ChemicalName NVARCHAR(50) NOT NULL,
  CONSTRAINT PK_ChemicalNames PRIMARY KEY (ChemicalCode, LanguageCode)
);

INSERT INTO ChemicalNames (
  ChemicalCode,
  LanguageCode,
  ChemicalName
)
VALUES
  ('7732-18-5', 'en', 'Water'),
  ('7732-18-5', 'fr', 'Eau'),
  ('7732-18-5', 'es', 'Agua'),
  ('50-00-0', 'en', 'Formaldehyde'),
  ('50-00-0', 'fr', 'MĂ©thanal'),
  ('50-00-0', 'es', 'FormaldehĂ­do'),
  ('8007-40-7', 'en', 'Mustard oil'),
  ('8007-40-7', 'fr', 'Huile de moutarde'),
  ('8007-40-7', 'es', 'Aceite de mostaza');

I assume that the user interface of the client application will display the names of chemicals. For a Spanish-speaker, the application would load the names of chemicals in this way:

SELECT ChemicalCode, ChemicalName
FROM ChemicalNames
WHERE LanguageCode = 'es';

For a French speaker, the application would load the names using the same query with a different language code:

SELECT ChemicalCode, ChemicalName
FROM ChemicalNames
WHERE LanguageCode = 'fr';

Multiple-instance solution

If you have multiple instances, you can simply treat each of them as if they were a central database. Each instance would have all the data needed by any team, even though only one team would use the instance.

The benefit of this approach is that you don't need to worry about which teams need which data - just give all the data to each of them.

They queries to extract data from the database would be the same. The American team would use the American database to get the English names. The French and Spanish names might not be useful to them, but their presence should do no harm.

You haven't made it clear what data you would consider to be 'unnecessary' in a multiple-instance solution. Do you want to limit the languages available to each user? Are you sure that you will never have a French-speaker working with the American team? Do you want to limit the chemicals available? Does each team work with a strict set of chemicals and never cooperates with other teams?

Can you demonstrate that the share-all approach would be impractical for the amount of data you have?

If you really have so much data, you could provide each team with an empty instance of the database and load only the data that each team requires. Load only the languages spoken by the team, load only the chemicals they study, and load only the names of the chemicals in the spoken languages.