Sql-server – Help with a multi-language medical database

database-designsql server

My project is to design a database for medicines in many countries.

Medicines have various criteria like composition, price, manufacturer.
Database will be multi-language; related to each country.
Database will increase in size over time.
Database will have more than just other data on medicines so it will become more complex over time.

Should I make a separate database for each country?

I use SQL Server 2008.

Best Answer

No, you should not split this out into different databases. This will cause more issues and complexity further down the line, for reporting cross country, possibly requiring cross database ownership chaining, etc.

You can achieve your requirements by a process of normalization. Many people say they have normalised a database by looking at the required attributes and splitting them into separate entities by "feel". However, proper normalization is a much more formal, structured approach and is applicable for good database design. There is a very good Wikipedia article at http://en.m.wikipedia.org/wiki/Database_normalization and Itzik Ben-Gans book T-SQL Querying also goes into lots of detail with good examples.

Also recommend that you validate your normalization with a full ERD (Entity Relationship Diagram) Your normalization and ERD are also very useful design documents for your projects documentation library.

Define the requirements

I can give you an example of how to model multilingual name data by assuming these business requirements:

The database should store the names of the ingredients of drugs in English, French and Spanish, and more languages yet to be defined.
Each ingredient is known by one name in each langauge.

Identify the languages

You need a way to identify langauges in your database so that you can associate drug names with a particular langauge.

A simple standard for identifying languages is ISO 619-1, which defines a two-letter code for the world's most common languages. For example, the code for English is en, French is fr, and Spanish is es.

The following SQL creates a table with a language code column, then populates the table with the example languages:

CREATE TABLE Languages (
  LanguageCode CHAR(2) NOT NULL,
  CONSTRAINT PK_Languages PRIMARY KEY (LanguageCode),
  CONSTRAINT CK_LowerAlpha CHECK (LanguageCode NOT LIKE '%[^a-z]%' COLLATE Latin1_General_BIN2)
);

INSERT INTO Languages (
  LanguageCode
)
VALUES
  ('en'),
  ('fr'),
  ('es');

There is nothing in your entity model that corresponds to this table, which could explain why you found it difficult to model multiligual data.

Identify the chemicals

Next you need a way to identify the chemicals that compose each drug.

A common way to identify chemicals is CAS registry numbering, which defines a numerical code for every chemical known to Science. For example, the code for water is 7732-18-5, formaldehyde is 50-00-0, and mustard oil is 8007-40-7.

The following SQL creates a table with a chemical code column, then populates the table with the example chemicals:

CREATE TABLE Chemicals (
  ChemicalCode VARCHAR(12) NOT NULL,
  CONSTRAINT PK_Chemicals PRIMARY KEY (ChemicalCode),
  CONSTRAINT CK_Dashes CHECK (ChemicalCode LIKE '%_-__-_'),
  CONSTRAINT CK_Digits CHECK (REPLACE(ChemicalCode, '-', '') NOT LIKE '%[^0-9]%')
);

INSERT INTO Chemicals (
  ChemicalCode
)
VALUES
  ('7732-18-5'),
  ('50-00-0'),
  ('8007-40-7');

The Chemicals table roughly corresponds to the Drugs_Ingredients entity in your diagram, but lacks a column for the name.

For simplicity, the table does not enforce the checksum constraint. You should enforce the checksum constraint before using CAS registry numbers in production.

Identify the chemical name

If you were to store the name in a column in the Chemicals table, it would be difficult to support more than one language. You could have one column per langauge, but you would have to modify the schema every time you wanted to support a new langauge.

To free yourself from these limitations, you should model the chemical name as an identifiable entity in itself. You can identify the chemical name using a compound identity of the chemical code and the language code.

The following SQL creates a table with columns for the chemical code, the language code, and the checmical name, then populates the table with one name in each language for each chemical:

CREATE TABLE ChemicalNames (
  ChemicalCode VARCHAR(12) NOT NULL,
  LanguageCode CHAR(2) NOT NULL,
  ChemicalName NVARCHAR(50) NOT NULL,
  CONSTRAINT PK_ChemicalNames PRIMARY KEY (ChemicalCode, LanguageCode),
  CONSTRAINT FK_ChemicalNames_ChemicalCode FOREIGN KEY (ChemicalCode) REFERENCES Chemicals(ChemicalCode),
  CONSTRAINT FK_ChemicalNames_LanguageCode FOREIGN KEY (LanguageCode) REFERENCES Languages(LanguageCode)
);

INSERT INTO ChemicalNames (
  ChemicalCode,
  LanguageCode,
  ChemicalName
)
VALUES
  ('7732-18-5', 'en', 'Water'),
  ('7732-18-5', 'fr', 'Eau'),
  ('7732-18-5', 'es', 'Agua'),
  ('50-00-0', 'en', 'Formaldehyde'),
  ('50-00-0', 'fr', 'Méthanal'),
  ('50-00-0', 'es', 'Formaldehído'),
  ('8007-40-7', 'en', 'Mustard oil'),
  ('8007-40-7', 'fr', 'Huile de moutarde'),
  ('8007-40-7', 'es', 'Aceite de mostaza');

The ChemicalName column is of type NVARCHAR so that it can store any Unicode string. You should always use Unicode in international applications to store text so that you can handle the writing systems of different langauges in a uniform way.

Query the chemical names

I can think of two questions that you might reasonably ask of your database. With the ChemicalNames table, you can answer both of them with simple queries.

What are the names of all the chemicals in Spanish?

Query:

SELECT ChemicalCode, ChemicalName
FROM ChemicalNames
WHERE LanguageCode = 'es';

Result:

CHEMICALCODE    CHEMICALNAME
50-00-0 Formaldehído
7732-18-5   Agua
8007-40-7   Aceite de mostaza

What is water called in each langauge?

Query:

SELECT LanguageCode, ChemicalName
FROM ChemicalNames
WHERE ChemicalCode = '7732-18-5';

Result:

LANGUAGECODE    CHEMICALNAME
en  Water
es  Agua
fr  Eau

Best Answer

Related Solutions

Postgresql – How to design a database for continents, countries, regions, cities and POIs

SQL Server – Designing a Multi-Language Medical Database

Define the requirements

Identify the languages

Identify the chemicals

Identify the chemical name

Query the chemical names

Related Question