If I understand your requirements correctly....
I would just use the natural key, LanguageCode-CultureCode ("en-US," for example). It's small enough. (I'm using the entire "en-US" as the primary key to differentiate it from "en-GB," for example.)
CREATE TABLE [dbo].[Language](
[Language] [char](2) NOT NULL,
[Culture] [char](2) NOT NULL,
[LanguageCode] AS (([Language]+'-')+[Culture]) PERSISTED NOT NULL,
CONSTRAINT [PK_Language] PRIMARY KEY CLUSTERED
(
[Language] ASC,
[Culture] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [LanguageCode] UNIQUE NONCLUSTERED
(
[LanguageCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Language_Text](
[LanguageID] [varchar](5) NOT NULL,
[LanguageCode] [varchar](5) NOT NULL,
[LanguageName] [nvarchar](20) NULL,
CONSTRAINT [PK_Language_Text] PRIMARY KEY CLUSTERED
(
[LanguageID] ASC,
[LanguageCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Language_Text] WITH CHECK ADD CONSTRAINT [FK_Language_Text_Language] FOREIGN KEY([LanguageCode])
REFERENCES [dbo].[Language] ([LanguageCode])
GO
ALTER TABLE [dbo].[Language_Text] CHECK CONSTRAINT [FK_Language_Text_Language]
GO
This should allow you to get all articles in en or sv, and you can also query for en-US or sv-SE. Presumably, although it wasn't in your example, you could also query for en-CA, fr-CA, en, fr, or CA.
Edit--I'm sorry, you're right, no search by Culture in my old code. Revamped above, sorry. Here's a sample of content:
Language:
Language-Culture-LanguageCode
en US en-US
sv SE sv-SE
Language_Text:
LanguageID-LanguageCode-LanguageName
en-US en-US English
en-US sv-SE Engelska
sv-SE en-US Swedish
sv-SE sv-SE Svenska
Searching by Culture (Canada):
SELECT test.dbo.Language_Text.LanguageID, test.dbo.Language_Text.LanguageCode, test.dbo.Language_Text.LanguageName
FROM test.dbo.Language_Text INNER JOIN
test.dbo.Language ON test.dbo.Language_Text.LanguageID = test.dbo.Language.LanguageCode
WHERE (test.dbo.Language.Culture = 'CA')
Searching by Language (French):
SELECT test.dbo.Language_Text.LanguageID, test.dbo.Language_Text.LanguageCode, test.dbo.Language_Text.LanguageName
FROM test.dbo.Language_Text INNER JOIN
test.dbo.Language ON test.dbo.Language_Text.LanguageID = test.dbo.Language.LanguageCode
WHERE (test.dbo.Language.Language = 'fr')
Searching by LanguageCode (Swedish):
SELECT LanguageName
FROM [test].[dbo].[Language_Text]
where (LanguageID = 'sv-SE')
Pattern matching and operators
Full text search is not the right tool for pattern matching (and possibly even fuzzy, fault tolerant input). Typically, trigram-similarity search with the %
operator is the superior approach here. You need to install the additional module pg_trgm
once per database:
Possibly combine that with LIKE
(also supported by trigram indexes), and text search like you had in mind. Related answer where text search and similarity are combined:
Overview of pattern-matching functionality in Postgres:
Database design
json
is a very bad choice for your intended functionality. It is very inefficient to have to plow through huge columns holding everything in a semi-structured big string (json
).
Consider normalizing your schema, which would result in superior performance (when done properly).
If you are hellbent on json
, consider the new jsonb
in the upcoming (and soon to be released) Postgres 9.4. More (last paragraph):
And be sure to use functional indexes like:
Further optimization is likely possible. But that would require more detailed information including table definitions, cardinalities, sample data and typical queries.
Best Answer
First of all, you have an integer primary key on
principal_country_divisions
. Use it. More efficient than joining vianame
for multiple reasons (storage size, index size, faster integer arithmetic, no collations involved, fixed length).syn
needs to the the first column of the index (pk), you had that right already. The accompanying index automatically covers equality tests onsynonyms.syn
.Be sure to add an index on
principal_country_divisions.name
:If you'd want to match patterns, not whole strings, the job would become more complex.
Next, how can you be sure to
Obviously,
name
andsyn
can be the same. There is no unique constraint over both columns and there isn't even one onsyn
alone. Otherwise yourEXISTS
query is a good approach - usually fast. You'd just have to avoid multiple rows. The added benefit ofEXISTS
would be to eliminate duplicates fromsynonyms
alone, but that's ruled out by the pk. This may be faster for the case:As you commented, a
LEFT JOIN
is in order to preserve finds inname
.In case of multiple finds, you can chose what to pick by adding more
ORDER BY
expressions. Leading columns have to agree withDISTINCT ON
, though. Details in this related answer on SO.