Sql-server – Select all records, join with table A if join exists, table B if not

sql serversql server 2014t-sql

So here's my scenario:

I'm working on Localization for a project of mine, and typically I would go about doing this in the C# code, however I want to do this in SQL a bit more since I am trying to buff up my SQL a bit.

Environment: SQL Server 2014 Standard, C# (.NET 4.5.1)

Note: the programming language itself should be irrelevant, I'm only including it for completeness.

So I sort-of accomplished what I wanted, but not to the extent I wanted. It's been a while (at least a year) since I have done any SQL JOINs except basic ones, and this is quite a complex JOIN.

Here is a diagramme of the relevant tables of the database. (There are plenty more, but not necessary for this portion.)

Database Diagramme

All relationships described in the image are complete in the database – the PK and FK constraints are all setup and operating. None of the columns described are nullable. All the tables have the schema dbo.

Now, I have a query which almost does what I want: that is, given ANY Id of SupportCategories and ANY Id of Languages, it will return either:

If there is a right-proper translation for that language for that string (I.e. StringKeyId -> StringKeys.Id exists, and in LanguageStringTranslations StringKeyId, LanguageId, and StringTranslationId combination exists, then it loads StringTranslations.Text for that StringTranslationId.

If the LanguageStringTranslations StringKeyId, LanguageId, and StringTranslationId combination did NOT exist, then it loads the StringKeys.Name value. The Languages.Id is a given integer.

My query, be it a mess, is as follows:

SELECT CASE WHEN T.x IS NOT NULL THEN T.x ELSE (SELECT
    CASE WHEN dbo.StringTranslations.Text IS NULL THEN dbo.StringKeys.Name ELSE dbo.StringTranslations.Text END AS Result
FROM dbo.SupportCategories
    INNER JOIN dbo.StringKeys
        ON dbo.SupportCategories.StringKeyId = dbo.StringKeys.Id
    INNER JOIN dbo.LanguageStringTranslations
        ON dbo.StringKeys.Id = dbo.LanguageStringTranslations.StringKeyId
    INNER JOIN dbo.StringTranslations
        ON dbo.StringTranslations.Id = dbo.LanguageStringTranslations.StringTranslationId
WHERE dbo.LanguageStringTranslations.LanguageId = 38 AND dbo.SupportCategories.Id = 0) END AS Result FROM (SELECT (SELECT
    CASE WHEN dbo.StringTranslations.Text IS NULL THEN dbo.StringKeys.Name ELSE dbo.StringTranslations.Text END AS Result
FROM dbo.SupportCategories
    INNER JOIN dbo.StringKeys
        ON dbo.SupportCategories.StringKeyId = dbo.StringKeys.Id
    INNER JOIN dbo.LanguageStringTranslations
        ON dbo.StringKeys.Id = dbo.LanguageStringTranslations.StringKeyId
    INNER JOIN dbo.StringTranslations
        ON dbo.StringTranslations.Id = dbo.LanguageStringTranslations.StringTranslationId
WHERE dbo.LanguageStringTranslations.LanguageId = 5 AND dbo.SupportCategories.Id = 0) AS x) AS T

The problem is that it is not capable of providing me ALL of the SupportCategories and their respective StringTranslations.Text if it exists, OR their StringKeys.Name if it didn't exist. It is perfect at providing any one of them, but not at all. Basically, it's to enforce that if a language does not have a translation for a specific key, then the default is to use StringKeys.Name which is of StringKeys.DefaultLanguageId translation. (Ideally, it would not even do that, but instead load the translation for StringKeys.DefaultLanguageId, which I can do myself if pointed in the right direction for the rest of the query.)

I've spent a LOT of time on this, and I know if I were to just write it in C# (like I usually do) it would be done by now. I want to do this in SQL, and I'm having trouble getting the output I like.

The only caveat, is I want to limit the number of actual queries applied. All columns are indexed and such as I like them for now, and without real stress-testing I cannot index them further.

Edit: Another note, I'm trying to keep the database as normalized as possible, so I don't want to duplicate things if I can avoid it.

Example Data

Source

dbo.SupportCategories (Entirety):

Id  StringKeyId
0   0
1   1
2   2

dbo.Languages (185 records, only showing two for examples):

Id  Abbreviation    Family  Name    Native
38  en  Indo-European   English English
48  fr  Indo-European   French  français, langue française

dbo.LanguagesStringTranslations (Entirety):

StringKeyId LanguageId  StringTranslationId
0   38  0
1   38  1
2   38  2
3   38  3
4   38  4
5   38  5
6   38  6
7   38  7
1   48  8 -- added as example

dbo.StringKeys (Entirety):

Id  Name    DefaultLanguageId
0   Billing 38
1   API 38
2   Sales   38
3   Open    38
4   Waiting for Customer    38
5   Waiting for Support 38
6   Work in Progress    38
7   Completed   38

dbo.StringTranslations (Entirety):

Id  Text
0   Billing
1   API
2   Sales
3   Open
4   Waiting for Customer
5   Waiting for Support
6   Work in Progress
7   Completed
8   Les APIs -- added as example

Current Output

Given the exact query below, it outputs:

Result
Billing

Desired Output

Ideally, I would like to be able to omit the specific SupportCategories.Id, and get all of them, as so (regardless if language 38 English was used, or 48 French, or ANY other language at the moment):

Id  Result
0   Billing
1   API
2   Sales

Additional Example

Given I were to add a localization for French (I.e. add 1 48 8 to LanguageStringTranslations), the output would change to (note: this is example only, obviously I would add a localized string to StringTranslations) (updated with French example):

Result
Les APIs

Additional Desired Output

Given the example above, the following output would be desired (updated with French example):

Id  Result
0   Billing
1   Les APIs
2   Sales

(Yes, I know technically that's wrong from a consistency standpoint, but it's what would be desired in the situation.)

Edit:

Small updated, I did change the structure of the dbo.Languages table, and drop the Id (int) column from it, and replace it with Abbreviation (which is now renamed to Id, and all relative Foreign-Keys and and relationships updated). From a technical standpoint, this is a more appropriate setup in my opinion due to the fact that the table is limited to ISO 639-1 codes, which are unique to begin with.

Tl;dr

So: the question, how could I modify this query to return everything from SupportCategories and then return either StringTranslations.Text for that StringKeys.Id, Languages.Id combination, or the StringKeys.Name if it did NOT exist?

My initial thought, is that I could somehow cast the current query to another temporary type as another subquery, and wrap this query in yet another SELECT statement and select the two fields I want (SupportCategories.Id and Result).

If I don't find anything, I'll just do the standard method I typically use which is to load all the SupportCategories into my C# project, and then with it run the query I have above manually against each SupportCategories.Id.

Thanks for any and all suggestions/comments/critique.

Also, I apologize for it being absurdly long, I just don't want any ambiguity. I'm often on StackOverflow and see questions that lack substance, didn't wish to make that mistake here.

Best Answer

Here is the first approach I came up with:

DECLARE @ChosenLanguage INT = 48;

SELECT sc.Id, Result = MAX(COALESCE(
   CASE WHEN lst.LanguageId = @ChosenLanguage      THEN st.Text END,
   CASE WHEN lst.LanguageId = sk.DefaultLanguageId THEN st.Text END)
)
FROM dbo.SupportCategories AS sc
INNER JOIN dbo.StringKeys AS sk
  ON sc.StringKeyId = sk.Id
LEFT OUTER JOIN dbo.LanguageStringTranslations AS lst
  ON sk.Id = lst.StringKeyId
  AND lst.LanguageId IN (sk.DefaultLanguageId, @ChosenLanguage)
LEFT OUTER JOIN dbo.StringTranslations AS st
  ON st.Id = lst.StringTranslationId
  --WHERE sc.Id = 1
  GROUP BY sc.Id
  ORDER BY sc.Id;

Basically, get the potential strings that match the chosen language and get all the default strings, then aggregate so you only pick one per Id - prioritize on the chosen language, then take the default as a fallback.

You can probably do similar things with UNION/EXCEPT but I suspect this will almost always lead to multiple scans against the same objects.