TL;DR – I'm looking for advice on how to better write the query below.
Below is a pared down version of my table structure with some sample data. I don't have control over the data
structure at all so recommendations on schema changes unfortunately won't help me.
Problem
Given a building_level_key
and a faction_key
I need to return a record from building_levels
joined to its
closest match from the building_culture_variants
table.
For example, if I used goblin_walls
& fact_blue
I would expect the goblin_walls
record that joins
building_culture_variant_key
record 2
.
An example structure of tables can be seen below:
-
factions
– is a compacted version of the real table as cultures/subculture records are stored in different
tables but it gets the point across. This table is only really needed in the query so that the appropriate
culture/subculture can be referenced in relation to a givenfaction_key
. -
building_levels
– acts as a base record for every building in the system. There is only one record per
building. -
building_culture_variants
– acts as its name implies; there can be more than one record for eachbuilding_level_key
and each variant record is matched against a building level using thebuilding_level_key
and a combination offaction_key
,culture_key
andsubculture_key
.
How matching works
Matching starts with finding the given building_level_key
in the culture variants table. This is a hard match and is needed to join any two building level and culture variant.
Each building level record will have at least one culture variant. Often there are several culture variants per building level but on average no more that 4. The most common culture variants is a "generic" one which means that the faction_key
, culture_key
and subculture_key
columns are all null so the building will match against any faction. However any combination of the faction columns could have a key so I need to match a given faction against each of the faction columns in the culture variant.
Side note: the culture variant keys are always consistent, meaning I'll never have a scenario where a faction_key
and subculture_key
in the culture variants table don't match a corresponding faction_key
and subculture_key
from the factions table (and subculture table, which has been omitted for clarity).
What I've tried
I have provided a sql fiddle to play around with and included my version of the query below:
SELECT
"building_culture_variants"."building_culture_variant_key" AS qualified_key,
"building_levels"."building_level_key" AS building_key,
"building_levels"."create_time",
"building_levels"."create_cost",
"building_culture_variants"."name",
'fact_blue'::text AS faction_key
FROM
"building_levels"
INNER JOIN "building_culture_variants" ON (
"building_culture_variants"."building_culture_variant_key" IN (
SELECT
"building_culture_variant_key"
FROM
(
SELECT
"building_culture_variants"."building_culture_variant_key",
(
CASE WHEN "building_culture_variants"."faction_key" = "building_factions"."faction_key" THEN 1 WHEN "building_culture_variants"."faction_key" IS NULL THEN 0 ELSE NULL END +
CASE WHEN "building_culture_variants"."culture_key" = "building_factions"."culture_key" THEN 1 WHEN "building_culture_variants"."culture_key" IS NULL THEN 0 ELSE NULL END +
CASE WHEN "building_culture_variants"."subculture_key" = "building_factions"."subculture_key" THEN 1 WHEN "building_culture_variants"."subculture_key" IS NULL THEN 0 ELSE NULL END
) AS match_count
FROM
"building_culture_variants"
INNER JOIN (
-- This is a subquery because here I would join a couple more tables
-- to collect all of the faction info
SELECT
"factions"."faction_key",
"factions"."culture_key",
"factions"."subculture_key"
FROM
"factions"
) AS "building_factions" ON ("building_factions"."faction_key" = 'fact_blue')
WHERE ("building_levels"."building_level_key" = "building_culture_variants"."building_level_key")
GROUP BY
match_count,
building_culture_variant_key
ORDER BY
match_count DESC NULLS LAST
LIMIT
1
) AS "culture_variant_match"
)
)
WHERE "building_levels"."building_level_key" = 'goblin_walls'
ORDER BY
"building_levels"."building_level_key"
Question
The query I provided above works and gets the job done but I feel like I'm trying to brute force the problem by just nesting a bunch of queries. I get this feeling like I'm not taking advantage of some sql construct that would streamline the performance of, or greatly simplify the query.
So what I'm really asking, is there a better way I could rewrite the query to be more efficient?
Best Answer
The SQL code almost looks like you are trying to do things in a Procedural fashion. That won't be efficient on a Declarative language like SQL.
JOIN
When you
JOIN
two data sets and you need to make one behave as ifNULL
value is a wild card, you can do theJOIN
like thisHowever, this trick will only work if you know that
TableA.col2match
is alwaysNOT NULL
.Score and Rank
You have already provided a scoring function to score the matches. I advise that you place this into a function for easier maintenance.
Most databases can
RANK()
your scores for you. This is ananalytic
function. You really should read up on them.CTE
I'm using CTEs to help the reader understand what is going on.
If you put the
WHERE
clause within the CTE section, PostgreSQL will materialize the least number of rows (according the the dbfidle plan).Resulting SQL
I built this SQL so that you can move the
WHERE
clause outside of the CTEs. By doing that, you canCREATE VIEW
on the SQL. This will simplifying the SQL that the middle tier developer has to write.