Postgresql – Query optimization with multi-column variant matching

performancepostgresqlquery-performance

TL;DR – I'm looking for advice on how to better write the query below.

Below is a pared down version of my table structure with some sample data. I don't have control over the data
structure at all so recommendations on schema changes unfortunately won't help me.

Problem

Given a building_level_key and a faction_key I need to return a record from building_levels joined to its
closest match from the building_culture_variants table.

For example, if I used goblin_walls & fact_blue I would expect the goblin_walls record that joins
building_culture_variant_key record 2.

An example structure of tables can be seen below:

Sample data with db structure

  • factions – is a compacted version of the real table as cultures/subculture records are stored in different
    tables but it gets the point across. This table is only really needed in the query so that the appropriate
    culture/subculture can be referenced in relation to a given faction_key.

  • building_levels – acts as a base record for every building in the system. There is only one record per
    building.

  • building_culture_variants – acts as its name implies; there can be more than one record for each building_level_key and each variant record is matched against a building level using the building_level_key and a combination of faction_key, culture_key and subculture_key.

How matching works

Matching starts with finding the given building_level_key in the culture variants table. This is a hard match and is needed to join any two building level and culture variant.

Each building level record will have at least one culture variant. Often there are several culture variants per building level but on average no more that 4. The most common culture variants is a "generic" one which means that the faction_key, culture_key and subculture_key columns are all null so the building will match against any faction. However any combination of the faction columns could have a key so I need to match a given faction against each of the faction columns in the culture variant.

Side note: the culture variant keys are always consistent, meaning I'll never have a scenario where a faction_key and subculture_key in the culture variants table don't match a corresponding faction_key and subculture_key from the factions table (and subculture table, which has been omitted for clarity).

What I've tried

I have provided a sql fiddle to play around with and included my version of the query below:

SELECT 
  "building_culture_variants"."building_culture_variant_key" AS qualified_key, 
  "building_levels"."building_level_key" AS building_key, 
  "building_levels"."create_time", 
  "building_levels"."create_cost", 
  "building_culture_variants"."name",
  'fact_blue'::text AS faction_key
FROM 
  "building_levels" 
  INNER JOIN "building_culture_variants" ON (
    "building_culture_variants"."building_culture_variant_key" IN (
      SELECT 
        "building_culture_variant_key" 
      FROM 
        (
          SELECT 
            "building_culture_variants"."building_culture_variant_key", 
            (
                CASE WHEN "building_culture_variants"."faction_key" = "building_factions"."faction_key" THEN 1 WHEN "building_culture_variants"."faction_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."culture_key" = "building_factions"."culture_key" THEN 1 WHEN "building_culture_variants"."culture_key" IS NULL THEN 0 ELSE NULL END + 
                CASE WHEN "building_culture_variants"."subculture_key" = "building_factions"."subculture_key" THEN 1 WHEN "building_culture_variants"."subculture_key" IS NULL THEN 0 ELSE NULL END
            ) AS match_count 
          FROM 
            "building_culture_variants" 
            INNER JOIN (
              -- This is a subquery because here I would join a couple more tables
              -- to collect all of the faction info
              SELECT 
                "factions"."faction_key", 
                "factions"."culture_key", 
                "factions"."subculture_key"
              FROM 
                "factions" 
            ) AS "building_factions" ON ("building_factions"."faction_key" = 'fact_blue')
          WHERE ("building_levels"."building_level_key" = "building_culture_variants"."building_level_key") 
          GROUP BY 
            match_count, 
            building_culture_variant_key 
          ORDER BY 
            match_count DESC NULLS LAST 
          LIMIT 
            1
        ) AS "culture_variant_match"
    )
  ) 
WHERE "building_levels"."building_level_key" = 'goblin_walls'
ORDER BY 
  "building_levels"."building_level_key"

Question

The query I provided above works and gets the job done but I feel like I'm trying to brute force the problem by just nesting a bunch of queries. I get this feeling like I'm not taking advantage of some sql construct that would streamline the performance of, or greatly simplify the query.

So what I'm really asking, is there a better way I could rewrite the query to be more efficient?

Best Answer

The SQL code almost looks like you are trying to do things in a Procedural fashion. That won't be efficient on a Declarative language like SQL.

JOIN

When you JOIN two data sets and you need to make one behave as if NULL value is a wild card, you can do the JOIN like this

Select *
from TableA
  join TableB
    on TableA.col2match = coalesce( TableB.col2match, TableA.col2match )

However, this trick will only work if you know that TableA.col2match is always NOT NULL.

Score and Rank

You have already provided a scoring function to score the matches. I advise that you place this into a function for easier maintenance.

Most databases can RANK() your scores for you. This is an analytic function. You really should read up on them.

CTE

I'm using CTEs to help the reader understand what is going on.

If you put the WHERE clause within the CTE section, PostgreSQL will materialize the least number of rows (according the the dbfidle plan).

Resulting SQL

I built this SQL so that you can move the WHERE clause outside of the CTEs. By doing that, you can CREATE VIEW on the SQL. This will simplifying the SQL that the middle tier developer has to write.

with "building_factions" as (
  -- This is a subquery because here I would join a couple more tables
  -- to collect all of the faction info
  SELECT 
    "factions"."faction_key", 
    "factions"."culture_key", 
    "factions"."subculture_key"
  FROM 
    "factions"
  where "factions"."faction_key" = 'fact_blue'
) , "building_info" as (
  select
    "building_culture_variants"."building_culture_variant_key", 
    "building_levels"."building_level_key", 
    "building_levels"."create_time", 
    "building_levels"."create_cost", 
    "building_culture_variants"."name",
    "building_culture_variants"."faction_key", 
    "building_culture_variants"."culture_key", 
    "building_culture_variants"."subculture_key"
  from "building_levels" 
    join "building_culture_variants"
      on "building_levels"."building_level_key" = "building_culture_variants"."building_level_key"
  where "building_levels"."building_level_key" = 'goblin_walls'
), "scoreRanked_data" as (
  select 
    "building_factions"."faction_key", 
    "building_factions"."culture_key", 
    "building_factions"."subculture_key",
    "building_info"."building_culture_variant_key", 
    "building_info"."building_level_key", 
    "building_info"."create_time", 
    "building_info"."create_cost", 
    "building_info"."name",
    rank() over (partition by "building_factions"."faction_key", 
                              "building_factions"."culture_key", 
                              "building_factions"."subculture_key",
                              "building_info"."building_level_key"
                 order by (
      CASE WHEN "building_info"."faction_key" = "building_factions"."faction_key" THEN 1
           WHEN "building_info"."faction_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."culture_key" = "building_factions"."culture_key" THEN 1
           WHEN "building_info"."culture_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."subculture_key" = "building_factions"."subculture_key" THEN 1
           WHEN "building_info"."subculture_key" IS NULL THEN 0
           ELSE NULL
      END
      ) desc nulls last ) match_rank
  from "building_factions"
    join "building_info"
      on    "building_factions"."faction_key" = coalesce( "building_info"."faction_key", "building_factions"."faction_key")
        and "building_factions"."culture_key" = coalesce( "building_info"."culture_key", "building_factions"."culture_key")
        and "building_factions"."subculture_key" = coalesce( "building_info"."subculture_key", "building_factions"."subculture_key")
)
select *
from "scoreRanked_data"
where match_rank = 1
limit 1