PostgreSQL conditional join and value distribution with materialized view

join;materialized-viewpostgresql

So I essentially have two tables in a database. One which is the base table of all of our media data, and another that has potential override data for spend. Not everything in the display table will need to get overridden, but some rows will. Also, the only grouping is done at the date AND pid level, and does not include creative. So if there are date AND pid combinations that have more than one creative, the spend will need to get distributed based on the proportion of impressions for that date AND pid combination. I would like to create a view that does this for me, but I'm not quite sure where to begin here or if it's even possible. I have a python script that can do the work, but I wanted to see if it was possible with just SQL. SQL Fiddle with basic schema and data

display table:

date    pid   creative  impressions  spend
1/1/16  1234  a         10           $100
1/1/16  1234  b         20           $200
1/1/16  1235  a         20           $300
1/1/16  1236  a         20           $300
1/2/16  1234  b         15           $150
1/2/16  1235  a         20           $200
1/2/16  1236  a         10           $150
1/2/16  1236  b         20           $200
1/2/16  1237  a         10           $100
1/2/16  1237  b         15           $150

spend table:

date    pid     spend
1/1/16  1234    $400
1/1/16  1235    $600
1/2/16  1235    $1,200
1/2/16  1236    $500
1/2/16  1237    $1,000

expected output:

date    pid     creative    impressions spend
1/1/16  1234    a           10          $133.33
1/1/16  1234    b           20          $266.67
1/1/16  1235    a           20          $600
1/1/16  1236    a           20          $300
1/2/16  1234    b           15          $150
1/2/16  1235    a           20          $1,200
1/2/16  1236    a           10          $166.67
1/2/16  1236    b           20          $333.33
1/2/16  1237    a           10          $400
1/2/16  1237    b           15          $600

Answer

SELECT
  D.date,
  D.pid,
  D.creative,
  D.impressions,
  COALESCE(S.spend * D.impressions / SUM(D.impressions)
  OVER (PARTITION BY D.date, D.pid), D.spend) AS "spend"
FROM display D
LEFT JOIN spend S
  ON D.date = S.date
  AND D.pid = S.pid

So I finally got around to having a good chunk of real world data to play with and this is giving me issues. This:

SELECT D.rep_date, D.placement_id, D.creative
  , COALESCE(S.media_cost * D.impressions /
       SUM(D.impressions) OVER (PARTITION BY D.rep_date, D.placement_id)
    , D.media_cost
    ) AS  "spend"
FROM dc3.display D
LEFT JOIN dc3.display_spend S ON D.rep_date = S.rep_date AND D.placement_id = S.placement_id

returns exactly half of what I would expect for spend. This:

SELECT D.rep_date, D.placement_id, D.creative
  , SUM(COALESCE(S.media_cost * D.impressions / 
      (SELECT SUM(D2.impressions)
       FROM dc3.display D2
       WHERE D.rep_date = D2.rep_date AND D.placement_id = D2.placement_id)
    , D.media_cost
    )) AS  "spend"
FROM dc3.display D
LEFT JOIN dc3.display_spend S ON D.rep_date = S.rep_date AND D.placement_id = S.placement_id
GROUP BY 1, 2, 3

returns exactly what I would expect, but takes a long time to run. Any thoughts on what the issue could be?

Best Answer

A Window Function can be used (SUM): Display.impressions * Spend.spend / SUM(...) OVER(...)

Query:

SELECT D.date, D.pid, D.creative
    , COALESCE(
        D.impressions * S.spend 
        / SUM(D.impressions) OVER(PARTITION BY D.date, D.pid)
      , D.spend
    )
FROM display D 
LEFT JOIN spend S ON D.date = S.date AND D.pid = S.pid;

SQL Fiddle

Output:

date                        pid     creative    spend
January, 01 2016 00:00:00   1234    a           133.33
January, 01 2016 00:00:00   1234    b           266.66
January, 01 2016 00:00:00   1235    a           600
January, 01 2016 00:00:00   1236    a           300
January, 02 2016 00:00:00   1234    b           150
January, 02 2016 00:00:00   1235    a           1200
January, 02 2016 00:00:00   1236    a           166.66
January, 02 2016 00:00:00   1236    b           333.33
January, 02 2016 00:00:00   1237    a           400
January, 02 2016 00:00:00   1237    b           600