So I essentially have two tables in a database. One which is the base table of all of our media data, and another that has potential override data for spend. Not everything in the display table will need to get overridden, but some rows will. Also, the only grouping is done at the date AND pid level, and does not include creative. So if there are date AND pid combinations that have more than one creative, the spend will need to get distributed based on the proportion of impressions for that date AND pid combination. I would like to create a view that does this for me, but I'm not quite sure where to begin here or if it's even possible. I have a python script that can do the work, but I wanted to see if it was possible with just SQL. SQL Fiddle with basic schema and data
display table:
date pid creative impressions spend
1/1/16 1234 a 10 $100
1/1/16 1234 b 20 $200
1/1/16 1235 a 20 $300
1/1/16 1236 a 20 $300
1/2/16 1234 b 15 $150
1/2/16 1235 a 20 $200
1/2/16 1236 a 10 $150
1/2/16 1236 b 20 $200
1/2/16 1237 a 10 $100
1/2/16 1237 b 15 $150
spend table:
date pid spend
1/1/16 1234 $400
1/1/16 1235 $600
1/2/16 1235 $1,200
1/2/16 1236 $500
1/2/16 1237 $1,000
expected output:
date pid creative impressions spend
1/1/16 1234 a 10 $133.33
1/1/16 1234 b 20 $266.67
1/1/16 1235 a 20 $600
1/1/16 1236 a 20 $300
1/2/16 1234 b 15 $150
1/2/16 1235 a 20 $1,200
1/2/16 1236 a 10 $166.67
1/2/16 1236 b 20 $333.33
1/2/16 1237 a 10 $400
1/2/16 1237 b 15 $600
Answer
SELECT
D.date,
D.pid,
D.creative,
D.impressions,
COALESCE(S.spend * D.impressions / SUM(D.impressions)
OVER (PARTITION BY D.date, D.pid), D.spend) AS "spend"
FROM display D
LEFT JOIN spend S
ON D.date = S.date
AND D.pid = S.pid
So I finally got around to having a good chunk of real world data to play with and this is giving me issues. This:
SELECT D.rep_date, D.placement_id, D.creative
, COALESCE(S.media_cost * D.impressions /
SUM(D.impressions) OVER (PARTITION BY D.rep_date, D.placement_id)
, D.media_cost
) AS "spend"
FROM dc3.display D
LEFT JOIN dc3.display_spend S ON D.rep_date = S.rep_date AND D.placement_id = S.placement_id
returns exactly half of what I would expect for spend. This:
SELECT D.rep_date, D.placement_id, D.creative
, SUM(COALESCE(S.media_cost * D.impressions /
(SELECT SUM(D2.impressions)
FROM dc3.display D2
WHERE D.rep_date = D2.rep_date AND D.placement_id = D2.placement_id)
, D.media_cost
)) AS "spend"
FROM dc3.display D
LEFT JOIN dc3.display_spend S ON D.rep_date = S.rep_date AND D.placement_id = S.placement_id
GROUP BY 1, 2, 3
returns exactly what I would expect, but takes a long time to run. Any thoughts on what the issue could be?
Best Answer
A Window Function can be used (
SUM
):Display.impressions * Spend.spend / SUM(...) OVER(...)
Query:
SQL Fiddle
Output: