If I guessed the missing pieces correctly, this query using a crosstab()
function should do the job (and much faster than the original - except for small cardinalities where not much can be gained to begin with):
It requires that you install the additional module tablefunc
first (once per database. Detailed instructions in this related answer on SO:
PostgreSQL Crosstab Query
SELECT name, project_id, positions, created_day
,yesterday, last_week, last_month, three_month, six_month, last_year
FROM (
SELECT k.name, pr.keyword_id, pr.project_id, pr.created_at::date AS created_day
, array_agg(DISTINCT pr.position) AS positions
FROM project_reports pr
JOIN keywords k ON k.id = pr.keyword_id
GROUP BY k.id, pr.project_id, created_day -- k.id being the pk
) sub
LEFT JOIN crosstab(
$$SELECT ARRAY[keyword_id, project_id], created_at::date, array_agg(DISTINCT position)
FROM project_reports
WHERE created_at::date IN (
current_date - 1
, current_date - 7
, (now() - interval '1 mon')::date
, (now() - interval '3 mon')::date
, (now() - interval '6 mon')::date
, (now() - interval '12 mon')::date
)
GROUP BY 1,2
ORDER BY 1,2$$
,$$VALUES
(current_date - 1)
,(current_date - 7)
,((now() - interval '1 mon')::date)
,((now() - interval '3 mon')::date)
,((now() - interval '6 mon')::date)
,((now() - interval '12 mon')::date)$$
) AS t (
kp int[]
, yesterday int[]
, last_week int[]
, last_month int[]
, three_month int[]
, six_month int[]
, last_year int[]
) ON t.kp[1] = keyword_id
AND t.kp[2] = project_id;
To break it down, a simple demo of what I am doing in the crosstab:
SELECT * FROM crosstab(
$$SELECT *
FROM (
VALUES
('{1,2}'::int[], current_date - 1, '{1,2}'::int[])
, ('{1,3}'::int[], current_date - 1, '{1,2}'::int[])
, ('{1,2}', current_date - 7, '{5,6,7}'::int[])
, ('{1,3}', current_date - 7, '{5,6,7}'::int[])
, ('{1,2}', (now() - interval '1 mon')::date, '{9,10}'::int[])
-- 3 mon missing
, ('{1,2}', (now() - interval '6 mon')::date, '{6,6,6}'::int[])
-- 12 mon missing
) sub
ORDER BY 1,2$$
,$$VALUES
(current_date - 1)
,(current_date - 7)
,((now() - interval '1 mon')::date)
,((now() - interval '3 mon')::date)
,((now() - interval '6 mon')::date)
,((now() - interval '12 mon')::date)$$
) AS t (
kp int[]
, yesterday int[]
, last_week int[]
, last_month int[]
, three_month int[]
, six_month int[]
, last_year int[]
);
Result:
kp | yesterday | last_week | last_month | three_month | six_month | last_year
-------+-----------+-----------+------------+-------------+-----------+-----------
{1,2} | {1,2} | {5,6,7} | {9,10} | | {6,6,6} |
{1,3} | {1,2} | {5,6,7} | | | |
The rest should be obvious.
Comments
I de-coupled the two parts and fused the plethora of correlated subqueries into a single subquery, which should be much faster. Then I use cross tabulation pull individual columns from this query. The syntax for crosstab()
is the tricky part.
I am not joining to project_keywords
, it seems to be just noise. I am suspecting problems in your relational model.
I seems you should include the project_id
in the result to make sense. Else you should not GROUP BY it ...
You can just subtract integer
from date
.
Note the slightly modified schema of my test in the fiddle.
Using actual primary keys and proper column names instead of id
.
Also, you seem to be operating with dates exclusively. So I suggest to convert your timestamp
columns to date
.
Items 1 and 2
SELECT k.keyword_id
, k.name
, pr.project_id
, COALESCE(min(pr.position), 0) AS pos
, COALESCE(pr.created_at, now()::date) AS created_at
FROM keyword k
LEFT JOIN project_report pr USING (keyword_id)
GROUP BY k.keyword_id, pr.project_id, pr.created_at
ORDER BY keyword_id, created_at
;
- In Postgres 9.1 or later the pk column covers the whole table in
GROUP BY
.
- Use
COALESCE
to replace possible NULL
values.
A guess at item 3
WITH cte AS (
SELECT k.keyword_id
, k.name
, pr.project_id
, COALESCE(min(pr.position), 0) AS pos
, COALESCE(pr.created_at, now()::date) AS created_at
FROM keyword k
LEFT JOIN project_report pr USING (keyword_id)
GROUP BY k.keyword_id, pr.project_id, pr.created_at
)
, x AS (
SELECT DISTINCT ON (keyword_id, project_id) *
FROM cte
ORDER BY keyword_id, project_id, created_at DESC
)
SELECT x.*
, COALESCE(y.pos, 0) AS yesterday_pos
, COALESCE(w.pos, 0) AS week_pos
, COALESCE(m.pos, 0) AS month_pos
FROM x
LEFT JOIN cte y ON y.keyword_id = x.keyword_id
AND y.project_id = x.project_id
AND y.created_at = x.created_at - interval '1 day'
LEFT JOIN cte w ON w.keyword_id = x.keyword_id
AND w.project_id = x.project_id
AND w.created_at = x.created_at - interval '1 week'
LEFT JOIN cte m ON m.keyword_id = x.keyword_id
AND m.project_id = x.project_id
AND m.created_at = x.created_at - interval '1 month'
;
Explain
- In CTE
cte
produce daily aggregates per (k.keyword_id, pr.project_id)
.
- In CTE
x
pick the latest day per (k.keyword_id, pr.project_id)
.
- In the outer query
LEFT JOIN
the latest day x
to cte
multiple times to retrieve past values from the same (k.keyword_id, pr.project_id)
for 1 day / week / month earlier.
SQL Fiddle.
Best Answer
If you want the distinct keywords then you need to count the
keyword_id
, not theposition
column.So your expression:
becomes:
So the final statement is:
SQLFiddle example: http://sqlfiddle.com/#!15/b1015/1