PostgreSQL – Select Data Based on Records Created_at Column

aggregateperformancepostgresqlpostgresql-9.3query-performance

I have two tables Keywords, ProjectReports:

http://sqlfiddle.com/#!15/06ae3/1

As you can see in that example everything works fine, but I want to get more data from those 2 tables and I don't know how.

  1. In the position column I would like to display only the smallest value and just one value instead of that array. ex: [1, 2, 3] will be 1
  2. As you can see there is the keyword with id 4 which doesn't have any data in the project_reports table, in this case I would like to add the value 0 in the position column and add the current day date in the CREATED_AT column.
  3. Also I would like to display the position value for the same keywords, where created_at column from project_reports table had the value now() - interval '1 day'; now() - interval '1 week'; now() - interval '1 mon', if there is no position for those days add 0.

Expected output:

Keywords:

 ID: 1 name: test
 ID: 2 name: test1
 ID: 3 name: test2

Project_reports:

 ID: 1, keyword_id: 1, project_id: 1, position: 1, created_at: '2014-07-09'
 ID: 2, keyword_id: 1, project_id: 1, position: 2, created_at: '2014-07-09'
 ID: 3, keyword_id: 1, project_id: 1, position: 3, created_at: '2014-07-10'
 ID: 5, keyword_id: 1, project_id: 1, position: 4, created_at: '2014-07-10'
 ID: 6, keyword_id: 2, project_id: 1, position: 1, created_at: '2014-07-09'

After query

k_name: test, k_id: 1, p_id: 1, position 3, yesterday_position: 1, created_at: '2014-07-10'
k_name: test1, k_id: 2, p_id: 1, position 0, yesterday_position: 1, created_at: '2014-07-10'
k_name: test2, k_id: 3, p_id: 1, position 0, yesterday_position: 0, created_at: '2014-07-10'
k_name: test, k_id: 1, p_id: 1, position 1, yesterday_position: 0, created_at: '2014-07-09'
k_name: test1, k_id: 2, p_id: 1, position 1, yesterday_position: 0, created_at: '2014-07-09'
k_name: test2, k_id: 3, p_id: 1, position 0, yesterday_position: 0, created_at: '2014-07-09'

Best Answer

Note the slightly modified schema of my test in the fiddle.
Using actual primary keys and proper column names instead of id.

Also, you seem to be operating with dates exclusively. So I suggest to convert your timestamp columns to date.

Items 1 and 2

SELECT k.keyword_id
     , k.name
     , pr.project_id
     , COALESCE(min(pr.position), 0) AS pos
     , COALESCE(pr.created_at, now()::date) AS created_at
FROM   keyword             k 
LEFT   JOIN project_report pr USING (keyword_id)
GROUP  BY k.keyword_id, pr.project_id, pr.created_at
ORDER  BY keyword_id, created_at
;
  • In Postgres 9.1 or later the pk column covers the whole table in GROUP BY.
  • Use COALESCE to replace possible NULL values.

A guess at item 3

WITH cte AS (
   SELECT k.keyword_id
        , k.name
        , pr.project_id
        , COALESCE(min(pr.position), 0) AS pos
        , COALESCE(pr.created_at, now()::date) AS created_at
   FROM   keyword   k 
   LEFT   JOIN project_report pr USING (keyword_id)
   GROUP  BY k.keyword_id, pr.project_id, pr.created_at
   )
, x AS (
   SELECT DISTINCT ON (keyword_id, project_id) *
   FROM   cte
   ORDER  BY keyword_id, project_id, created_at DESC
   )
SELECT x.*
     , COALESCE(y.pos, 0) AS yesterday_pos 
     , COALESCE(w.pos, 0) AS week_pos 
     , COALESCE(m.pos, 0) AS month_pos 
FROM   x
LEFT   JOIN  cte y ON y.keyword_id = x.keyword_id
                  AND y.project_id = x.project_id
                  AND y.created_at = x.created_at - interval '1 day'
LEFT   JOIN  cte w ON w.keyword_id = x.keyword_id
                  AND w.project_id = x.project_id
                  AND w.created_at = x.created_at - interval '1 week'
LEFT   JOIN  cte m ON m.keyword_id = x.keyword_id
                  AND m.project_id = x.project_id
                  AND m.created_at = x.created_at - interval '1 month'
;

Explain

  1. In CTE cte produce daily aggregates per (k.keyword_id, pr.project_id).
  2. In CTE x pick the latest day per (k.keyword_id, pr.project_id).
  3. In the outer query LEFT JOIN the latest day x to cte multiple times to retrieve past values from the same (k.keyword_id, pr.project_id) for 1 day / week / month earlier.

SQL Fiddle.