Postgresql – Select query for returning different data based on creation time

aggregatepivotpostgresqlsubquery

I have a view, which will select data from table reports. The select will filter the data based on multiple dates.

reports table: id, key_id, position, created_at

My select:

pr.key_id,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '1' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS yesterday,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '2' day) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_week,
COALESCE(CASE WHEN date(pr.created_at) = date(current_date - interval '3' month) THEN array_agg(DISTINCT pr.position) ELSE '{0}' END ) AS last_month,

Even if I group by key_id and date the result is not grouped.

Example of result I get now:

  1. key_id: 1 yesterday: {1} last_week: {0} last_month: {0} created_at: '2014-04-11'
  2. key_id: 1 yesterday: {0} last_week: {1} last_month: {0} created_at: '2014-04-10'
  3. key_id: 1 yesterday: {0} last_week: {0} last_month: {1} created_at: '2014-04-09'

What I expect:

  1. key_id: 1 yesterday: {1} last_week: {1} last_month: {1} created_at: '2014-04-11'

With this query I get what I want. Is there a better way of doing it?

SELECT
    k.name,
    array_agg(DISTINCT position),
    (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' day)) as y_position,
    (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '7' day)) as last_week,
    (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '1' month)) as last_month,
    (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '3' month)) as three_month,
    (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '6' month)) as six_month,
    (SELECT array_agg(DISTINCT position) FROM project_reports prk WHERE prk.project_id = pr.project_id AND prk.keyword_id = pr.keyword_id AND date(prk.created_at) = date(current_date - interval '12' month)) as last_year,
    date(pr.created_at)
FROM project_keywords pk
INNER JOIN keywords k ON pk.keyword_id = k.id
INNER JOIN project_reports pr USING (project_id, keyword_id)
GROUP BY k.name, date(pr.created_at), pr.project_id, pr.keyword_id

Best Answer

If I guessed the missing pieces correctly, this query using a crosstab() function should do the job (and much faster than the original - except for small cardinalities where not much can be gained to begin with):

It requires that you install the additional module tablefunc first (once per database. Detailed instructions in this related answer on SO:
PostgreSQL Crosstab Query

SELECT name, project_id, positions, created_day
      ,yesterday, last_week, last_month, three_month, six_month, last_year
FROM  (
   SELECT k.name, pr.keyword_id, pr.project_id, pr.created_at::date AS created_day
        , array_agg(DISTINCT pr.position) AS positions
   FROM   project_reports  pr
   JOIN   keywords         k  ON k.id = pr.keyword_id
   GROUP  BY k.id, pr.project_id, created_day -- k.id being the pk
   ) sub
LEFT JOIN crosstab(
    $$SELECT ARRAY[keyword_id, project_id], created_at::date, array_agg(DISTINCT position)
      FROM   project_reports
      WHERE  created_at::date IN (
                    current_date - 1
                  , current_date - 7
                  , (now() - interval '1 mon')::date
                  , (now() - interval '3 mon')::date
                  , (now() - interval '6 mon')::date
                  , (now() - interval '12 mon')::date
                  )
      GROUP  BY 1,2
      ORDER  BY 1,2$$
   ,$$VALUES 
       (current_date - 1)
      ,(current_date - 7)
      ,((now() - interval '1 mon')::date)
      ,((now() - interval '3 mon')::date)
      ,((now() - interval '6 mon')::date)
      ,((now() - interval '12 mon')::date)$$
   ) AS t (
        kp int[]
      , yesterday   int[]
      , last_week   int[]
      , last_month  int[]
      , three_month int[]
      , six_month   int[]
      , last_year   int[]
      ) ON t.kp[1] = keyword_id
       AND t.kp[2] = project_id;

To break it down, a simple demo of what I am doing in the crosstab:

SELECT * FROM crosstab(
 $$SELECT *
   FROM  (
      VALUES
        ('{1,2}'::int[], current_date - 1, '{1,2}'::int[])
      , ('{1,3}'::int[], current_date - 1, '{1,2}'::int[])
      , ('{1,2}', current_date - 7, '{5,6,7}'::int[])
      , ('{1,3}', current_date - 7, '{5,6,7}'::int[])
      , ('{1,2}', (now() - interval '1 mon')::date, '{9,10}'::int[])
       -- 3 mon missing
      , ('{1,2}', (now() - interval '6 mon')::date, '{6,6,6}'::int[])
       -- 12 mon missing
      ) sub
    ORDER BY 1,2$$
   
,$$VALUES 
    (current_date - 1)
   ,(current_date - 7)
   ,((now() - interval '1 mon')::date)
   ,((now() - interval '3 mon')::date)
   ,((now() - interval '6 mon')::date)
   ,((now() - interval '12 mon')::date)$$
) AS t (
     kp int[]
   , yesterday   int[]
   , last_week   int[]
   , last_month  int[]
   , three_month int[]
   , six_month   int[]
   , last_year   int[]
   );

Result:

  kp   | yesterday | last_week | last_month | three_month | six_month | last_year
-------+-----------+-----------+------------+-------------+-----------+-----------
 {1,2} | {1,2}     | {5,6,7}   | {9,10}     |             | {6,6,6}   |
 {1,3} | {1,2}     | {5,6,7}   |            |             |           |

The rest should be obvious.

Comments

  • I de-coupled the two parts and fused the plethora of correlated subqueries into a single subquery, which should be much faster. Then I use cross tabulation pull individual columns from this query. The syntax for crosstab() is the tricky part.

  • I am not joining to project_keywords, it seems to be just noise. I am suspecting problems in your relational model.

  • I seems you should include the project_id in the result to make sense. Else you should not GROUP BY it ...

  • You can just subtract integer from date.