Postgresql – Postgres – Querying Min, Max and getting associated columns data specific to those result of those rows

aggregatepostgresql-9.4window functions

I have query in which returns averaged values by site and month. What I want to know is the min and max of those monthly averages by site and (the difficult part) the month when each occurred.

Here is an example:

SQL Fiddle Stuff here

CREATE TABLE events (
    esite     integer NOT NULL,
    edate     timestamp with time zone NOT NULL,
    evalue    integer NOT NULL
);

INSERT INTO events Values
    (1, '2016-01-03', 11),
    (2, '2016-01-05', 90),
    (1, '2016-01-08', 7),
    (2, '2016-01-10', 40),
    (1, '2016-01-15', 12),
    (1, '2016-01-18', 66),
    (2, '2016-01-22', 54),
    (2, '2016-02-03', 70),
    (2, '2016-02-05', 56),
    (1, '2016-02-08', 61),
    (2, '2016-02-10', 23),
    (1, '2016-02-15', 30),
    (1, '2016-02-18', 15),
    (1, '2016-02-22', 41);

So I'm looking for a query that returns (by site) the min and max monthly average evalues and the months at which the min and max values occurred. I can get this by using the query below:

SQL Fiddle this

select esite, date_trunc('month', edate) as emonth, round(avg(evalue),2) as evalue_avg from events
  Group by esite, emonth

-- produces the following output:
esite | emonth                      | evalue_avg
2     | January, 01 2016 00:00:00   | 61.33
1     | January, 01 2016 00:00:00   | 24
1     | February, 01 2016 00:00:00  | 36.75
2     | February, 01 2016 00:00:00  | 49.67

Now for the part I'm having difficultly with, I need produce the following result: — basically the min, max values and the date(month) at which each occurred by site — (Desired Output)

-- (one row per site)
esite | avg_min | eval_avg_min_date          | avg_max | eval_avg_max
1     | 24.00   | January, 01 2016 00:00:00  | 36.75   | February, 01 2016 00:00:00
2     | 49.67   |February, 01 2016 00:00:00  | 61.33   | January, 01 2016 00:00:00

I've searched around and have seen some examples using windowing and lateral joins, but I haven't been successful in getting any of them to work. This might be a pivot but since I don't have a fixed number of sites, this tends to be difficult with PostgresSQL.

I'm guessing if anyone has an easy way to do this using postgres 9.4+.

Your help would be appreciated!

Best Answer

I would recommend you do something like this - and don't go down the route of having a row per month.

What happens when you have 5 (10...15... x) years' data?

This is superior IMHO. You can always use the CROSSTAB table function, but I would recommend against it.

SELECT
  esite,
  MIN(evalue), 
  MAX(evalue),
  AVG(evalue)::INT,
  EXTRACT(MONTH FROM edate) AS emonth,
  CASE
    WHEN EXTRACT(MONTH FROM edate) = 1 THEN 'January'
    WHEN EXTRACT(MONTH FROM edate) = 2 THEN 'February'
    -- <... fill in rest of months here ...>
  END AS litMonth
FROM events
GROUP BY esite, emonth, litMonth
ORDER BY esite, emonth -- , min, max, avg;

This gives the result:

esite;     min;   max;   avg;    emonth;    litmonth
----------------------------------------------------
    1;       7;    66;    24;     1;         January
    1;      15;    61;    37;     2;        February
    2;      40;    90;    61;     1;         January
    2;      23;    70;    50;     2;        February

Note, that I have CAST the AVG as an INTEGER - you can of course ROUND this off to whatever you like. You can add EXTRACT(YEAR FROM edate) AS eyear, after the EXTRACT(MONTH... for more clarity and a more elegant result.

Related Solutions

PostgreSQL query with max and min date plus associated id per row

You can calculate this in one step with window functions:

CREATE OR REPLACE VIEW daily_trans AS
SELECT DISTINCT
       trans_date
     , first_value(trans_time) OVER w AS first_time
     , first_value(id)         OVER w AS first_id
     , last_value(trans_time)  OVER w AS last_time
     , last_value(id)          OVER w AS last_id
     , calculate_status(min(trans_time) OVER w
                      , max(trans_time) OVER w) AS status 
FROM   trans
WINDOW w AS (PARTITION BY trans_date ORDER BY trans_time, id
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY 1;

I added id as secondary ORDER column in the window clause, to make the sort order stable in case of identical times per day.

If you are not familiar with window functions, be sure to read this chapter of the manual.

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is a not so commonly used clause to adjust the window frame, because we want last_value() to always return the last value of the whole partition (the date), not the last value up to the current (including peers) according to the sort order.

How you to combine DISTINCT with window functions:

PostgreSQL: running count of rows for a query 'by minute'

Sql-server – How to select rows which have max and min of count

This is an alternative formulation of Travis's answer which avoids the need to sort the COUNT in both directions.

WITH s
     AS (SELECT MyGroup,
                Count(MyGroup)              AS [Count],
                MAX(Count(MyGroup)) OVER () AS [MaxMyGroup],
                MIN(Count(MyGroup)) OVER () AS [MinMyGroup]
         FROM   MyTable
         GROUP  BY MyGroup)
SELECT Agg,
       MyGroup,
       V.[Count]
FROM   s
CROSS APPLY (VALUES ( 'Max', CASE WHEN [Count] = [MaxMyGroup] THEN [Count] END),
                     ('Min', CASE WHEN [Count] = [MinMyGroup] THEN [Count] END))
             V(Agg, [Count])
WHERE V.[Count] IS NOT NULL

Best Answer

Related Solutions

PostgreSQL query with max and min date plus associated id per row

Sql-server – How to select rows which have max and min of count

Related Question