You can apply window functions to the result of aggregate functions at the same query level.
This would all work nicely after a few modifications, except that it fails for the standard deviation on mathematical principle. The calculations involved are not linear, so you cannot simply combine standard deviations of sub-populations.
SELECT perm
,combo
,avg(value) AS perm_average_value
,sum(avg(value) * count(*)) OVER w_combo /
sum(count(*)) OVER w_combo AS combo_average_value
,stddev_pop(value) AS perm_stddev
,0 AS combo_stddev -- doesn't work!
,count(*) AS perm_count
,sum(count(*)) OVER w_combo AS combo_count
FROM foo
GROUP BY perm, combo
WINDOW w_combo AS (PARTITION BY combo);
For combo_average_value you would need this expression:

sum(avg(value) * count(*)) OVER w_combo / sum(count(*)) OVER w_combo

since you need a weighted average: the average of a group with 10 members weighs more than the average of a group with just 2 members!
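For the record, the combined standard deviation can still be derived at this query level, just not by combining the group results directly: the law of total variance rebuilds the combined population variance from per-group counts, means and variances. An untested sketch for the combo_stddev column (replacing the 0 placeholder above, same window w_combo):

sqrt(
   sum(count(*) * (stddev_pop(value)^2 + avg(value)^2)) OVER w_combo
 / sum(count(*)) OVER w_combo
 - (sum(avg(value) * count(*)) OVER w_combo
  / sum(count(*)) OVER w_combo)^2
) AS combo_stddev

Note that this only reconstructs stddev_pop; the sample variant stddev_samp would need a different correction.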
This works:
SELECT DISTINCT ON (perm, combo)
perm
,combo
,avg(value) OVER wpc AS perm_average_value
,avg(value) OVER wc AS combo_average_value
,stddev_pop(value) OVER wpc AS perm_stddev
,stddev_pop(value) OVER wc AS combo_stddev
,count(*) OVER wpc AS perm_count
,count(*) OVER wc AS combo_count
FROM foo
WINDOW wc AS (PARTITION BY combo)
,wpc AS (PARTITION BY perm, combo);
I am using two different windows here and reduce the rows with DISTINCT ON, which is applied after window functions.
But I seriously doubt it will be faster than your original query. I am pretty sure it isn't.
Better performance with altered table layout
Arrays have an overhead of 24 bytes (slight variations depending on type). Also, you seem to have quite a few items per array and many repetitions. For a huge table like yours it would pay to normalize the schema. Example layout:
CREATE TABLE combo (
combo_id serial PRIMARY KEY
,combo int[] NOT NULL
);
CREATE TABLE perm (
perm_id serial PRIMARY KEY
,perm int[] NOT NULL
);
CREATE TABLE value (
perm_id int REFERENCES perm(perm_id)
,combo_id int REFERENCES combo(combo_id)
,value numeric NOT NULL DEFAULT 0
);
If you don't need referential integrity you can omit the foreign key constraints.
The connection to combo_id could also be placed in the table perm, but in this scenario I would store it (slightly denormalized) in value for better performance.
This would result in a row size of 32 bytes (tuple header + padding: 24 bytes, 2 x int: 8 bytes, no padding), plus the unknown size of your numeric column. (If you don't need extreme precision, a double precision or even a real column might do, too.)
More on physical storage in this related answer on SO, or here: Configuring PostgreSQL for read performance.
Anyway, that's only a fraction of what you have now and would make your query a lot faster by size alone. Grouping and sorting on simple integers is also a lot faster.
For best performance, you would first aggregate in a subquery and then join to perm and combo.
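An untested sketch of that query shape against the layout above; the per-combo window computations would sit on top of this, as in the first query:

SELECT p.perm
,c.combo
,v.perm_average_value
,v.perm_stddev
,v.perm_count
FROM (
   SELECT perm_id
   ,combo_id
   ,avg(value) AS perm_average_value
   ,stddev_pop(value) AS perm_stddev
   ,count(*) AS perm_count
   FROM value
   GROUP BY perm_id, combo_id
   ) v
JOIN perm p USING (perm_id)
JOIN combo c USING (combo_id);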
This can be much simpler yet with DISTINCT ON:
SELECT DISTINCT ON (product_id)
product_id
, CASE WHEN stock = 0 THEN NULL ELSE warehouse_id END AS warehouse_id
, stock
, CASE WHEN stock = 0 THEN NULL ELSE price END AS price
FROM product_stock
ORDER BY product_id, (stock = 0), price;
This assumes stock to be NOT NULL.
SQL Fiddle.
About DISTINCT ON:
Postgres has a proper boolean type and one can ORDER BY any boolean expression: FALSE sorts before TRUE, and TRUE sorts before NULL. So rows with (stock = 0) sort after rows with any other value for stock, except NULL, which would sort last.
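A quick way to verify that sort order, with a throwaway VALUES list (not part of the answer above):

SELECT b
FROM (VALUES (false), (true), (NULL::boolean)) t(b)
ORDER BY b;  -- returns false, true, NULL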
This is a variation of the greatest-n-per-group problem.
I would do this with a join to a derived table:
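A minimal sketch of that pattern; tbl, grp and val are placeholder names standing in for the table from the question:

SELECT t.*
FROM tbl t
JOIN (
   SELECT grp, max(val) AS val
   FROM tbl
   GROUP BY grp
   ) m USING (grp, val);

Unlike DISTINCT ON, this returns all rows that tie for the greatest val per grp.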