Postgresql – Calculating average with AVG for a list of numbers

aggregatepostgresqlunion

This is probably a newbie question (bear with me) but how can I calculate the average for a list of numbers using SQL and the AVG function? I mean, I can add together numbers in SQL
like:

SELECT 2+3+4

How can I calculate the average of 2, 3 and 4 without inserting them to a table first?

I have tried different kinds of variations like:

SELECT 2 AS foo UNION 3 AS foo UNION 4 AS foo
  foo
  integer
1 2
2 3
3 4

but:

SELECT AVG(SELECT 2 AS foo UNION 3 AS foo UNION 4 AS foo)

ERROR: syntax error at or near "select"

I just can't seem to find the right syntax. I'm on PostgreSQL at the moment if it matters.

Best Answer

The best solution depends on where and in what form the values come from.

`unnest()`

Use the set returning function unnest() with an array in the FROM clause. That's shortest for long lists, but it's a Postgres specific solution.

SELECT avg(n) FROM unnest('{2,3,4}'::int[]) n;

Which is short syntax for:

SELECT avg(n) FROM unnest('{2,3,4}'::int[]) AS n(n);

`VALUES`

Alternatively, use a VALUES expression as subquery. This is standard SQL.

SELECT avg(n)
FROM  (VALUES (2),(3),(4)) t(n);

`UNION ALL`

You can do the same with a much more verbose chain of SELECT ... UNION ALL SELECT .... Also standard SQL.

SELECT avg(n)
FROM  (
   SELECT 2 
   UNION ALL
   SELECT 3
   UNION ALL
   SELECT 4
   ) t(n);

Use UNION ALL, not just UNION, which would fold duplicates (which you did not specify!) and is therefore also considerably more expensive.

SQL Fiddle.

Note that all numeric literals in the example are cast to integer automatically (just digits). For other data types you may have to add explicit type casts. avg() returns (per documentation):

numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type

Related Solutions

MySql: #1221 – Incorrect usage of UNION and ORDER BY

There are several ways to pass the IN / LIMIT limitation and the UNION / ORDER BY syntax requirements.

Wrap the Union in a derived table:

SELECT * FROM comment AS tbl WHERE sscm_id IN
  ( SELECT sscm_id
    FROM     
      ( ( SELECT sscm_id FROM comment AS t1 WHERE t1.sscm_oid = 4 
          ORDER BY t1.sscm_id DESC LIMIT 3)
      UNION  ALL
        ( SELECT sscm_id FROM comment AS t2 WHERE t2.sscm_oid = 3 
          ORDER BY t2.sscm_id DESC LIMIT 3)
      ) AS tmp 
  ) ;

Use JOIN instead of WHERE / IN:
UNION is required here instead of UNION ALL to avoid duplicates

SELECT tbl.* 
FROM comment AS tbl 
    JOIN 
        ( ( SELECT sscm_id FROM comment AS t1 WHERE t1.sscm_oid = 4 
            ORDER BY t1.sscm_id DESC LIMIT 3)
        UNION 
          ( SELECT sscm_id FROM comment AS t2 WHERE t2.sscm_oid = 3 
            ORDER BY t2.sscm_id DESC LIMIT 3)
        ) AS tmp 
      ON tmp.sscm_id = tbl.sscm_id ;

Postgresql – How to get the aggregate of a window function in Postgres

You can have window functions on the result of aggregate functions in a single query level.

This would all work nicely after a few modifications - except that it fails for the standard deviation on mathematical principal. The involved calculations are not linear, so you cannot simply combine standard deviations of sub-populations.

SELECT perm
      ,combo
      ,avg(value)                 AS perm_average_value
      ,sum(avg(value) * count(*)) OVER w_combo /
       sum(count(*)) OVER w_combo AS combo_average_value
      ,stddev_pop(value)          AS perm_stddev
      ,0                          AS combo_stddev  -- doesn't work!
      ,count(*)                   AS perm_count
      ,sum(count(*)) OVER w_combo AS combo_count
FROM   foo
GROUP  BY perm, combo
WINDOW w_combo  AS (PARTITION BY combo);

For combo_average_value you would need this expression

sum(avg(value) * count(*)) OVER w_combo / sum(count(*)) OVER w_combo

Since you need a weighted average. (The average of a group with 10 members weighs more than the average of a group with just 2 members!)

This works:

SELECT DISTINCT ON (perm, combo)
       perm
      ,combo
      ,avg(value)        OVER wpc AS perm_average_value
      ,avg(value)        OVER wc  AS combo_average_value
      ,stddev_pop(value) OVER wpc AS perm_stddev
      ,stddev_pop(value) OVER wc  AS combo_stddev
      ,count(*)          OVER wpc AS perm_count
      ,count(*)          OVER wc  AS combo_count
FROM   foo
WINDOW wc  AS (PARTITION BY combo)
      ,wpc AS (PARTITION BY perm, combo);

I am using two different windows here, and reduce the rows with DISTINCT which is applied even after window functions.

But I seriously doubt it will be faster than your original query. I am pretty sure it isn't.

Better performance with altered table layout

Arrays have an overhead of 24 bytes (slight variations depending on type). Also, you seem to have quite a few items per array and many repetitions. For a huge table like yours it would pay to normalize the schema. Example layout:

CREATE TABLE combo ( 
  combo_id serial PRIMARY KEY
 ,combo    int[] NOT NULL
);

CREATE TABLE perm ( 
  perm_id  serial PRIMARY KEY
 ,perm     int[] NOT NULL
);

CREATE TABLE value (
  perm_id  int REFERENCES perm(perm_id)
 ,combo_id int REFERENCES combo(combo_id)
 ,value numeric NOT NULL DEFAULT 0
);

If you don't need referential integrity you can omit the foreign key constraints.

The connection to combo_id could also be placed in the table perm, but in this scenario I would store it (slightly de-normalized) in value for better performance.

This would result in a row size of 32 bytes (tuple header + padding: 24 bytes, 2 x int (8 byte), no padding), plus the unknown size of your numeric column. (If you don't need extreme precision, a double precision or even a real column might do, too.)

More on physical storage in this related answer on SO or here:
Configuring PostgreSQL for read performance

Anyway, that's only a fraction of what you have now and would make your query a lot faster by size alone. Grouping and sorting on simple integers is also a lot faster.

You would first aggregate in a subquery and then join to perm and combo for best performance.

Best Answer

unnest()

VALUES

UNION ALL