Postgresql – Postgres : Multiply length of array to count aggregate result

aggregatepostgresql

Following command works fine :

select count(*) * 2 from rooms;

But if I am trying to do something like :

select count(*) * array_length(students,1) from rooms;

where students is an array column in the table rooms. Following error is thrown :

ERROR: column "rooms.students" must appear in the GROUP BY clause or be used in an aggregate function
SQL state: 42803
Character: 32

What is the right way to get the desired result?

Best Answer

sum was an obvious choice :

select sum(array_length(students,1)) from rooms;

Related Solutions

Postgresql – Postgres Nested WHEN Aggregate Function

After some processing this boiled down to:

While your predicate d."SettlementPointName" = 'John' is filtering a single value for "SettlementPointName" anyway, simplify to:

SELECT count(                                     d."SettlementPointPrice" < 10.5 OR NULL) AS da_00_10
     , count(d."SettlementPointPrice" >= 10.5 AND d."SettlementPointPrice" < 20.5 OR NULL) AS da_11_20
     , count(d."SettlementPointPrice" >= 20.5 AND d."SettlementPointPrice" < 30.5 OR NULL) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."SettlementPointName" = 'John'
AND    d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour";

About the counting technique:

For absolute performance, is SUM faster or COUNT?

Or better, yet, use the new aggregate filter technique in pg 9.4:

SELECT d."SettlementPointName"
     , count(*) FILTER (WHERE d."SettlementPointPrice" <  10.5) AS da_00_10
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 10.5
                        AND   d."SettlementPointPrice" <  20.5) AS da_11_20
     , count(*) FILTER (WHERE d."SettlementPointPrice" >= 20.5
                        AND   d."SettlementPointPrice" <  30.5) AS da_21_30
FROM   public.da d
JOIN   public.rt_aggregate r USING ("DeliveryDate", "SettlementPointName")
WHERE  d."DeliveryDate" >= '2015-02-01'
AND    d."DeliveryDate" <= '2015-02-20'
AND    r."DeliveryHour" = 14
AND    date_part('hour', d."DeliveryHour") = r."DeliveryHour"
GROUP  BY 1;

This time, selecting all names and returning one row per name like you asked in the comment.

Details for FILTER:

Return counts for multiple ranges in a single SELECT statement

Postgresql – Aggregate rows of type array

Order of source rows irrelevant, source is array column

Your updated requirements are much simpler for two reasons:

We don't need to establish any order among source rows.
And the array already establishes order among array elements.

Simple table:

CREATE TABLE vals (j int[]);

Simple query:

SELECT ARRAY(SELECT elem FROM vals, unnest(j) elem) AS arr;

Order of source rows relevant, source are integer columns

Not as simple. Based on this table:

CREATE TABLE vals (i int, j int);

There are various ways to preserve order. I think this should be among the fastest - with an ARRAY constructor:

SELECT ARRAY(
   SELECT t.elem
   FROM  (SELECT i, j FROM vals ORDER BY i, j) v
        , LATERAL (VALUES (v.i), (v.j)) t(elem)
   ) AS arr;

My first draft had ORDER BY i, j after the LATERAL join, which would not guarantee the order of columns. I pulled the ORDER BY into a subselect to produce the correct order now. This works, but it's not documented. To be absolutely sure, you have to add an explicit ORDER BY later:

SELECT ARRAY (
   SELECT elem
   FROM   vals v, LATERAL (VALUES (1, v.i), (2, v.j)) t(ordr, elem)
   ORDER  BY v.i, v.j, ordr
   ) AS arr;

About guaranteed order:

PostgreSQL unnest() with element number

About the VALUES expression in a LATERAL join:

SELECT DISTINCT on multiple columns

Another way with - this time with array_agg():

SELECT array_agg(elem) AS arr
FROM  (
   SELECT i, j, 1 AS ordr, i AS elem FROM vals
   UNION ALL
   SELECT i, j, 2 AS ordr, j AS elem FROM vals
   ORDER BY i, j, ordr
   ) sub;

The ARRAY constructor is faster, array_agg() is easier to integrate with multiple result columns.

SQL Fiddle.

The query in your test case would actually work in Postgres 9.5 (currently beta, to be released in the course of 2015). Quoting the release notes for pg 9.5:

Allow array_agg() and ARRAY() to take arrays as inputs (Ali Akbar, Tom Lane)

However, you get a 2-dimenstional array instead, not your desired result. You can easily implement the same (but slower) aggregate function in older versions:

Is there something like a zip() function in PostgreSQL that combines two arrays?