Postgresql – Aggregate rows of type array

aggregatearraypostgresql

I need to aggregate rows with a single column of type INT[] into a single array of type INT[].

The following results in an error "could not find array type for data type integer[]":

create temp table vals (j int[])
on commit drop;
insert into vals
values ('{1,2}'), ('{1,5}'), ('{3,6}');
select array_agg(j) from vals;

The result that I need from the example is a single row that looks like {1,2,1,5,3,6}

The order of records is unimportant as long as the paired numbers remain adjacent to each other in the final array.

I'm out of ideas. How can I get the result that I need?

(PostgreSQL 9.4, Ubuntu 15.04)

Best Answer

Order of source rows irrelevant, source is array column

Your updated requirements are much simpler for two reasons:

  • We don't need to establish any order among source rows.
  • And the array already establishes order among array elements.

Simple table:

CREATE TABLE vals (j int[]);

Simple query:

SELECT ARRAY(SELECT elem FROM vals, unnest(j) elem) AS arr;

Order of source rows relevant, source are integer columns

Not as simple. Based on this table:

CREATE TABLE vals (i int, j int);

There are various ways to preserve order. I think this should be among the fastest - with an ARRAY constructor:

SELECT ARRAY(
   SELECT t.elem
   FROM  (SELECT i, j FROM vals ORDER BY i, j) v
        , LATERAL (VALUES (v.i), (v.j)) t(elem)
   ) AS arr;

My first draft had ORDER BY i, j after the LATERAL join, which would not guarantee the order of columns. I pulled the ORDER BY into a subselect to produce the correct order now. This works, but it's not documented. To be absolutely sure, you have to add an explicit ORDER BY later:

SELECT ARRAY (
   SELECT elem
   FROM   vals v, LATERAL (VALUES (1, v.i), (2, v.j)) t(ordr, elem)
   ORDER  BY v.i, v.j, ordr
   ) AS arr;

About guaranteed order:

About the VALUES expression in a LATERAL join:

Another way with - this time with array_agg():

SELECT array_agg(elem) AS arr
FROM  (
   SELECT i, j, 1 AS ordr, i AS elem FROM vals
   UNION ALL
   SELECT i, j, 2 AS ordr, j AS elem FROM vals
   ORDER BY i, j, ordr
   ) sub;

The ARRAY constructor is faster, array_agg() is easier to integrate with multiple result columns.

SQL Fiddle.


The query in your test case would actually work in Postgres 9.5 (currently beta, to be released in the course of 2015). Quoting the release notes for pg 9.5:

Allow array_agg() and ARRAY() to take arrays as inputs (Ali Akbar, Tom Lane)

However, you get a 2-dimenstional array instead, not your desired result. You can easily implement the same (but slower) aggregate function in older versions: