Postgresql – Combine multiple queries into 1 query

arrayperformancepostgresqlsubquery

I have a message table in Postgres 9.4 that contains a words field of array type, with random words of a message. Currently I have millions of messages:

\d messages
                        Table "public.messages"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 id_str                       | character varying(255)      | not null
 feed_id                      | integer                     |
 message                      | character varying(255)      |
 posted_at                    | timestamp without time zone |
 words                        | character varying(255)[]    |
 Indexes:
    "messages_pkey" PRIMARY KEY, btree (id_str)
    "index_messages_on_feed_id" btree (feed_id)
    "index_messages_on_feed_id_posted_at" btree (feed_id, posted_at DESC NULLS LAST)
    "index_messages_on_words" gin (words)

Now I have an array of words [A, B, C].
I need to find the number of messages that the words contains each of the combinations of the list, i.e. [[A,B], [A,C], [B,C]].

Also I have this combination array created outside of the Postgres query already.

I can do this by creating 3 queries:

select count(*) from messages where words @> ARRAY['A','B']::varchar[];
select count(*) from messages where words @> ARRAY['A','C']::varchar[];
select count(*) from messages where words @> ARRAY['B','C']::varchar[];

Is there a way to do this in one query? And output the result in the following format?

A, B, count or [A, B], count
A, C, count
B, C, count

Edit:

Here's the query I run currently:

http://sqlfiddle.com/#!15/7c907/2/0

http://explain.depesz.com/s/Dot

And this one is using Erwin Brandstetter 's method:

http://sqlfiddle.com/#!15/7c907/1/0

http://explain.depesz.com/s/pr2

Now assume the messages table has more than 1 Million records, the CTE scan or seq scan in each of the query plan will become very slow.

Best Answer

To get all unique pairs of elements from an array of arbitrary length:

WITH a(a) AS (SELECT '{A,B,C,D}'::text[])  -- provide array here
,    i(i) AS (SELECT i FROM a, generate_series(1, array_upper(a.a,1)) i)
SELECT ARRAY[a[i1.i], a[i2.i]] AS pair
FROM   i i1
JOIN   i i2 ON i2 > i1
,      a;

You can then join to the message table.

Without knowing any details of your setup, my educated guess is that a LATERAL join will be fastest as it can use the GIN index on messages.words - create it if you don't have one yet.

WITH a(a) AS (SELECT '{A,B,C,D}'::text[])
,    i(i) AS (SELECT i FROM a, generate_series(1, array_upper(a.a,1)) i)
SELECT p.pair, c.ct
FROM  (
   SELECT ARRAY[a[i1.i], a[i2.i]] AS pair
        , i1.i AS i1, i2.i AS i2
   FROM   i i1
   JOIN   i i2 ON i2 > i1
   ,      a
   ) p
, LATERAL (
   SELECT count(*) AS ct
   FROM   message
   WHERE  words @> p.pair
   ) c
ORDER BY p.i1, p.i2;

SQL Fiddle.