Keep the basic query you already have to cheaply identify rows that have any relevant array elements using an index.
Then unnest only the intersection (tag_ids & '{1,2,4,11}'
) in a LATERAL
join. Finally, aggregate:
SELECT tag_id, count(*) AS ct
FROM items i
, unnest(tag_ids & '{1,2,4,11}'::int[]) tag_id
WHERE tag_ids && '{1,2,4,11}'::int[]
GROUP BY tag_id
ORDER BY count(*) DESC;
Once again, the intersection operator &
of the intarray module is instrumental.
Previous related answer:
Without intarray
If you don't have intarray
installed, or for any other array type, we need another join instead:
SELECT tag_id, count(*) AS ct
FROM (
SELECT *
FROM items
WHERE tag_ids && '{1,2,4,11}'::int[]
) i, unnest(tag_ids) t(tag_id)
JOIN unnest('{1,2,4,11}'::int[]) x(tag_id) USING (tag_id)
GROUP BY tag_id
ORDER BY count(*) DESC;
Subtle difference: the intersection operator folds duplicates to produce distinct elements, while this query does not. Only matters if there can be duplicate array elements ...
SQL Fiddle demonstrating both and with an additional row that has a mix of matching and not matching elements to demonstrate the need of the intersection or additional join to eliminate unwanted elements.
Internal representation of larger attributes will be sometimes compressed. More specifically, what works here is the TOAST (Oversized Attribute Storage component used in PostgreSQL). The threshold when values are considered for compression is 2000 bytes.
pg_column_size()
is not a logical length, but the size (in bytes) of actual internal representation of the column/variable. It is documented.
PostgreSQL stores array values in a custom, internal, binary format. Command line example below. Details also here.
filip=# CREATE TABLE a(x text, a text[][]);
CREATE TABLE
filip=# insert into a select 'MARK', '{{ENE,DUE},{LIKE,FAKE}}';
INSERT 0 1
filip=# insert into a select 'MARK', '{{ENE,DUE},{LIKE,FAKE}}';
INSERT 0 1
filip=# checkpoint ;
CHECKPOINT
filip=# SELECT pg_relation_filepath('a');
pg_relation_filepath
----------------------
base/16385/16576
(1 row)
filip@szary:~$ sudo hexdump -C ~postgres/9.5/main/base/16385/16576
00000000 00 00 00 00 f0 99 b6 02 00 00 00 00 20 00 40 1f |............ .@.|
00000010 00 20 04 20 00 00 00 00 a0 9f b4 00 40 9f b4 00 |. . ........@...|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001f40 c8 07 08 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001f50 02 00 02 00 02 08 18 00 0b 4d 41 52 4b 7b 02 00 |.........MARK{..|
00001f60 00 00 00 00 00 00 19 00 00 00 02 00 00 00 02 00 |................|
00001f70 00 00 01 00 00 00 01 00 00 00 1c 00 00 00 45 4e |..............EN|
00001f80 45 00 1c 00 00 00 44 55 45 00 20 00 00 00 4c 49 |E.....DUE. ...LI|
00001f90 4b 45 20 00 00 00 46 41 4b 45 00 00 00 00 00 00 |KE ...FAKE......|
00001fa0 c7 07 08 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001fb0 01 00 02 00 02 08 18 00 0b 4d 41 52 4b 7b 02 00 |.........MARK{..|
00001fc0 00 00 00 00 00 00 19 00 00 00 02 00 00 00 02 00 |................|
00001fd0 00 00 01 00 00 00 01 00 00 00 1c 00 00 00 45 4e |..............EN|
00001fe0 45 00 1c 00 00 00 44 55 45 00 20 00 00 00 4c 49 |E.....DUE. ...LI|
00001ff0 4b 45 20 00 00 00 46 41 4b 45 00 00 00 00 00 00 |KE ...FAKE......|
00002000
answer written by @filiprem acting to extend basic info provided by @a-horse-with-no-name
Best Answer
You can use an array constructor: