I have a simple table in Postgresql:
CREATE TABLE data (id integer, values integer[]);
INSERT INTO data VALUES (1, '{1,2,3,4,5}');
INSERT INTO data VALUES (2, '{1,1,2,3,4,5}');
INSERT INTO data VALUES (3, '{1,1,2,1,3,4,5}');
My basic query for values is something like:
SELECT id FROM data WHERE values >@ ARRAY[1,2];
I am trying to select rows with multiple copies of the same value, e.g.
SELECT id FROM data WHERE values >@ ARRAY[1,1,3];
Since values are compared one by one, the query above will match all 3 rows, while I would like to match only IDs 2 and 3, so ones where there are at least two copies of 1 in the 'values' array. Similarly
SELECT id FROM data WHERE values >@ ARRAY[1,1,1,2];
would match only ID 3.
Any pointers on how to proceed, or which functions to look into?
Thanks.
Best Answer
Note: this will only work if you make data.id a PRIMARY KEY
Here's an SQL Fiddle.
This works by converting your array into a recordset/table (which I've called "no_duplicates") using
unnest()
, and removing duplicates usingDISTINCT
:Then I GROUP BY the original data table's ID, and compare the length of the new, filtered recordset with the old, unfiltered table. If the original, unfiltered array is bigger, then we removed duplicates so we should select that row: