Postgresql – Checking for multiple identical values in a Posgresql array

arraypostgresql

I have a simple table in Postgresql:

CREATE TABLE data (id integer, values integer[]);
INSERT INTO data VALUES (1, '{1,2,3,4,5}');
INSERT INTO data VALUES (2, '{1,1,2,3,4,5}');
INSERT INTO data VALUES (3, '{1,1,2,1,3,4,5}');

My basic query for values is something like:

SELECT id FROM data WHERE values >@ ARRAY[1,2];

I am trying to select rows with multiple copies of the same value, e.g.

SELECT id FROM data WHERE values >@ ARRAY[1,1,3];

Since values are compared one by one, the query above will match all 3 rows, while I would like to match only IDs 2 and 3, so ones where there are at least two copies of 1 in the 'values' array. Similarly

SELECT id FROM data WHERE values >@ ARRAY[1,1,1,2];

would match only ID 3.

Any pointers on how to proceed, or which functions to look into?

Thanks.

Best Answer

Note: this will only work if you make data.id a PRIMARY KEY

SELECT
    data.id
FROM 
    data, 
    LATERAL (SELECT DISTINCT unnest(values)) no_duplicates
GROUP BY 
    data.id 
HAVING
    array_length(values, 1) > COUNT(no_duplicates)

Here's an SQL Fiddle.

This works by converting your array into a recordset/table (which I've called "no_duplicates") using unnest(), and removing duplicates using DISTINCT:

LATERAL (SELECT DISTINCT unnest(values)) no_duplicates

Then I GROUP BY the original data table's ID, and compare the length of the new, filtered recordset with the old, unfiltered table. If the original, unfiltered array is bigger, then we removed duplicates so we should select that row:

array_length(values, 1) > COUNT(no_duplicates)