Order Results by Count of Common Array Elements in PostgreSQL

arraypostgresql

Using Postgres 9.4, I'm interested in having an array of integers like user_ids_who_like and provide an array of users (like user_ids_i_am_following) to sort that intersection.

Something like:

select * 
from items 
where [there is an intersection between 
       user_ids_who_like with user_ids_i_am_following] 
order by intersection(user_ids_who_like).count

Is grouping and ordering by an array intersection possible?

Example data:

items
name          | user_ids_who_like
'birds'       | '{1,3,5,8}'
'planes'      | '{2,3,4,11}'
'spaceships'  | '{3,4,6}'

For a given user_ids_who_i_follow = [3,4,11], can I do something like:

select * from items
where <user_ids_who_like intersects with user_ids_who_i_follow>
order by <count of that intersection>

Desired result:

name          | user_ids_who_like  | count
'planes'      |  '{2,3,4,11}'      | 3
'spaceships'  |  '{3,4,6}'         | 2
'birds'       |  '{1,3,5,8}'       | 1

One possibility seems to be something like this:

select id, user_ids_who_like, (user_ids_who_like & '{514, 515}'::int[]) as jt  
from queryables 
where user_ids_who_like && '{514, 515}' 
order by icount(user_ids_who_like & '{514, 515}'::int[]) desc;

But I can't tell if this style (using the intarray extension rather than native array functions and operators) is outdated; any feedback from more sophisticated users here? It's not clear to me how to do the intersection of two arrays using the methods and operators.

Best Answer

With tools of the basic Postgres installation only, you might unnest() and count in a LATERAL subquery:

SELECT i.name, i.user_ids_who_like, x.ct
FROM   items i
     , LATERAL (
   SELECT count(*) AS ct
   FROM   unnest(i.user_ids_who_like) uid
   WHERE  uid = ANY('{3,4,11}'::int[])
   ) x
ORDER  BY x.ct DESC;  -- add PK as tiebreaker for stable sort order

We don't need a LEFT JOIN to preserve rows without match because count() always returns a row - 0 for "no match".

intarray

Assuming integer arrays without NULL values or duplicates, the intersection operator & of the intarray module would be much simpler:

SELECT name, user_ids_who_like
     , array_length(user_ids_who_like & '{3,4,11}', 1) AS ct
FROM   items
ORDER  BY 3 DESC NULLS LAST;

I added NULLS LAST to sort empty arrays last - after the reminder from your later question:

Install intarray once per database for this.

Use the overlap opertaor && in the WHERE clause to rule out rows without any overlap:

SELECT ...
FROM   ...
WHERE user_ids_who_like && '{3,4,11}'
ORDER  BY ...

Why? Per documentation:

intarray provides index support for the &&, @>, <@, and @@ operators, as well as regular array equality.

Applies to standard array operators in a similar fashion. Details:


Alternatively and more radically, a normalized schema with a separate table instead of the array column user_ids_who_like would occupy more disk space, but offer simple solutions with plain btree indexes for these problems.