Using Postgres 9.4, I'm interested in having an array of integers like user_ids_who_like
and provide an array of users (like user_ids_i_am_following
) to sort that intersection.
Something like:
select *
from items
where [there is an intersection between
user_ids_who_like with user_ids_i_am_following]
order by intersection(user_ids_who_like).count
Is grouping and ordering by an array intersection possible?
Example data:
items
name | user_ids_who_like
'birds' | '{1,3,5,8}'
'planes' | '{2,3,4,11}'
'spaceships' | '{3,4,6}'
For a given user_ids_who_i_follow = [3,4,11]
, can I do something like:
select * from items
where <user_ids_who_like intersects with user_ids_who_i_follow>
order by <count of that intersection>
Desired result:
name | user_ids_who_like | count
'planes' | '{2,3,4,11}' | 3
'spaceships' | '{3,4,6}' | 2
'birds' | '{1,3,5,8}' | 1
One possibility seems to be something like this:
select id, user_ids_who_like, (user_ids_who_like & '{514, 515}'::int[]) as jt
from queryables
where user_ids_who_like && '{514, 515}'
order by icount(user_ids_who_like & '{514, 515}'::int[]) desc;
But I can't tell if this style (using the intarray extension rather than native array functions and operators) is outdated; any feedback from more sophisticated users here? It's not clear to me how to do the intersection of two arrays using the methods and operators.
Best Answer
With tools of the basic Postgres installation only, you might
unnest()
and count in aLATERAL
subquery:We don't need a
LEFT JOIN
to preserve rows without match becausecount()
always returns a row - 0 for "no match".intarray
Assuming
integer
arrays without NULL values or duplicates, the intersection operator&
of theintarray
module would be much simpler:I added
NULLS LAST
to sort empty arrays last - after the reminder from your later question:Install
intarray
once per database for this.Use the overlap opertaor
&&
in theWHERE
clause to rule out rows without any overlap:Why? Per documentation:
Applies to standard array operators in a similar fashion. Details:
Alternatively and more radically, a normalized schema with a separate table instead of the array column
user_ids_who_like
would occupy more disk space, but offer simple solutions with plain btree indexes for these problems.