PostgreSQL – Best Query to Exclude Existing Friends

optimizationperformancepostgresqlpostgresql-performance

I have a User table

/ id_user / username / ..

friends table (pair AB and BA):

/ friend_of / friend_to

and a friend_request table (used for pending friend requests)

/ id_req/ sent_from / sent_to

I would like to have a little search for new friend feature (autosuggestion style) and I would like NOT to show in the list friends that are ALREADY in my friends list OR being in a friend request pending.

The idea behind the query would be to show the username of user NOT in the friend_to column and NOT in the sent_to column.

I came up with these two flavors:

select id_user,username 
from public.user u 
where 
username like '%tom%' 
and 
u.id_user NOT IN (SELECT sent_to FROM friend_request where sent_from = 288) 
AND 
u.id_user NOT IN (SELECT friend_to FROM friends where friend_of=288);

select id_user,username
from public.user u 
left join friend_request fr 
on 
u.id_user=sent_to AND fr.sent_from=288
left join friends f 
on 
u.id_user = f.friend_to AND f.friend_of=288
where username like '%tom%' AND sent_to IS NULL and friend_to IS NULL;

Both of these produce the same result. But I would like some advice on which one is a better choice performance and bets practice wise?

Best Answer

NOT IN is typically the slowest option. LEFT JOIN / IS NULL is more promising. Or NOT EXISTS:

Select rows where value of second column is not present in first column

Query

Just your query, formatted:

SELECT u.id_user, u.username
FROM   "user" u 
LEFT   JOIN friend_request fr ON fr.sent_to = u.id_user
                             AND fr.sent_from = 288
LEFT   JOIN friends f ON f.friend_to = u.id_user
                     AND f.friend_of = 288
WHERE  u.username LIKE '%tom%'
AND    fr.sent_to IS NULL
AND    f.friend_to IS NULL;

Or, a variant with NOT EXISTS anti-semi-joins:

SELECT u.id_user, u.username
FROM   "user" u
WHERE  NOT EXISTS (
   SELECT 1 FROM friend_request
   WHERE  sent_to = u.id_user
   AND    sent_from = 288
   ) 
AND    NOT EXISTS (
   SELECT 1 FROM friends
   WHERE  friend_to = u.id_user
   AND    friend_of = 288
   )
WHERE  u.username LIKE '%tom%';

Indexes

The most important factor for performance. An ideal setup would have:

A trigram GIN or GiST index on "user".username. You need the additional module pg_trgm. Instructions - and Gin or GiST?
- How is LIKE implemented?
If index-only scans are possible, append the otherwise useless id_user. You need the additional module btree_gin (or btree_gist respectively):
- Inner join using an array column
Then:
```
CREATE INDEX usr_trgm_idx ON "user" USING gin (username gin_trgm_ops, id_user);
```
Multicolumn indexes (or a UNIQUE / PK constraint) on (sent_from, sent_to) and (friend_of, friend_to) - with columns in that order. The reversed order would work, too, but not quite as good. Each query checks for a single friend_of combined with many different friend_to, which is faster with the suggested column order, because this way tuples are much rather localized on the same or a few index pages.

That said, the proof of the pudding is still in the eating. Get the final verdict from EXPLAIN (BUFFERS, ANALYZE) in your production database.

Aside

Do not call your table user, that's a reserved word.

Related Solutions

Mysql – Slow performance of MySQL Join Query

Please provide SHOW CREATE TABLE; the explain is useless without it.

OR is a performance killer in many contexts.

( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR ( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ...

Turn that into

JOIN ( SELECT id FROM p WHERE 
( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR 
( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ... ) x ON x.id = foo.id

Also needed (on p):

INDEX(pricelist, iln, id)

(With the CREATEs, I could be more specific.)

The idea behind this "trick" is to move the costly work of the OR into a subquery that returns the necessary ids. Plus the INDEX makes it so that it can do all that work in the INDEX.

PostgreSQL – Optimizing a Query with a Large IN Clause

Instead of using a huge IN-list, join on a VALUES expression, or if the list is large enough, use a temp table, index it, then join on it.

It'd be nice if PostgreSQL could do this internally & automatically but at this point the planner doesn't know how.