Postgresql – better join, in or in with join

join;postgresql

I have these tables:

calls: rows [ 'created_at', 'user_id']

users: rows ['id', 'username']

And I need to get all user without calls from date:
How can I get it? Or what way is better and why?

I know three ways:

first:

SELECT id 
FROM users
WHERE id NOT IN (SELECT user_id FROM calls where calls.created_at >= date)

second(I think it is not right, but I am not sure):

SELECT id
FROM users
LEFT OUTER JOIN calls
   ON calls.user_id = users.id
WHERE calls.created_at >= date AND calls.user_id ISNULL

last:

SELECT id
FROM users
WHERE id NOT IN (
   SELECT user_id
   FROM calls
   INNER JOIN users AS call_users
      ON call_users.id = users.id
   WHERE calls.created_at >= date

DB: PostgreSQL

Best Answer

SQL is specifically meant to express simple queries more or less like you would ask them as a question in English. For instance, your question is like:

"What are all the id from users that have no associated call in calls with a created_date greater or equal than date $1 ?"

SQL translation:

SELECT id FROM users
  WHERE NOT EXISTS (
    SELECT 1 FROM calls WHERE call.user_id=users.id AND created_date >= $1
  );

It's a rather direct translation except maybe for SELECTing 1, which is an arbitrary value.

There is no reason to assume that PostgreSQL would produce a non-optimal execution plan for this query. The optimizer is certainly smart enough for that kind of queries. In general if there's a straightforward way to express your query, there's no reason to pretend to be smarter than the optimizer.

When you doubt that the optimizer found the best possible plan (say for more complex queries), you can start from the output of EXPLAIN to figure out if/why it's not optimal and typically try to improve the query by trial and error.

If you want to compare how different variants of the same query are planned, compare the outputs of EXPLAIN ANALYZE of the queries.

https://wiki.postgresql.org/wiki/Slow_Query_Questions is a good start too.