Sql-server – EXISTS (SELECT 1 …) vs EXISTS (SELECT * …) One or the other

MySQLoraclepostgresqlsql server

Whenever I need to check for the existence of some row in a table, I tend to write always a condition like:

SELECT a, b, c
  FROM a_table
 WHERE EXISTS
       (SELECT *  -- This is what I normally write
          FROM another_table
         WHERE another_table.b = a_table.b
       )

Some other people write it like:

SELECT a, b, c
  FROM a_table
 WHERE EXISTS
       (SELECT 1   --- This nice '1' is what I have seen other people use
          FROM another_table
         WHERE another_table.b = a_table.b
       )

When the condition is NOT EXISTS instead of EXISTS: In some occasions, I might write it with a LEFT JOIN and an extra condition (sometimes called an antijoin):

SELECT a, b, c
  FROM a_table
       LEFT JOIN another_table ON another_table.b = a_table.b
 WHERE another_table.primary_key IS NULL

I try to avoid it because I think the meaning is less clear, specially when what is your primary_key is not that obvious, or when your primary key or your join condition is multi-column (and you can easily forget one of the columns). However, sometimes you maintain code written by somebody else… and it is just there.

  1. Is there any difference (other than style) to use SELECT 1 instead of SELECT *?
    Is there any corner case where it does not behave the same way?

  2. Although what I wrote is (AFAIK) standard SQL: Is there such a difference for different databases / older versions?

  3. Is there any advantage on explicity writing an antijoin?
    Do contemporary planners/optimizers treat it differently from the NOT EXISTS clause?

Best Answer

No, there is no difference in efficiency between (NOT) EXISTS (SELECT 1 ...) and (NOT) EXISTS (SELECT * ...) in all major DBMS. I've often seen (NOT) EXISTS (SELECT NULL ...) being used as well.

In some you can even write (NOT) EXISTS (SELECT 1/0 ...) and the result is the same - without any (division by zero) error, which proves that the expression there is not even evaluated.


About the LEFT JOIN / IS NULL antijoin method, a correction: this is equivalent to NOT EXISTS (SELECT ...).

In this case, NOT EXISTS vs LEFT JOIN / IS NULL, you may get different execution plans. In MySQL for example and mostly in older versions (before 5.7) the plans would be fairly similar but not identical. The optimizers of other DBMS (SQL Server, Oracle, Postgres, DB2) are - as far as I know - more or less capable of rewriting these 2 methods and considering the same plans for both. Still, there is no such guarantee and when doing optimization, it is good to check the plans from different equivalent rewrites as there could be cases that each optimizer doesn't rewrite (eg. complex queries, with many joins and/or derived tables / subqueries inside the subquery, where conditions from multiple tables, composite columns used in the joining conditions) or the optimizer choices and plans are affected differently by the available indexes, settings, etc.

Also note that USING cannot be used in all DBMS (SQL Server for example). The more common JOIN ... ON works everywhere.
And the columns needs to be prefixed with the table name/alias in the SELECT to avoid errors/ambiguities when we have joins.
I also usually prefer to put the joined column in the IS NULL check (although the PK or any non-nullable column would be OK, it might be useful for efficiency when the plan for LEFT JOIN uses a non-clustered index):

SELECT a_table.a, a_table.b, a_table.c
  FROM a_table
       LEFT JOIN another_table 
           ON another_table.b = a_table.b
 WHERE another_table.b IS NULL ;

There is also a third method for antijoins, using NOT IN but this has different semantics (and results!) if the column of the inside table is nullable. It can be used though by excluding the rows with NULL, making the query equivalent to the previous 2 versions:

SELECT a, b, c
  FROM a_table
 WHERE a_table.b NOT IN 
       (SELECT another_table.b
          FROM another_table
         WHERE another_table.b IS NOT NULL
       ) ;

This also usually yields similar plans in most DBMS.