MySQL Subquery Error – Handling Non-Existing Columns

subquery

I have two tables in a MySQL database – t1 with a column c1, and t2 with a column c2.

I run this query:

select * from t1 where c1 in (select c1 from t2);

The above query should give an error as c1 is not present in t2. Instead, it returns all the rows from t1. Another version of the above query with delete which can be much more disastrous:

delete from t1 where c1 in (select c1 from t2);

The above query deletes all the rows from t1 when it is just supposed to give an error.

I have noticed this behavior occurs only when the column in the subquery has the same name as the outer one. Meaning,

select * from t1 where c1 in (select c3 from t2);

will throw an error as expected:

 ERROR 1054 (42S22): Unknown column 'c3' in 'field list'

By the way, I have checked for the same issue on PostgreSQL 9.6.3 and the behavior is exactly the same. Any explanation for this strange behavior?

Best Answer

I run this query:

select * from t1 where c1 in (select c1 from t2);

The above query should give an error as c1 is not present in t2. Instead, it returns all the rows from t1.

No, the query should not give an error. It's a common mistake (thinking that the c1 in (select c1 from t2) refers to t2. It doesn't due to scope resolution, i.e. how column names are resolved (how it is found which table they are referring to). The query:

select * from t1 where c1 in (select c1 from t2);

Can resolve to three different options:

  • when t2 has a column named c1, it runs as:

    select * from t1 where c1 in (select t2.c1 from t2);
    
  • when t2 does not have a column named c1, but t1 has, it runs as:
    (this is your case!)

    select * from t1 where c1 in (select t1.c1 from t2);
    
  • and when neither t2 nor t1 have a column named c1, it will throw an error:

    select * from t1 where c1 in (select c1 from t2);
    

    -- Error is thrown ("Unknown column c1" or something like that)

Another version of the above query with delete which can be much more disastrous:

delete from t1 where c1 in (select c1 from t2);

The above query deletes all the rows from t1 when it is just supposed to give an error.

For the same reasons, no. The query resolves to and runs as:

delete from t1 where c1 in (select t1.c1 from t2);

so it will delete all rows from t1 as long as t2 table is not empty.


How to avoid these problems?

Always prefix column references with their table names. By doing this, you will always have the result you want or get an error if the column doesn't appear in the table you are prefixing it with.

Your queries should be:

select t1.* from t1 where t1.c1 in (select t2.c1 from t2);

delete from t1 where t1.c1 in (select t2.c1 from t2);

Both of them will throw an error if there is no column c1 in table t2.