Postgresql – Subquery magic in PostgreSQL

postgresqlpostgresql-9.4subquery

I have a query:

update product_product 
set (write_date, default_code) = (LOCALTIMESTAMP, 'update') 
where product_tmpl_id in (
    select distinct product_tmpl_id 
    from product_template 
    where type='import');

And it takes 7 hours to complete. However, when I execute the subquery:

select distinct product_tmpl_id 
from product_template 
where type='import';

I get the error:

column "product_tmpl_id" does not exist

There isn't a column product_tmpl_id in the table product_template.

It was my mistake. The first query should have been:

update product_product set (write_date,default_code) = (LOCALTIMESTAMP,'update') 
where product_tmpl_id in
 (select distinct id from product_template where type='import');

And it onle takes several seconds to run.

So my questions are the following:

  • why didn`t the first query fail?
  • why took it so long to run?
  • what exactly did it do?

The result of select version(); is

PostgreSQL 9.4.3 on x86_64-unknown-linux-gnu, 
compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

Best Answer

why didn`t the first query fail?

Because this is valid SQL.

First of all consider that it is valid to run a SELECT against the table that doesn't reference any columns from that table.

SELECT DISTINCT 'foo'
FROM   product_template
WHERE  type = 'import'; 

The above will return a single row result of foo if any rows exist that match the WHERE clause.

Secondly consider that it is valid to reference a column from the outer table in a sub query (this is required for correlated sub queries to work).

In the event that the column name is not qualified with a table reference then it will be resolved in the inner scope if possible and outer scopes if not. In your case product_template has no such column so it was resolved as belonging to product_product.

As a best practice be explicit about the tables the columns belong to. Had the subquery been written as below it would have failed to compile and alerted you of the error.

IN (SELECT DISTINCT pt.product_tmpl_id
    FROM   product_template pt
    WHERE  pt.type = 'import'); 

what exactly did it do?

This depends. If FROM product_template WHERE type = 'import'; returns zero rows then you were lucky and this is equivalent to

update product_product 
set (write_date, default_code) = (LOCALTIMESTAMP, 'update') 
where product_tmpl_id = null; /*Never true*/

Probably you weren't lucky though and it did return at least one row. In this case you ran the equivalent of the following

update product_product 
set (write_date, default_code) = (LOCALTIMESTAMP, 'update') 
where product_tmpl_id = product_tmpl_id;

which is equivalent to

update product_product 
set (write_date, default_code) = (LOCALTIMESTAMP, 'update') 
where product_tmpl_id IS NOT NULL;

why took it so long to run?

I imagine as firstly it updated all rows in the table.

I'm not sure what the execution plans for this would be like in Postgres too.

In the worst case it might have been selecting all rows matching the where from product_template, passing in the correlated parameter, then performing DISTINCT on the result for each row in the outer product_product table.