why didn`t the first query fail?
Because this is valid SQL.
First of all consider that it is valid to run a SELECT
against the table that doesn't reference any columns from that table.
SELECT DISTINCT 'foo'
FROM product_template
WHERE type = 'import';
The above will return a single row result of foo
if any rows exist that match the WHERE
clause.
Secondly consider that it is valid to reference a column from the outer table in a sub query (this is required for correlated sub queries to work).
In the event that the column name is not qualified with a table reference then it will be resolved in the inner scope if possible and outer scopes if not. In your case product_template
has no such column so it was resolved as belonging to product_product
.
As a best practice be explicit about the tables the columns belong to. Had the subquery been written as below it would have failed to compile and alerted you of the error.
IN (SELECT DISTINCT pt.product_tmpl_id
FROM product_template pt
WHERE pt.type = 'import');
what exactly did it do?
This depends. If FROM product_template WHERE type = 'import';
returns zero rows then you were lucky and this is equivalent to
update product_product
set (write_date, default_code) = (LOCALTIMESTAMP, 'update')
where product_tmpl_id = null; /*Never true*/
Probably you weren't lucky though and it did return at least one row. In this case you ran the equivalent of the following
update product_product
set (write_date, default_code) = (LOCALTIMESTAMP, 'update')
where product_tmpl_id = product_tmpl_id;
which is equivalent to
update product_product
set (write_date, default_code) = (LOCALTIMESTAMP, 'update')
where product_tmpl_id IS NOT NULL;
why took it so long to run?
I imagine as firstly it updated all rows in the table.
I'm not sure what the execution plans for this would be like in Postgres too.
In the worst case it might have been selecting all rows matching the where
from product_template
, passing in the correlated parameter, then performing DISTINCT
on the result for each row in the outer product_product
table.
Your query is needlessly complicated and can be untangled to this 100 % equivalent one:
SELECT gid, id, ST_MakeValid(geom) AS geom
FROM schema.table_polygons
WHERE geom IS NOT NULL;
Neither original nor this one remove any duplicates (except by possibly removing some with geom IS NULL
, but I doubt that was your intention).
Possible misunderstanding
There is a common misunderstanding concerning output column names in the WHERE
clause - where you can only reference input column names. So, geom
in:
WHERE geom IN ...
references the original input column not the output column (the result of ST_MakeValid(geom)
).
Related:
If your actual intention was to only keep rows where the result of ST_MakeValid(geom)
is equal to any currently existing geom
entry:
SELECT gid, id, ST_MakeValid(t.geom) AS geom
FROM schema.table_polygons t
WHERE EXISTS (
SELECT 1
FROM schema.table_polygons
WHERE geom = ST_MakeValid(t.geom)
);
I added a table alias and table qualification to clarify what's what.
The index you mention is only going to help with this variant.
If your actual intention was to merge duplicate rows after fixing geom
values:
SELECT DISTINCT ON (ST_MakeValid(t.geom))
gid, id, ST_MakeValid(geom) AS geom
FROM schema.table_polygons t
ORDER BY ST_MakeValid(t.geom), gid, id;
You did not define how to break ties and which row to keep from a set of dupes. I keep the row with the smallest gid
from each set. And the smallest id
from remaining dupes. Details:
Aside: Calling a schema "schema" is like naming your son "son" and hoping for the best that no more sons shall arrive. In other words: don't.
Best Answer
Postgres 11 added a feature where a table rewrite is not necessary if you add a column with a default value.
If you had used
the statement would have executed nearly instantaneous with the same effect.
But an ALTER TABLE statement will acquire an exclusive lock to prevent access to the table while it's being changed, there is no way around it.