This may be a newbie question, but I don't know how to resolve this. I have a table like:
name |   id    |  value
-----+---------+--------
A    | 1286487 | 1286333
B    | 1286489 | 1286403
C    | 1286495 | 1286455
C    | 1286496 | 1286375
D    | 1286503 | 1286341
B    | 1286506 | 1286343
And I want to update this table to make it like this:
name |   id    |  value
-----+---------+--------
A    | 1286487 | 1286333
B    | 1286489 | 1286403
C    | 1286495 | 1286455
C    | 1286495 | 1286375
D    | 1286503 | 1286341
B    | 1286489 | 1286343
So all rows with name B or C get the same id as the first row with that name. Can anyone help me with that?
Best Answer
To fulfil the requirements, we first have to find the ID we need - this can be found using the `min()` function, not surprisingly. Then we do the update - the quickest version to write is possibly the following:

The downside of this is that it can be very slow if you have a bigger number of rows. I populated the table with some 50,000 random rows, using the following statement:
On my test box, without an index, this took looong minutes - I did not wait for it to produce an actual execution plan (using `EXPLAIN (ANALYZE, BUFFERS)`), but cancelled it and ran it with plain `EXPLAIN` to see what's wrong. So I've added an index on (name, id) to get some faster results:
Now the problem is this: the inner part of the plan is executed 50,000 times - once per row to be updated. In the index-less version, it would scan the whole table that many times - no wonder it took very long.
But there is more to it. Those 50,000 loops are ugly - let's find something nicer:
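A sketch of such a rewrite (again assuming the table is called `test`): compute the minimum per name once, then join the result back to the table being updated with `UPDATE ... FROM`:

```sql
-- Compute min(id) per name a single time, then join that derived
-- table back to the rows being updated - no per-row subquery.
UPDATE test
SET    id = m.min_id
FROM   (SELECT name, min(id) AS min_id
        FROM   test
        GROUP  BY name) AS m
WHERE  m.name = test.name;
```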
This gives a more streamlined plan:
There are now two sequential scan nodes there (somewhat surprisingly), but both are executed only once (`loops=1`).

What I've done here is remove the correlated subquery, replacing it with a derived table (the subquery after `FROM`) and joining that to the table to be updated. This approach often produces far better plans than the easy way; on big tables the performance difference can be quite significant.