PostgreSQL Update Statement – How It Updated 3 Rows?

postgresqlrandomupdate

I am curious how this statement updated 3 rows in Postgres. All the other times I ran it, it would update 0 or 1. Is there a way to find out which rows?

bestsales=# update keyword set revenue = random()*10 where id = cast(random()*99999 as int);
UPDATE 3

id is the primary key.

 id               | integer                        | not null default nextval('keyword_id_seq'::regclass)
    "keyword_pkey" PRIMARY KEY, btree (id)

I tried to run it as SELECT:

bestsales=# select * from keyword where id = cast(random()*99999 as int);
  id   |       keyword       | seed_id | source | search_count | country | language | volume | cpc  | competition | modified_on | google_violation | revenue | bing_violation
-------+---------------------+---------+--------+--------------+---------+----------+--------+------+-------------+-------------+------------------+---------+----------------
  6833 | vizio m190mv        |         | GOOGLE |            0 |         |          |     70 | 0.38 |        0.90 |             |                  |         |
 65765 | shiatsu massage mat |         | SPYFU  |            0 |         |          |    110 | 0.69 |             |             |                  |         |
 87998 | granary flour       |         | SPYFU  |            0 |         |          |     40 | 0.04 |             |             |                  |         |
(3 rows)

And sometimes it would return more than one. How is that possible?

PostgreSQL 9.5.3

Best Answer

Even a simple subquery does the job:

UPDATE keyword
SET    revenue = random() * 10
WHERE  id = (SELECT (random() * 99999)::int);

The subquery is evaluated once and only one row is updated (if the id value exists). A bit cheaper than using a CTE.

If you use the expression without subquery wrapper, random() is evaluated for every row, because it is a volatile function. That's by design and often the desired behavior.

As for your additional question:

Is there a way to find out which rows?

Not with guaranteed safety, but if you execute this query immediately after the UPDATE, the rows sharing the highest transaction ID in xmin value are the ones most recently updated or inserted.

All rows inserted / updated in the same transaction share the same transaction ID. If you have multiple INSERT / UPDATE operations to the same table in one transaction you need to look at the command ID cmin additionally:

SELECT *, xmin, cmin
FROM   keyword
ORDER  BY xmin::text::bigint DESC;

There are no guarantees because xmin is subject to possible transaction ID wraparound. But it normally works.

xmin and cmin are system columns in the row header.
The manual about object identifier types.

While still in the same transaction (update not committed yet), there is an even safer method:

Related: