I have a ~320 million row table on which I run the following query:
UPDATE my_table SET state = TRIM(state)
Needless to say this has been running for 90+ hours (since thursday afternoon) and has not finished yet.
I'd like to know, more or less, how many rows it has modified until now.
Is there any way to obtain this information from the running transaction?
Trying to search I have found how to determine a rough estimate of how many new rows are being added to a table using the pgstattuple
function:
select dead_tuple_count from pgstattuple('my_table');
But this does not seem to show any interesting information for how many rows have been modified.
Best Answer
There is at least one way to peek at the progress of an uncommitted
UPDATE
, though it's a bit heavy-handed.Postgres handles transaction isolation through row versioning. Their implementation involves tagging every record version with the smallest and largest transaction IDs which are allowed to see it (
xmin
andxmax
, respectively).Under this scheme, an
UPDATE
works by setting thexmax
of the target record to the current transaction ID (equivalent to aDELETE
) and creating an updated copy with the transaction ID inxmin
(equivalent to anINSERT
).These system columns can be queried, so given the transaction ID of the
UPDATE
(which you can get frompg_stat_activity.backend_xid
), you can find out how many rows it's processed with e.g.:Things get a bit messier if the transaction has set any savepoints, in which case the
xmax
will be a subtransaction ID, which doesn't appear inpg_stat_activity
(or anywhere else, as far as I'm aware). In that case, you can inspect all rows which have been marked for update/deletion, by either in-progress or rolled-back transactions, with:... and from there, it shouldn't be too hard to figure out which ID is the one you're interested in.