Postgresql – About “Transaction ID Wraparound”

postgresql

Now , I read the document about "Transaction ID Wraparound " , but there are something that I really
don't understand, the document is the following url
http://www.postgresql.org/docs/9.0/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

23.1.4. Preventing Transaction ID Wraparound Failures

PostgreSQL's MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction's XID is "in the future" and should not be visible to the current transaction. But since transaction IDs have limited size (32 bits) a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future — which means their output become invisible. In short, catastrophic data loss. (Actually the data is still there, but that's cold comfort if you cannot get at it.) To avoid this, it is necessary to vacuum every table in every database at least once every two billion transactions.

I don't understand the statements "would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future — which means their output become invisible"

Can someone explain this ? Why after the database suffers transaction ID wraparound would transactions that were in the past appear to be in the future ? In short, I want to know if the PostgreSQL will in the " data loss" situation after transaction ID wraparound by autovacuum。

For my personal views, we can get the current transaction ID by using txid_current() function whoes output is 64 bit and will not be cycled.So I think the Insertion transaction ID of tuples which knows as xmin will nerver greater than the xid which get by txid_current() function. Except that you will use pg_resetxlog reset reset transaction ID after shuting down PostgreSQL Server. Am I right ? Thanks

Best Answer

Why after the database suffers transaction ID wraparound would transactions that were in the past appear to be in the future ?

They don't. The quoted text just explains why postgres needs to use modulo 231 arithmatic (which means transactions can wrap around as long as old transactions are 'frozen' early enough):

Normal XIDs are compared using modulo-2^31 arithmetic. This means that for every normal XID, there are two billion XIDs that are "older" and two billion that are "newer"

to be specific:

old row versions must be reassigned the XID FrozenXID sometime before they reach the two-billion-transactions-old mark

or wrapping the XID around would cause things to break. To prevent that, postgres will start to emit warnings, and eventually shut down and refuse to start new transactons if necessary:

If for some reason autovacuum fails to clear old XIDs from a table, the system will begin to emit warning messages like this when the database's oldest XIDs reach ten million transactions from the wraparound point:

WARNING:  database "mydb" must be vacuumed within 177009986 transactions 
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in "mydb". 

(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.) If these warnings are ignored, the system will shut down and refuse to start any new transactions once there are fewer than 1 million transactions left until wraparound

In other words "transactions that were in the past appear to be in the future" and "data loss" are entirely theoretical and will not be caused by transaction ID wraparound in practice.