PostgreSQL – Difference Between Tuples Inserted and Live Tuples

pgadminpostgresql

Many apologies in advance, but I was not able to find this out myself:

What is the difference between live tuples and tuples inserted in postgres statistics?

There's this post, but it doesn't answer the question: pg_stat_get_live_tuples are Representing the number of live [...] rows (tuples) in the table. But what means "live"?

The Postgres documentation doesn't provide answers either:

  • pg_stat_get_live_tuples(oid) Number of live rows in table
  • pg_stat_get_tuples_inserted(oid) Number of rows inserted into table

Why do my stats show 213,324,422 inserted tuples, but only 124,510,280 live tuples? (3,087,919 dead tuples)

The PG Admin documentation is also unclear about this. It further says that Tuples Inserted displays the number of tuples inserted into the database within the last week.

I would like to know the total number of existing entries in my database. Is this live rows or inserted rows?

Best Answer

This nomenclature all comes down to knowing about MVCC, or Multi-Version Concurrency Control, which is how PostgreSQL handles transactions and concurrency.

When you update or delete a row, doesn't actually remove the row. When you do an UPDATE or DELETE, the row isn't actually physically deleted. For a DELETE, the database simply marks the row as unavailable for future transactions, and for UPDATE, under the hood it's a combined INSERT then DELETE, where the previous version of the row is marked unavailable. These new versions of rows are generally referred to as the "live" rows, and the older versions are referred to as "dead" rows.

The statistics that you are looking at, where it shows 213,324,422 inserted tuples are the number of new data inserts that has been done on that table. The 124,510,280 live tuples are the number of tuples that are up to date in the table and available to be read or modified in transactions. The 3,087,919 dead tuples are the number of tuples that have been changed and are unavailable to be used in future transactions. The autovacuum daemon, or a manual vacuum will eventually come along and mark the space of those "dead" tuples available for future use, which means that new INSERTS can overwrite the data in them.

Side note, since these are just statistics that are gathered by the statistics collector, they aren't going to be exact, but they're good enough to give you an idea of how active the table is, and how much bloat there is (live vs dead tuples), and how well the autovacuum daemon is keeping up with your workload.

Hopefully this makes it clearer. You can read more about this in detail in the PostgreSQL Manual in the chapter on Concurrency Control