PostgreSQL – Slow Updates When Other Columns Are Indexed

indexperformancepostgresqlpostgresql-performanceupdate

Certain updates take far too long on large Postgres tables. Given these conditions:

  • only one column is being updated, and it is non-indexed
  • the column already has data in it for every row due to a prior update
  • the data is not changing size (e.g., rewriting a boolean value)
  • there is no other column in this or any other table that depends on the value of the column being updates
  • there are no other queries being performed on the database (this is a personal research database on a workstation, not an enterprise database)
  • there are indexes on other columns
  • spinning drive (not SSD) with Bitlocker and fast PC with Windows 8.1 x64
  • the table has 10 million rows and 60 columns

…you'd think that the update would take a reasonable amount of time relative to expectations of spinning media with Bitlocker. We're not creating more data, so existing data shouldn't need to be moved around on the HDD, it just needs to be overwritten. The other indexes shouldn't need changes. Etc. Instead, after 20 hours of constant HDD grinding, I get tired of waiting and stop the query. If I drop all indexes on other columns and re-run the query, it takes only about 30 minutes.

Why do indexes on columns irrelevant to this query balloon the update time?

Best Answer

We're not creating more data, so existing data shouldn't need to be moved around on the HDD, it just needs to be overwritten

That's not the case.

In order to support rollback and crash-safety, PostgreSQL must write a new copy of every modified row, rather than modifying the row in-place. Twice, actually, because it must be written to WAL (a sequential log for crash recovery) then to the table.

PostgreSQL's autovacuum then comes along later and marks the old row versions as free space that can be re-used.

See the user manual for more information on concurrency control and MVCC.

the data is not changing size (e.g., rewriting a boolean value)

Irrelevant, because the row must be rewritten anyway.

only one column is being updated, and it is non-indexed
there are indexes on other columns

This only matters insofar as it affects HOT updates, where PostgreSQL can potentially avoid writing new index entries for a row update if no indexed columns are modified and there's enough free space on the same disk page (8k block) to store a new copy of the row.

Why do indexes on columns irrelevant to this query balloon the update time?

In most cases PostgreSQL must add new index entries even if you didn't modify indexed columns, because it has to write a new version of the row to a separate database page.

Setting a FILLFACTOR of 50 or less can help with this, as it encourages PostgreSQl to keep more free space for updates, at the cost of scans having to read and process more data.

Related Question