Postgresql – Best way to ALTER date column to timestamp on a large table

locking | performance | postgresql | postgresql-performance

I have to ALTER a date column to timestamp on a huge table with 1 billion rows.

I am using Postgres 9.5. I have learnt that running an ALTER TABLE like this acquires a heavy ACCESS EXCLUSIVE lock, which shuts everything else out of the table.

How can we achieve this efficiently?

Best Answer

How to do it efficiently and how to do it without blocking everyone else for the duration are diametrically opposed goals. The efficient way is to schedule a downtime window in which to do it.
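With a downtime window, the whole change is one statement. A minimal sketch, assuming a hypothetical table `events` with a `date` column `created_on`:

```sql
-- Rewrites the entire table under an ACCESS EXCLUSIVE lock;
-- with ~1 billion rows this will take a long time and roughly
-- double the table's disk usage while the rewrite runs.
ALTER TABLE events
    ALTER COLUMN created_on TYPE timestamp
    USING created_on::timestamp;
```

The `USING` clause is optional here since `date` casts to `timestamp` implicitly, but spelling it out makes the conversion explicit.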

If that can't be done, then:

  • Add a new column of the desired type, with a NULL default value.

  • Change all the code which modifies the old column or inserts new rows so that it correctly populates the new column as well (it must not depend on the new column already being correct for existing rows; it only needs to keep it correct from now on).

  • Create a partial index to efficiently find rows where the new column is still NULL.

  • Write a script that updates rows whose new column is NULL, setting it from the value of the old column. Do this in chunks sized as large as feasible, bearing in mind that every row updated in a chunk stays locked until that chunk's transaction commits.

  • Once all chunks of this are done and verified, change all the code to rely on the new column rather than the old column.

  • Change all the code to stop maintaining the old column.

  • Drop the old column.

  • Drop the index that supports finding still-NULL values of the new column (unless it is still useful).
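The steps above can be sketched in SQL. This is only a sketch, assuming a hypothetical table `events` with an `id` primary key and an old `created_on date` column; here a trigger stands in for "change all the code" to keep the new column in sync:

```sql
-- 1. Add the new column (NULL by default, so no table rewrite).
ALTER TABLE events ADD COLUMN created_on_ts timestamp;

-- 2. Keep the new column correct for all future writes.
CREATE OR REPLACE FUNCTION sync_created_on_ts() RETURNS trigger AS $$
BEGIN
    NEW.created_on_ts := NEW.created_on::timestamp;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_created_on_ts
    BEFORE INSERT OR UPDATE ON events
    FOR EACH ROW EXECUTE PROCEDURE sync_created_on_ts();

-- 3. Partial index to find rows that still need backfilling.
--    CONCURRENTLY avoids blocking writes (cannot run in a transaction).
CREATE INDEX CONCURRENTLY events_ts_null_idx
    ON events (id) WHERE created_on_ts IS NULL;

-- 4. Backfill in chunks; run repeatedly (from a script, with a pause
--    between runs) until it reports 0 rows updated.
UPDATE events
SET    created_on_ts = created_on::timestamp
WHERE  id IN (SELECT id FROM events
              WHERE  created_on_ts IS NULL
              LIMIT  10000);

-- 5-8. After verifying the backfill: remove the trigger,
--      the old column, and the backfill index.
DROP TRIGGER trg_sync_created_on_ts ON events;
ALTER TABLE events DROP COLUMN created_on;
DROP INDEX events_ts_null_idx;
```

Renaming the new column afterwards (`ALTER TABLE events RENAME COLUMN created_on_ts TO created_on;`) is a fast catalog-only change that lets existing queries keep their old column name.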