PostgreSQL: Add or drop column under load give wrong number of columns errors

ddlpostgresqlpostgresql-9.3

I have a heavily used PostgreSQL database. Sometimes I need to add/remove columns, preferably without any service interruptions.

I follow the safe operations list from https://www.braintreepayments.com/blog/safe-operations-for-high-volume-postgresql but many operations cause troubles anyway when the more busy tables are updated.

Typically we have user defined functions for all operations, which are run in the following manner:

Table and function definitions:

create table a(
    id serial primary key,
    x integer
);

create or replace function select_a() returns setof a AS
$$
begin
    return query
    select a.* from a;
end;
$$ language plpgsql;

Then the actual queries are run by our application as

SELECT id FROM select_a();

However, if I add a column with ALTER TABLE users ADD COLUMN y text; while the system is under load i sometimes (more frequently and persistent the more load the system is experiencing) get errors like these

ERROR #42804 structure of query does not match function result type: Number of returned columns (2) does not match expected column count (3).

Can this be avoided somehow, or do I need to take the system offline during these kind of changes?

To recreate this follow these steps:

  1. create the above table and function as give above
  2. Create one file loop_alter.sql

    #!/usr/bin/env bash
    for i in {0..1000}; do
    echo "alter table a add column y text; alter table a drop column y;"
    done;
    
  3. Create one file loop_select.sql

    #!/usr/bin/env bash
    for i in {0..100000} do
    echo "select * from select_a() limit 1;"
    done;
    
  4. Run the two files simultaneously with psql

    In one terminal: ./loop_alter.sql | psql

    In another: ./loop_select.sql | psql

Best Answer

If your function truly uses SELECT * then I would suggest switching to using an explicit list of fields and then separate out the amendment of the table and the function.

ADDING COLUMNS: Alter the table first, then change the function once the table is complete.

DELETING COLUMNS: Change the function first and then the table. If the table is large and under heavy load I'd still expect problems.