PostgreSQL – Stripping OIDs from tables in preparation for pg_upgrade

postgresql

I have a Postgres database in RDS, approaching 1 TB in size. We started in 2005 with Ruby/ActiveRecord/Rails and have upgraded along the way to PG 9.6.

Our Rails migrations created tables with plain CREATE TABLE, never specifying whether to include OIDs, and we have never used them. So some of our oldest (and largest) tables have OIDs. At some point Postgres stopped creating tables with OIDs by default, so tables created more recently don't have this issue.

We're looking to upgrade from 9.6 to 12, ideally using pg_upgrade. This fails, complaining about tables having OIDs.
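To see which tables are affected before attempting the upgrade, a catalog query along these lines should work on 9.6 (pg_class.relhasoids exists through PG 11 and was removed in 12; schema names here are only the usual system schemas to exclude):

```sql
-- List user tables that still carry OIDs (PostgreSQL <= 11 only,
-- where pg_class.relhasoids still exists).
SELECT n.nspname AS schema, c.relname AS table_name
  FROM pg_catalog.pg_class c
  JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'           -- ordinary tables only
   AND c.relhasoids
   AND n.nspname NOT IN ('pg_catalog', 'information_schema');
```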

We can run ALTER TABLE :table_name: SET WITHOUT OIDS, but for our larger tables that takes several hours while holding an exclusive lock on the table, effectively taking the database down. We would prefer to avoid that downtime if possible.

Is it expected that an ALTER TABLE :table_name: SET WITHOUT OIDS should rewrite all the rows of a table?

Is there a way to avoid this rewriting? (Ordinary columns can be dropped without rewriting all the rows, for example.)

On a toy database, I tried mucking with the catalog tables directly:

UPDATE pg_catalog.pg_class
   SET relhasoids = false
 WHERE oid = (
   SELECT c.oid
     FROM pg_catalog.pg_class c
     JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'public' AND c.relname = 'name_of_table'
);

It executed quickly and a cursory examination of the data didn't show any corruption.

But, this doesn't seem to be a documented approach.

Is this a terrible idea? Are there other approaches that don't require rewriting whole tables?

Best Answer

That UPDATE might work, but I am not certain, and I wouldn't want to do that with data I value. You don't have the option anyway, since you cannot get superuser access in a hosted database.

I can think of a safe, but more painful method:

  1. Briefly suspend data modification activity on the table.

  2. Create a trigger that records all data modification activity in another table.

  3. Create a new table that looks like the old one (but without OIDs), and start an INSERT INTO ... SELECT ... that copies the data.

Now normal operation can be resumed.

Once the copy is done, replay the recorded changes. Then:

  1. Start a transaction.

  2. LOCK the original table.

  3. Replay all the changes that happened since the last replay (should be few).

  4. DROP the original table.

  5. Rename the copy to the original table name.

Foreign keys will require extra attention.
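A rough sketch of the above, with hypothetical names (`big_table` is the table to convert; the replay logic itself is omitted and depends on the table's primary key):

```sql
-- 1. Change-log table and trigger to capture modifications.
CREATE TABLE big_table_changes (
    change_id bigserial PRIMARY KEY,
    op        text  NOT NULL,   -- 'INSERT', 'UPDATE' or 'DELETE'
    row_data  jsonb NOT NULL
);

CREATE FUNCTION log_big_table_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO big_table_changes (op, row_data)
        VALUES (TG_OP, to_jsonb(OLD));
        RETURN OLD;
    ELSE
        INSERT INTO big_table_changes (op, row_data)
        VALUES (TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_log
AFTER INSERT OR UPDATE OR DELETE ON big_table
FOR EACH ROW EXECUTE PROCEDURE log_big_table_change();

-- 2. Copy into a new table created without OIDs.
CREATE TABLE big_table_new (LIKE big_table INCLUDING ALL) WITHOUT OIDS;
INSERT INTO big_table_new SELECT * FROM big_table;

-- 3. Final cutover, after the bulk of big_table_changes has been replayed.
BEGIN;
LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;
-- replay the few remaining rows from big_table_changes here
DROP TABLE big_table;
ALTER TABLE big_table_new RENAME TO big_table;
COMMIT;
```

Replaying from the jsonb log needs the table's primary key to match rows (jsonb_populate_record can help turn the logged rows back into table rows). Any foreign keys referencing the old table will have to be dropped and re-created against the copy, and the INCLUDING ALL copies indexes too, which slows the bulk INSERT; creating indexes after the copy may be faster.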