PostgreSQL – How to Set Table to LOGGED After Data Loading

postgresqlpostgresql-9.6unlogged-tables

I have created an empty UNLOGGED table to faster copy a large amount of data (over 1 Billion rows). Loading the data took around 4 hours.

Now I want to set the table to LOGGED to make it safe for unexpected shutdowns and crashes. This process takes a long time. In fact it takes longer than loading the data. Is this normal, or is there a way to speed it up?

Best Answer

Problem

I believe currently SET LOGGED rewrites the table using the WAL (essentially doing the whole operation), and rewrites the indexes.

So I found a thread about this on the lists

A new relfilenode is filled with the data - the old one, including the init fork, gets removed by the normal mechanics of rewriting rels.

There was a long thread about it on -hackers. Doing it without a rewrite and without loosing transactional semantics is really rather hard. And having the capability of doing it with a rewrite is better than not having it at all.

You can see the patch that added the SET (LOGGED|UNLOGGED). The implementation hasn't changed much, though there was a plan to fix it that acknowledged the problems

this design lead us to performance problems with large relations because we need to rewrite the entire content of the relation twice, one into a new heap and other into the WAL, so this project will change the current desing of the mechanism of change an unlogged table to logged without the need to rewrite the entire heap, but just by removing the init forks and if the wal_level != minimal we'll write the contents to the WAL too.

But it seems no more work was done. Looking at the code you can see it,

case AT_SetLogged:        /* SET LOGGED */
    ATSimplePermissions(rel, ATT_TABLE);
    tab->chgPersistence = ATPrepChangePersistence(rel, true);
    /* force rewrite if necessary; see comment in ATRewriteTables */
    if (tab->chgPersistence)
    {
        tab->rewrite |= AT_REWRITE_ALTER_PERSISTENCE;
        tab->newrelpersistence = RELPERSISTENCE_PERMANENT;
    }

Checking that comment,

 * There are two reasons for requiring a rewrite when changing
 * persistence: on one hand, we need to ensure that the buffers
 * belonging to each of the two relations are marked with or without
 * BM_PERMANENT properly.  On the other hand, since rewriting creates
 * and assigns a new relfilenode, we automatically create or drop an
 * init fork for the relation as appropriate.

So you can see, that a rewrite is still required. I guess.

Potential solution

You may be better of copying all of the data into the table on the same transactions that creates the table. Which disables the rewrite, and is mentioned in the docs.

In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include:

COPY into tables that were created or truncated in the same transaction

This would skip the WAL write and heap rewrite.

Best Answer

Problem

Potential solution

Related Solutions

PostgreSQL – Storing Historical Data in Database vs On Disk

PostgreSQL – Fast Filtering on Continuously Arriving Data by Date

Related Question