Postgresql – may i reduce the restore time with pgsql 9.2

pg-restorepostgresql

I need your help about how to reduce time to restore process for a big dump (400GB)

We have a database that takes weeks to restore due to the single threaded nature of index generation on PostgreSQL. Please i need your help to figure out a way that we can restore this quicker, maybe by disabling index generation at load time and adding the indexes back later or some better trick.

sample pg_dump command:

pg_dump --compress=0 -bo -F c --lock-wait-timeout=1500 -h $HOST -p $PORT $DBNAME | lbzip2 > $DB-$TIMESTAMP.bz2

sample pg_restore command:

pg_restore -Ov -j 2 -h $HOST -p $PORT --dbname=$DBNAME $RESTOREFILE

-j option does not help us since it helps the part that doesn't take long (backup restore) but not the part that does take a long time (index generation) – this ticket is to figure out a way to streamline the process so index generation is done separately on an already working database or to speed up index generation, on PG 9.2

I'd like a clear procedure for removing index generation from the restoration process and doing it afterwards, to not block usage of the DB.

Is possible do that? what is your opinion about that?

Best Answer

pg_restore has an option to run the time-consuming parts of the restore, such as the index rebuild process, with multiple "jobs".

From the pg_restore documentation for PostgreSQL 9.2:

-j number-of-jobs --jobs=number-of-jobs

Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multiprocessor machine.

Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.

The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreased performance because of thrashing.

Only the custom archive format is supported with this option. The input file must be a regular file (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.