Postgresql – Index Generation in Postgres

disk-space, index, postgresql

We have a database with a single table made up of around 700 million entries.
We update the database by adding new entries on one server, then transfer the dump to the production server using pg_dump:

pg_dump -c database > /tmp/database_gen

(By the way, we use Postgres 8.4.)
We import the database on the production server using psql.
The file generated by pg_dump contains the instructions to create and fill the table.

The problem is the index creation. Postgres fills the table and then spends days creating the indexes. That was acceptable, until Postgres could no longer create the indexes at all: it ran out of disk space, because it needs a lot of temporary space for the sort performed during index creation. Normally the database takes around 200 GB, but during index creation the disk usage climbs to 600 GB, then drops back to 200 GB once the indexes are built.

My question is: can we create the index in several steps, e.g. build the index for half of the table, then add the rest of the table and update the index?

Has anyone run into the same issue?

Thanks

Best Answer

If you create the index before loading the table, the time taken to load the data will be significantly increased.

pre load:

create table my_table1(val integer);
create index my_index1 on my_table1(val);
insert into my_table1(val) select generate_series(1,100000) order by random();
Time: 31755.858 ms

post load:

create table my_table2(val integer);
insert into my_table2(val) select generate_series(1,100000) order by random();
Time: 15344.130 ms
create index my_index2 on my_table2(val);
Time: 4073.686 ms

If you are OK with that trade-off, pg_restore (which requires a non-plain-text dump, e.g. one made with pg_dump -Fc) lets you:

  1. Load just the schema using --schema-only
  2. Create the index with --index
  3. Load the data using --data-only
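For example, the three steps above might look like this (a sketch only; the dump file path and index name are made up, and you would substitute the real names from your schema):

pg_dump -Fc database > /tmp/database.dump
pg_restore --schema-only -d database /tmp/database.dump
pg_restore --index=my_index -d database /tmp/database.dump
pg_restore --data-only -d database /tmp/database.dump

Note that --schema-only restores all schema objects, so in practice you would check which indexes it already created before restoring individual ones with --index.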

Of course "Buy more storage" may well be the best answer here...
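As an aside on the disk-space spike itself: a btree index cannot be built half at a time, but since PostgreSQL 8.3 you can point the sort's temporary files at a tablespace on a disk with more free space via temp_tablespaces, and raise maintenance_work_mem so more of the sort happens in memory. A sketch, where the tablespace name and path are hypothetical:

-- tablespace on a volume with more free space (directory must be owned by the postgres user)
create tablespace index_tmp location '/mnt/bigdisk/pg_tmp';
-- route temporary files for this session to the big disk
set temp_tablespaces = 'index_tmp';
-- give the index-build sort more memory (per session)
set maintenance_work_mem = '1GB';
create index my_index1 on my_table1(val);

This does not reduce the total temporary space needed, but it moves the 400 GB spike off the volume holding the database.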