Postgresql – How to speed up an insertion from a huge table with postgres

bigtableinsertpostgresqlselect-into

I have 5 tables in my database with respectively a size of 70Gb, 500Mb, 400 Mb, 110Mb and 20 Mb.

I want to create a new table that contains all columns of all tables, so I tried 2 queries, the first one is :

select into new_table as select .. from t1 , t2 , t3 , t4 ,t5 where t1.id2 = t2.id and t1.id3 = t3.id and t1.id4 = t4.id and t1.id5 = t5.id 

and the second one is :

insert into new_table select .. from t1 , t2 , t3 , t4 ,t5 where t1.id2 = t2.id and t1.id3 = t3.id and t1.id4 = t4.id and t1.id5 = t5.id 

Before executing these two queries on my big data tables, I tried both on a total 1G database, the first on took only 7s and the second one approximately 10 mn.

Now, executing the first one on my huge database, made my disk full even though I had 250Gb free space before running the query, and without finishing the query so I got the follow error :

ERROR:  could not write to temporary file: No space left on device

The second one, is taking a lot of time and consuming my free disk space slowly and, as the first one, not returning the result.

What are the difference between these two queries ? Is there a way to make the insert into non transactional so as I can follow my insert steps. And I guess Postgres uses logs (journalization) so is there a way to deactivate that in order to speed up the insertion ? or I should follow another method in order to get a desired result without filling up all disk.

Ps : No triggers, only a primary key on each table.

Best Answer

.. a new table that contains all columns of all tables ..

Really? That's unlikely to be useful.

select .. from t1 join t2 join t3 join t4 ...

Without "on" clauses to tell postgres how to put the tables together, those 'join's will link every row in t1 to every row in t2 and then every row of that to every row in t3 and then every row of that to every row in t4 ...

That's an awful lot of "rows".

As always, build a select statement that returns the data that you want, then just add the "insert into" bit on the front of it.

If your tables are all the same structure, I'd suspect you'll be better served by 'union [all]'s instead of 'join's but YMMV; we can't tell because you haven't given us the structure of your tables or your intended result.