Postgresql – Postgres performance difference on insert

performancepostgresqlquery-performance

I was tried to partition one large table, and come with such Postgres behavior that I could not explain. Maybe you will have any suggestions?

I went quite straight forward way (i know that it is not most efficient, but still i had some time):

create table target_table  as select * from parent_table limit 0;
insert into target_table select * from parent_table where date_field >= '20120101' and date_field <='20120131' and tariff_id <> -1;
insert into target_table select * from parent_table where date_field >= '20120201' and date_field <='20120228' and tariff_id <> -1;
insert into target_table select * from parent_table where date_field >= '20120301' and date_field <='20120331' and tariff_id <> -1;
insert into target_table select * from parent_table where date_field >= '20120401' and date_field <='20120430' and tariff_id <> -1;
insert into target_table select * from parent_table where date_field >= '20120501' and date_field <='20120531' and tariff_id <> -1;
.....

And there is results how much time each command took to execute:

08:44:42  [INSERT - 8981055 row(s), 119.962 secs]  Command processed
08:46:36  [INSERT - 8222656 row(s), 114.057 secs]  Command processed
08:48:43  [INSERT - 8981454 row(s), 126.789 secs]  Command processed
08:51:02  [INSERT - 8929325 row(s), 139.815 secs]  Command processed
08:53:13  [INSERT - 9465383 row(s), 130.752 secs]  Command processed
08:55:25  [INSERT - 9302664 row(s), 131.911 secs]  Command processed
08:57:36  [INSERT - 9581798 row(s), 130.853 secs]  Command processed
08:59:58  [INSERT - 10080875 row(s), 142.071 secs]  Command processed
09:15:03  [INSERT - 9717698 row(s), 905.030 secs]  Command processed
10:24:22  [INSERT - 10169181 row(s), 4159.346 secs]  Command processed
11:33:11  [INSERT - 10000601 row(s), 4128.590 secs]  Command processed
12:38:36  [INSERT - 10576846 row(s), 3924.579 secs]  Command processed

I can't find the explanation why sometime inserts from the same table took 2 minutes and sometimes it took 1 hour! What could be reasons for such behavior?

Some info:

PG server running on amazon cloud xlarge EC2 instance.
I am the only one user and it was not overloaded with other tasks.
work_mem = '1GB'
parent_table is pretty large – 800 million of rows.

Best Answer

Very likely your problem is that you have an index on the source table which is making the first hits easy. However at some point you are hitting rows that are not in the cache and so this is requiring a sudden increase in disk I/O accounting for the performance drop. In essence suddenly you start missing the cache and so you end up having to read a bunch of things from disk again.