Postgresql 11 – slow insert performance when table reaches ~5 million records

performancepostgresqlpostgresql-11postgresql-performance

I have a table with ~6 million records that gets loaded on a postgresql instance in a small machine (with standard PC HDD).

It seems to be a I/O problem, since there are free memory and processing power.

The table has an UUID (random, v4) primary key and gets the records from a central server – it's a local copy.

Loading the table starts very fast (~ 1000 records each 500ms) and as the table fills with records it gets very slow (~1000 records each 30s).

First thought was "indexes", so I dropped all indexes on the table (kept the primary key) – very little improvement.

AFAIK, postgresql does not have a clustered index like SQL Server which might cause page splits on random UUID inserts.

What else might be a problem?

UPDATE:
The same process and settings are not a problem on a machine with SSD instead of HDD (cannot change this on 3000+ machines).

Settings:

  • shared_buffers = 256MB
  • work_mem = 10MB
  • maintenance_work_mem = 256MB
  • dynamic_shared_memory_type = posix
  • effective_io_concurrency = 2
  • max_worker_processes = 2
  • max_parallel_workers_per_gather = 1
  • max_parallel_workers = 2
  • wal_level = minimal
  • max_wal_size = 1GB
  • min_wal_size = 100MB
  • checkpoint_completion_target = 0.5
  • max_wal_senders = 0
  • random_page_cost = 4.0
  • effective_cache_size = 1GB
  • autovacuum = on
  • autovacuum_max_workers = 1

EDIT:

Did some updates to improve performance:

  • added noatime to /etc/fstab (ext4 partition);
  • disabled disk write caching;
  • disabled autovacuum and created a systemd service to do it on shutdown;
  • set effective_io_concurrency = 1 (the machine was very slow due to high I/O);
  • changed to UUIDv1 – so it would be sequential and avoid rotation on the primary key index b-tree or any other b-tree;
  • partitioned the tables with hash technique (tried with 10 partitions and 50 partitions);
  • did proper fillfactor configuration: fillfactor=75 / 85 depending on how much unordered data it gets;

No noticeable changes..

It fills the tables at ~5 million rows on 500ms to 1sec each 1000 rows and than starts getting very slow, at 1000 rows at 30s minimum.

Tried to REINDEX all to see if would be a filled index issue – no results.

EDIT2:

The tables:

create table customer (
    field1  serial      not null,
    field2  uuid        not null default (uuid_generate_v1()),
    field3  varchar(11) null,
    field4  varchar(2)  not null,
    field5  varchar(1)  not null,
    field6  varchar(40) not null,
    field7  varchar(40) null,
    field8  varchar(40),
    field9  bigint,
    field10 varchar(15),
    field11 varchar(20),
    field12 integer,
    field13 date,
    field14 varchar(1),
    field15 integer,
    field16 varchar(100),
    field17 varchar(100),
    field18 boolean,
    field19 integer,
    field20 varchar(20),
    field21 timestamp,
    field22 timestamp,
    field23 varchar(2),
    field24 varchar(40) not null,
    field25 varchar(40),
    field26 boolean,
    field27 boolean     null
) partition by hash (
    field1
);

-- Fillfactor = 85 because this table is a copy from the original so it might
-- eventually get unordered data
alter table customer
    add constraint customer_pk
        primary key (field1) with (fillfactor = 85);

-- CREATE 10 PARTITIONS...

create index idx_customer_01 on customer (field2) with (fillfactor =75);
create index idx_customer_02 on customer (field9, field14) with (fillfactor =75);
create index idx_customer_03 on customer (field25, field14) with (fillfactor =75);

create table customer_contacts (
    field1 serial,
    field2 uuid default (uuid_generate_v1()),
    field3 uuid    not null,
    field4 integer not null,
    field5 smallint,
    field6 bigint,
    field7 boolean not null
) partition by hash (
    field1
);

alter table customer_contacts
    add constraint customer_contacts_pk
        primary key (field1) with (fillfactor =85);

-- CREATE 10 PARTITIONS...

create index idx_customer_contacts_01 on customer_contacts (field2) with (fillfactor =85);
create index idx_customer_contacts_02 on  customer_contacts (field3, field4) with (fillfactor =85);

EDIT3:

There are no foreign keys and no triggers on the tables.

Best Answer

partition by hash was causing the problem. It causes "random" inserts that do not take advantage of caching/buffers.

Changing it to sequential/range partitions solved the problem.