Postgresql – System configuration to address poor PostgreSQL INSERT performance

configurationperformancepostgresqlpostgresql-10postgresql-performancer

Problem Symptom

High CPU usage by postmaster child process associated with client connections attempting to INSERT low-volume rows (results in rows INSERTed 25x slower than using COPY ... FROM STDIN for identical rows).

Background

Trying to identify system/ database configuration to alleviate aforementioned poor INSERT performance. I'm using a multi-threaded R script to process data and INSERT results back into a PostgreSQL database. I've profiled the R script to isolate performance bottleneck to the DBI::dbBind() calls, while using top to monitor the postmaster child process associated to the connections opened by the child R threads (see code below). During the INSERTs, R child processes run mostly idle (presumably waiting for return of DBI::dbBind() call) whereas the postmaster child processes consume 95-100% CPU on the cores on which they run for approximately 2-3 minutes.

System / environment:

  • postgresql version 10.3 (Fedora package 10.3-5.fc27)
  • uname -a : Linux localhost 4.16.6-202.fc27.x86-64 #1 SMP Wed May 2 00:09:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • /proc/cpuinfo : 16 processors (Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz)
  • ulimit -a :

    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 515220
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 515220
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    
  • /proc/meminfo (at arbitrary time when postmaster processing INSERT):

    MemTotal:         131923484 kB
    MemFree:          112894260 kB
    MemAvailable:     123181440 kB
    Buffers:             201220 kB
    Cached:            14932288 kB
    SwapCached:               0 kB
    ...
    Mlocked:                  0 kB
    SwapTotal:        201326588 kB
    SwapFree:         201326588 kB
    Dirty:                 3260 kB
    ...
    Shmem:              4251184 kB
    Slab:               1024344 kB
    SReclaimable:        658476 kB
    SUnreclaim:          365868 kB
    ...
    PageTables:           38436 kB
    ...
    CommitLimit:      267288328 kB
    Committed_AS:      35678744 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:              0 kB
    VmallocChunk:             0 kB
    HardwareCorrupted:        0 kB
    ...
    ShmemHugePages:           0 kB
    ShmemPmdMapped:           0 kB
    ...
    HugePages_Total:          0
    HugePages_Free:           0
    HugePages_Rsvd:           0
    HugePages_Surp:           0
    Hugepagesize:          2048 kB
    Hugetlb:                  0 kB
    ...
    
  • Database tables in testing reside on pair of SSD drives in RAID 10 configuration, with logging and WAL on separate drives (since baseline COPY ... FROM STDIN has no performance issue writing –see below– (timing also includes bash sync return time) and CPUs do not generally appear to wait on IO, do not believe disk performance presents the bottlneck)

Postgres configuration:

  • Following is those settings not commented-out (eg. generally non-default settings)
  • Some of these may appear a bit high in attempt to identify configuration change (please feel free to note any "crazy" settings given my system — assuming the server primarily runs as database server)

    max_connections = 20                    # (change requires restart)
    shared_buffers = 32GB                   # min 128kB
    temp_buffers = 128MB                    # min 800kB
    max_prepared_transactions = 20          # Allow 'max_connections'
    work_mem = 16MB                         # min 64kB
    max_stack_depth = 6MB                   # min 100kB
    dynamic_shared_memory_type = posix      # the default is the first option
    max_wal_size = 5GB
    checkpoint_flush_after = 1MB            # measured in pages, 0 disables
    deadlock_timeout = 15s
    max_locks_per_transaction = 2096        # min 10
    
  • Also (temporarily) tried the following in /lib/systemd/system/postgresql.service.d/10-postgresql-unlimited.conf to no effect (actually a small degradation of performance): [WARNING For casual readers, the following settings are NOT recommended for use]

    # DO NOT USE THIS IN PRODUCTION (or elsewise)
    [Service]
    LimitDATA=infinity
    LimitFSIZE=infinity
    LimitLOCKS=infinity
    LimitMEMLOCK=infinity
    LimitMSGQUEUE=infinity
    LimitNPROC=infinity
    LimitNOFILE=infinity
    LimitSIGPENDING=infinity
    

Database profile:

  • Database cluster only has one user, and accessed generally by multi-threaded processes (between 1 and 3 threads used for testing, but up to 16 expected for automated processes and an additional 3-4 connections for administrators).
  • Testing database initialized with table definitions, but no rows. Table definitions include 28 partitioned tables (using Declarative Partitioning) that lack trigger functions and only have limited foreign keys or constraints/ indices.
  • Each partitioned table has 180 partitions and between 42 and 81 columns (of double precision type).

    NB: While these are wider tables than ideal and have more partitions than recommended, I have attempted to reduce the number of columns to 12 and the number of partitions to 30 and see ratable poor INSERT performance.

Code (R):

  • Currently based on PostgreSQL ODBC driver (only one I could get working with multi-line prepared statements)

    library(odbc); 
    cdb <- DBI::dbConnect(drv=odbc::odbc(),driver="PostgreSQL",...);
    on.exit({DBI::dbDisconnect(cdb);}, add=TRUE);
    DBI::dbBegin(cdb);
    sth <- DBI::dbSendStatement(cdb,paste(rep("INSERT INTO <table> (<col_1>,...,<col_n>) VALUES (?,...,?);",100),collapse=""));   # Bind up to 8,000 placeholders at a time (ref in source?) -- use 100 multi-line statements in this example
    DBI::dbBind(sth,bvallist);   # Here, 'bvallist' is a list of values to bind in the multi-line prepared statement -- have tested and checked values in database are correct after INSERT
    num_recs <- DBI::dbGetRowsAffected(sth);
    DBI::dbClearResult(sth);
    DBI::dbCommit(cdb);
    
    • NB Have also used the following with similar performance: (A) prepared statements with only one INSERT (both in odbc and RPostgres drivers) as well as (B) assemble multi-line INSERT SQL string without use of prepared statements.

Performance reports:

Baseline

  • Based on COPY ... FROM STDIN — a simple bash script containing the following command (one for each of the 28 tables): cat <tablename>.out | psql -c 'COPY <tablename> FROM STDIN;'
  • INSERTed 11,117 rows into each of the 28 partitioned tables in 14sec => 22,234 rec/sec (or 4.5e-5 sec/rows)
  • vmstat -wt 1 information for COPY ... FROM STDIN (CPU info based on 16 processors):

    procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu-------- -----timestamp-----
    r  b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st                 EST
    2  0            0    113563384       200600     15965816    0    0     0   696 4178  1518  14   1  85   0   0 2018-11-27 14:29:19
    2  0            0    113562280       200600     15967184    0    0     0  1483 4781  1727  14   1  84   0   0 2018-11-27 14:29:20
    3  0            0    113564520       200604     15962952    0    0     0  1123 4847  1790  15   2  83   0   0 2018-11-27 14:29:21
    3  0            0    113565664       200604     15962020    0    0     0   800 4512  1646  15   2  84   0   0 2018-11-27 14:29:22
    >>>>>>>> START: COPY ... FROM STDIN bash script (Tue Nov 27 14:29:23 EST 2018)
    3  0            0    113566992       200604     15960036    0    0     0   760 4730  1655  14   2  84   0   0 2018-11-27 14:29:23
    3  1            0    113551160       200604     15961608    0    0     0  9704 5271  5025  15   3  82   1   0 2018-11-27 14:29:24
    3  0            0    113548176       200604     15966864    0    0     0  8764 6269  6335  19   3  77   1   0 2018-11-27 14:29:25
    3  0            0    113552104       200604     15969684    0    0     0  7744 6331  5648  19   3  77   1   0 2018-11-27 14:29:26
    1  0            0    113521248       200604     15975508    0    0     0  8624 3717  5478   9   2  89   0   0 2018-11-27 14:29:27
    2  0            0    113535536       200604     15976840    0    0     0 12563 4982  8784  11   3  86   1   0 2018-11-27 14:29:28
    3  0            0    113535640       200604     15978772    0    0     0 11223 5473  6182  12   3  84   1   0 2018-11-27 14:29:29
    2  0            0    113533576       200604     15977312    0    0     0 11180 5032  6443  12   3  85   0   0 2018-11-27 14:29:30
    2  0            0    113534384       200604     15978180    0    0     0 11169 4961  6511  12   3  86   0   0 2018-11-27 14:29:31
    2  0            0    113504656       200604     16004428    0    0     0 32691 4551 13584  10   3  84   2   0 2018-11-27 14:29:32
    2  0            0    113486672       200604     16023572    0    0     0 26133 4387  8803  10   3  86   2   0 2018-11-27 14:29:33
    4  0            0    113459744       200604     16033296    0    0     0 12535 5709  8188  17   3  80   1   0 2018-11-27 14:29:34
    1  0            0    113444128       200604     16057612    0    0     0 78953 8980 10186  12   4  82   2   0 2018-11-27 14:29:35
    1  1            0    113415520       200604     16087380    0    0     0 23640 5576  8781  15   3  80   1   0 2018-11-27 14:29:36
    3  0            0    113400864       200604     16108080    0    0     0 14733 3348 10182   6   2  90   1   0 2018-11-27 14:29:37
    <<<<<<<< END  : COPY ... FROM STDIN bash script (Tue Nov 27 14:29:37 EST 2018)
    1  0            0    113393552       200632     16127372    0    0     0 56640 4456  4481   4   3  93   1   0 2018-11-27 14:29:38
    0  1            0    113392368       200632     16127680    0    0     0  1688 2446  1406   2   1  96   0   0 2018-11-27 14:29:39
    1  0            0    113391032       200632     16128472    0    0     0  1616 2372  1396   2   1  97   0   0 2018-11-27 14:29:40
    0  0            0    113389376       200632     16128440    0    0     0  1896 2474  1402   2   1  96   0   0 2018-11-27 14:29:41
    

Current

  • When binding a multi-line prepared statement within a single transaction block (see code), it takes 390 sec to INSERT the same rows into each of the 28 partitioned tables for total throughput of 9 rows/ sec (compared to the 22,234 rows/sec in the baseline)
  • The max/avg/min values from vmstat -wt 1 during the INSERT resulting from DBI::dbBind() appear as follows:

    procs -----------------------memory---------------------- ---swap-- -----io---- --system-- --------cpu--------
    r  b         swpd         free         buff        cache   si   so    bi    bo    in    cs  us  sy  id  wa  st     
    6  1            0    112348416       200092     15661572    0    0     0 10090 12259 37796  32   3  81   4   0   # MAX
    3  0            0    111860781       199437     15608955    0    0     0  1708  5658  6371  21   1  78   0   0   # AVG
    3  0            0    111353344       199372     15572860    0    0     0   184  4115  1602  17   0  67   0   0   # MIN
    

Other notes:

  • Performance unaffected when running single or multiple R child threads (ie. one or multiple database client connections).
  • Have tried to use RPostgres, RPostgreSQL, and odbc R packages as database drivers (while slight performance differences exist among the drivers, no implementation-specific artifacts appear to contribute significantly to the processing time).
  • Working on EXPLAIN ANALYZE of SQL, but welcome thoughts ahead of time on potential kernel settings/ database configuration/ SystemD settings/ etc.

Thank you in advance.

Update (2018.11.28)

  • Created an SQL flat file with the INSERT statements produced by the R script (placed the statements within a BEGIN ... COMMIT transaction block) and loaded using psql -f <file>.sql — no performance improvement.
  • Ran EXPLAIN ANALYZE VERBOSE on single INSERT, with following output:

    Insert on <tablename>  (cost=0.00..0.26 rows=1 width=1952) (actual time=0.074..0.074 rows=0 loops=1)
       ->  Result  (cost=0.00..0.26 rows=1 width=1952) (actual time=0.006..0.007 rows=1 loops=1)
             Output: <row data>
    Planning time: 0.303 ms
    Trigger RI_ConstraintTrigger_c_274329 for constraint <foreign_key_constraint_name> on <child_tablename>: time=0.071 calls=1
    Execution time: 13.345 ms
    

Best Answer

A change to the prepared SQL statement resulted in a six-fold improvement.

SQL changed from:

    INSERT INTO <table> (<col_1>,...,<col_n>) VALUES (?,...,?);
    ...
    INSERT INTO <table> (<col_1>,...,<col_n>) VALUES (?,...,?);

to:

    INSERT INTO <table> (<col_1>,...,<col_n>) VALUES
    (?,...,?),
    ...
    (?,...,?);

In other words, I collapsed the multiple INSERT statements into a single statement, which reduced the time from 390 sec to 66 sec, or 56 rows/sec. While still on the slow side, will need to re-profile the execution to see if the remaining latency is application- or database-related.