Using PostgreSQL 9.3 on Windows 7 x64, Core i7, SSD.
Assume a PostgreSQL table with eight Int32 columns and the following C# code to fill the table.
using (var connection = new NpgsqlConnection("User Id=postgres;Password=**;host=localhost;database=**;"))
using (var command = new NpgsqlCommand())
{
    connection.Open();
    command.Connection = connection;
    command.CommandText = "COPY \"MyTable\" FROM STDIN;";

    var copyInSerializer = new NpgsqlCopySerializer(connection);
    var copyIn = new NpgsqlCopyIn(command, connection, copyInSerializer.ToStream);
    copyIn.Start();

    for (int i = start; i < start + 1000000; i++)
    {
        copyInSerializer.AddInt32(i);
        // add 7 other Int32 values
        copyInSerializer.EndRow();
    }

    copyInSerializer.Flush();
    copyIn.End();
    connection.Close();
}
On my test hardware this took ~10 seconds to execute, with measured disk usage of ~20 MB/s.
Neither CPU nor RAM hit their limits.
The SSD in question supports ~60 MB/s when writing random 4K blocks, so I wondered why the operation did not run faster. I created a second, identical table in the same database on the same disk and ran the code sample above in parallel (two tasks), one per table.
Both still took ~10 seconds, and measured disk usage was ~40 MB/s.
Why does PostgreSQL not devote all available resources to the only operation in progress?
Is there a way to force PostgreSQL to use the full available disk bandwidth?
Best Answer
ADO.NET (used in the code sample) communicates with the server over a network connection, even when a local Postgres instance is used. Using COPY ... FROM STDIN therefore incurs quite a bit of overhead. Before Postgres can insert the bulk data, each row needs to be serialized by the client, pushed through the (loopback) network stack, and received and parsed by the server.
This basically explains/justifies the described behaviour.
If the C# client and Postgres run on the same computer, you can speed up bulk insertion by writing a CSV file to a local SSD (or, better, a RAM drive) and executing the SQL statement COPY "MyTable" FROM 'x:\test.csv' WITH (FORMAT csv); so the server reads the file directly.
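As a rough sketch, the file-based variant could look like this. This is an illustration, not the answer's exact code: it assumes Npgsql is referenced, the client runs on the same machine as the server, the table "MyTable" has eight integer columns, and the file path x:\test.csv is readable by the server process.

```csharp
using System.IO;
using Npgsql;

class BulkLoadViaCsv
{
    static void Main()
    {
        const string path = @"x:\test.csv";

        // 1. Write the rows as CSV to a fast local drive (or a RAM drive).
        using (var writer = new StreamWriter(path))
        {
            for (int i = 0; i < 1000000; i++)
            {
                // eight integer columns per row (dummy values for illustration)
                writer.WriteLine(string.Format("{0},{0},{0},{0},{0},{0},{0},{0}", i));
            }
        }

        // 2. Have the server read the file directly, bypassing the
        //    client/server protocol entirely.
        using (var connection = new NpgsqlConnection("User Id=postgres;Password=**;host=localhost;database=**;"))
        using (var command = new NpgsqlCommand())
        {
            connection.Open();
            command.Connection = connection;
            command.CommandText = "COPY \"MyTable\" FROM 'x:\\test.csv' WITH (FORMAT csv);";
            command.ExecuteNonQuery();
        }
    }
}
```

Note that server-side COPY ... FROM 'file' reads the file with the permissions of the server process and requires superuser rights on PostgreSQL 9.3 (the example connects as postgres), which is why the STDIN variant is the usual choice for non-superuser clients.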