PostgreSQL external sort performance

postgresql, sorting

I'm trying to figure out external sort performance and its dependence on the WORK_MEM value.
I've found that increasing WORK_MEM doesn't always speed up external sorting (example below, tested on PostgreSQL 9.6).
Is there any guidance on setting WORK_MEM, say, what value of WORK_MEM is optimal to sort a 4GB table using an external sort?
Is it possible to estimate the number of I/Os from the table size and the WORK_MEM setting?

I understand that I can study tuplesort.c and the underlying algorithms, but are there any ready-made results?

create table t1
as
select
  c as key
, cast('value' as char(5)) as val
from generate_series(1, 100000000) c;

--4223 MB
select pg_size_pretty(pg_total_relation_size('t1'));


\timing

--time - 146725.833 ms
SET WORK_MEM = '128MB';
create table t_128 as select * from t1 order by key;

--time - 148889.655 ms
SET WORK_MEM = '1024MB';
create table t_1024 as select * from t1 order by key;

Best Answer

Yes. You can see the output of external sorts by running EXPLAIN ANALYZE (and adding BUFFERS and other options according to the EXPLAIN documentation), so you don't really have to estimate anything if you can run the queries. Using your examples:

postgres@[local]:5432:postgres:=# create table t1
postgres-# as
postgres-# select
postgres-#   c as key
postgres-# , cast('value' as char(5)) as val
postgres-# from generate_series(1, 100000000) c;
SELECT 100000000
Time: 162609.861 ms
postgres@[local]:5432:postgres:=# SET work_mem="128MB";
SET
Time: 0.317 ms
postgres@[local]:5432:postgres:=# EXPLAIN (ANALYZE, BUFFERS) create table t_128 as select * from t1 order by key;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=13476557.48..13672503.59 rows=78378445 width=28) (actual time=38358.696..66282.785 rows=100000000 loops=1)
   Sort Key: key
   Sort Method: external merge  Disk: 1954200kB
   Buffers: shared hit=2083 read=538461 dirtied=508619, temp read=244397 written=244397
   ->  Seq Scan on t1  (cost=0.00..1324325.45 rows=78378445 width=28) (actual time=0.108..17185.459 rows=100000000 loops=1)
         Buffers: shared hit=2080 read=538461 dirtied=508619
 Planning time: 74.660 ms
 Execution time: 206961.514 ms
(8 rows)

Time: 207239.156 ms
postgres@[local]:5432:postgres:=# SET WORK_MEM = '1024MB';
SET
Time: 0.192 ms
postgres@[local]:5432:postgres:=# EXPLAIN (ANALYZE, BUFFERS) create table t_1024 as select * from t1 order by key;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=16537253.59..16787253.81 rows=100000088 width=10) (actual time=30379.785..51937.967 rows=100000000 loops=1)
   Sort Key: key
   Sort Method: external merge  Disk: 1954944kB
   Buffers: shared hit=540541, temp read=244382 written=244382
   ->  Seq Scan on t1  (cost=0.00..1540541.88 rows=100000088 width=10) (actual time=0.013..7570.784 rows=100000000 loops=1)
         Buffers: shared hit=540541
 Planning time: 0.121 ms
 Execution time: 172406.698 ms
(8 rows)

Time: 172527.526 ms
postgres@[local]:5432:postgres:=# 

We can see that the sorts going on are external (on-disk), and they take up roughly 2GB. Disk is much, much slower than memory, so there may be more of a speedup if we can get PostgreSQL to do the whole sort in memory.

Now, if we go a really ludicrous route, we can change the sort characteristics, like this:

postgres@[local]:5432:postgres:=# SET work_mem='20GB';
SET
Time: 0.188 ms
postgres@[local]:5432:postgres:=# EXPLAIN (ANALYZE, BUFFERS) create table t_20g as select * from t1 order by key;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=14828266.59..15078266.81 rows=100000088 width=10) (actual time=20532.705..31367.756 rows=100000000 loops=1)
   Sort Key: key
   Sort Method: quicksort  Memory: 7833229kB
   Buffers: shared hit=540541
   ->  Seq Scan on t1  (cost=0.00..1540541.88 rows=100000088 width=10) (actual time=0.013..7758.434 rows=100000000 loops=1)
         Buffers: shared hit=540541
 Planning time: 0.044 ms
 Execution time: 139461.981 ms
(8 rows)

Time: 139503.860 ms
postgres@[local]:5432:postgres:=# 

Which shaves 67499.533 ms (or about 67.5 seconds) off the total runtime versus the 128MB sort, a difference of about 32%. The sort method changed from an external merge (intermediate results sorted on disk) to an in-memory quicksort. Note that the quicksort reported Memory: 7833229kB, roughly 7.5GB for a 4GB table; that is also why 1024MB of work_mem still spilled to disk: the in-memory form of this sort needs close to twice the table's on-disk size.

The IO that is being done to execute this query is shown by adding the BUFFERS parameter to EXPLAIN, which gives you hit (read from the shared buffer cache), read (read from disk), dirtied (changed), and written (written to disk) counts, plus the temp read/written figures for the temporary files the sort spills to. All of these are counted in blocks.
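Since those counters are in blocks, you can convert them to bytes with the server's block_size (8kB by default). As a sanity check, here's a quick illustrative query plugging in the temp read count from the 128MB plan above:

-- 244397 temp blocks from the 128MB plan, times the block size,
-- lands right around the "Disk: 1954200kB" the sort itself reported
SELECT pg_size_pretty(244397 * current_setting('block_size')::bigint);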

The optimal work_mem depends on a lot of factors. It's easiest to enable log_temp_files, use the logged temp file sizes as a starting point, and set work_mem per query to something like 2-4x the size of your largest temp file. Keep in mind that the setting applies per sort or hash operation, not per query (a single query can run several at once), so don't make it too large globally unless you have to. Setting it per query for something really large like this is probably more appropriate.
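As a minimal sketch of that approach (the 8GB figure is just an assumption based on the ~7.5GB quicksort above, and t_sorted is a throwaway table name): log all temp files, then raise work_mem for one transaction only with SET LOCAL, which reverts at COMMIT or ROLLBACK:

-- log every temp file created (the value is a size threshold in kB; 0 = log all)
-- requires superuser when set per session; otherwise set it in postgresql.conf
SET log_temp_files = 0;

-- raise work_mem for just this transaction
BEGIN;
SET LOCAL work_mem = '8GB';
CREATE TABLE t_sorted AS SELECT * FROM t1 ORDER BY key;
COMMIT;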

tuplesort.c does actually go into detail about how the sorting is done, and if you want to delve deeper than what EXPLAIN ANALYZE shows, you can look at things with dtrace/systemtap/eBPF or other dynamic tracing facilities that would show you more in-depth how PostgreSQL is sorting. Or you could add a bunch of ereport lines to a custom-compiled version, depending on what you're interested in.
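A middle ground before reaching for tracing tools is the trace_sort developer option (available when the server was built with TRACE_SORT defined, which it is by default), which makes tuplesort.c log its run building and merge passes. A sketch, with t_traced as a throwaway target table:

-- emit tuplesort progress (run creation, merge passes) as LOG messages
SET trace_sort = on;
SET client_min_messages = log;  -- surface the LOG lines in this session
CREATE TABLE t_traced AS SELECT * FROM t1 ORDER BY key;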

You can also find a lot more detail on work_mem effects in the post Understanding postgresql.conf : work_mem, if you're interested in additional tests and their results.