I'm trying to figure out external sort performance and it's dependence on WORK_MEM value.
I've tested that increasing WORK_MEM doesn't always speed up external sorting (example below, tested on PostgreSQL 9.6).
Is there any guidance on setting WORK_MEM, say what value of WORK_MEM is optimal to sort 4GB table using external sort?
Is it possible to estimate number of IOs from table size and WORK_MEM setting?
I understand that i can study tuplesort.c and underlying algorythms, is there ready results?
create table t1
as
select
c as key
, cast('value' as char(5)) as val
from generate_series(1, 100000000) c;
--4223 MB
select pg_size_pretty(pg_total_relation_size('t1'));
\timing
--time - 146725.833 ms
SET WORK_MEM = '128MB';
create table t_128 as select * from t1 order by key;
--time - 148889.655 ms
SET WORK_MEM = '1024MB';
create table t_1024 as select * from t1 order by key;
Best Answer
Yes. You can see the output of external sorts when doing
EXPLAIN ANALYZE
(and adding in BUFFERS and others according to the EXPLAIN documentation, so you don't really have to estimate if you can run these. Using your examples:We can see that the sorts going on are external (on-disk), and they take up roughly 2GB. Now disk is much, much slower than memory, so there might be more of a speedup if we get it doing the whole sort in memory.
Now, if we go a really ludicrous route, we can change the sort characteristics, like this:
Which shaves 67499.533ms (or about 67.5 seconds) off the total update from the 128MB version of the sort mem, for a difference of about 32% or so. This changed from the external merge (intermediate results sorted on disk), to quicksort in memory.
The IO that is being done to execute this query is shown by adding the
BUFFERS
parameter toEXPLAIN
, which gives you hit (read from shared block cache), read (read from disk), dirtied (changed), and written (written to disk).The optimal
work_mem
depends on a lot of factors. It's easiest tolog_temp_files
and use that as a starting point for adjusting thework_mem
per query, and set it to something like 2-4x the size of your largest temp file. Though the setting is applied per each query (to sorts and hash tables as well), so don't make it too large unless you have to. Setting it per query on something really large like this is probably more appropriate.tuplesort.c
does actually go into detail about how it's sorted, and if you want to delve deeper than what is shown inEXPLAIN ANALYZE
, then you can look at things with dtrace/systemtap/eBPF or other dynamic tracing facilities that would show you more in-depth how PostgreSQL is sorting. Or you could add a bunch of ereport lines in a custom compiled version depending on what you're interested in.You can also get a lot more detail on
work_mem
effects in this post Understanding postgresql.conf : work_mem, if you're interested in additional tests and their results.