Postgresql – PostgreSQL: setting high work_mem does not avoid disk merge

postgresql

This is not quite my day with Postgres. On my server running PostgreSQL 9.2.3, I set work_mem to 4MB to avoid "Sort Method: external merge Disk: 2072kB", but it did not help:

cwu=# vacuum analyze web_city;
VACUUM
cwu=# SHOW work_mem;
 work_mem 
----------
 4MB
(1 row)
cwu=# explain analyze select count(*) from web_city GROUP BY (left(name,5));
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=18304.35..20487.34 rows=95562 width=10) (actual time=1557.871..1809.029 rows=64459 loops=1)
   ->  Sort  (cost=18304.35..18633.84 rows=131796 width=10) (actual time=1557.856..1707.069 rows=131796 loops=1)
         Sort Key: ("left"((name)::text, 5))
         Sort Method: external merge  Disk: 2072kB
         ->  Seq Scan on web_city  (cost=0.00..4842.45 rows=131796 width=10) (actual time=1.050..174.907 rows=131796 loops=1)
 Total runtime: 1828.936 ms
(6 rows)

Setting work_mem to 8MB finally helps:

cwu=# SET work_mem = '8MB';
SET
cwu=# explain analyze select count(*) from web_city GROUP BY (left(name,5));
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=5501.43..6675.72 rows=93943 width=10) (actual time=207.628..244.667 rows=64459 loops=1)
   ->  Seq Scan on web_city  (cost=0.00..4842.45 rows=131796 width=10) (actual time=0.749..102.511 rows=131796 loops=1)
 Total runtime: 263.154 ms
(3 rows)

But why is 4MB not enough? The Postgres wiki has this note:

if you see a line like "Sort Method: external merge Disk: 7526kB" in
there, you'd know a work_mem of at least 8MB would really improve how
fast that query executed, by sorting in RAM instead of swapping to
disk.

So I assumed it would be the same in my case.
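One way to see exactly how much spills to disk, rather than inferring it from EXPLAIN output, is the standard log_temp_files setting, which logs every temporary file a query creates (the threshold is in kB; 0 logs them all). This is a sketch against my web_city table; the resulting entries appear in the server log, not in psql:

```sql
-- Log every temporary file this session creates, regardless of size
SET log_temp_files = 0;

SELECT count(*) FROM web_city GROUP BY (left(name, 5));
-- The server log should then contain a line like:
--   LOG:  temporary file: path "base/pgsql_tmp/...", size ...
```

The logged size reflects only the on-disk portion, which is why it can look much smaller than the memory the sort actually needed.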

EDIT: If I do:

cwu=# create index name_left_prefix on web_city(left(name, 5));

then 4MB is finally enough. The index seems to lower the memory usage. If anyone would be kind enough to explain all this behaviour, I would be very grateful.
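For context on why an index on an expression can change the plan: ANALYZE collects statistics on indexed expressions, so after creating the index the planner gets a better estimate of the number of distinct left(name, 5) values and may pick a different (cheaper) strategy. A hedged way to check this on my setup would be:

```sql
-- Create the expression index, refresh statistics, then compare plans.
-- ANALYZE is what actually gathers stats on the indexed expression.
CREATE INDEX name_left_prefix ON web_city (left(name, 5));
ANALYZE web_city;
EXPLAIN ANALYZE SELECT count(*) FROM web_city GROUP BY (left(name, 5));
```

If the row estimate for the aggregate drops closer to the real 64459 groups, that alone can tip the planner toward a HashAggregate that fits in work_mem.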

Best Answer

This is somewhat speculative, but Depesz (Hubert Lubaczewski) has this to say on the subject:

You might wonder, though, why PostgreSQL switched to Disk, when it used only 448kB? After all, work_mem is 1MB. Answer is pretty simple – as I understand – disk is used when work_mem is not enough, so it means it's already been filled. So, sort with "Disk: 448kB" would mean that more or less whole work_mem has been used plus 448kB of disk.

So in your case the memory actually used might be in the 6MB range. Also, try RESET work_mem first; a previous query in the same session may have left a different setting in place.
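To rule out a stale session setting, the standard RESET/SHOW commands are enough; this just restores work_mem to whatever the server configuration specifies:

```sql
-- Discard any per-session override and fall back to the configured value
RESET work_mem;
SHOW work_mem;
```

SET only affects the current session (or transaction, with SET LOCAL), so RESET is the clean way to get back to the postgresql.conf default before re-testing.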