We're running Postgres 9.4.4 on CentOS 6.5 and have a SELECT query that has worked for years, but stopped working and hangs after we upgraded from 9.2 (it took a while to notice it, so I don't know if it was immediately after we upgraded or not).
SELECT id || ':' || group_number AS uniq_id
FROM   table_one
WHERE  id || ':' || group_number NOT IN (
   SELECT id || ':' || group_number
   FROM   table_two
   )
AND    id NOT IN (
   SELECT id
   FROM   table_three
   WHERE  timestamp > NOW() - INTERVAL '30 days'
   AND    client_id > 0
   );
In all tables id is an integer, but is stored as character varying(15) (legacy system). group_number is stored as a smallint.
The sub-query for table_two returns about 2.5 million records. The sub-query for table_three returns about 2,500 records. Both return in about 1 second if run separately. But adding in either query (or both) as sub-queries causes the query to just hang indefinitely (for days, if we let it run).
I've seen others online with the same problem (query not returning when using NOT IN). NOT IN seems like such a straightforward sub-query.
We have plenty of hardware (384 GB RAM, Xeon 64 cores, 16 disk 15k RPM RAID 10).
- Why is this happening? (i.e., is this a major ongoing bug in Postgres?)
- How can I fix / debug it in the meantime?
Here are the results of EXPLAIN:
QUERY PLAN
Index Only Scan using table_one_id_pk on table_one (cost=19690.90..64045129699.10 rows=370064 width=9)
Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1)))
SubPlan 2
-> Bitmap Heap Scan on table_three (cost=2446.92..19686.74 rows=8159 width=7)
Recheck Cond: (("timestamp" > (now() - '30 days'::interval)) AND (client_id > 0))
-> BitmapAnd (cost=2446.92..2446.92 rows=8159 width=0)
-> Bitmap Index Scan on table_one_timestamp_idx (cost=0.00..1040.00 rows=79941 width=0)
Index Cond: ("timestamp" > (now() - '30 days'::interval))
-> Bitmap Index Scan on fki_table_three_client_id (cost=0.00..1406.05 rows=107978 width=0)
Index Cond: (client_id > 0)
SubPlan 1
-> Materialize (cost=0.00..84813.75 rows=3436959 width=9)
-> Seq Scan on table_two (cost=0.00..64593.79 rows=3436959 width=9)
My settings from postgresql.conf:
max_connections = 200
shared_buffers = 24GB
temp_buffers = 8MB
work_mem = 96MB
maintenance_work_mem = 1GB
cpu_tuple_cost = 0.0030
cpu_index_tuple_cost = 0.0010
cpu_operator_cost = 0.0005
effective_cache_size = 128GB
from_collapse_limit = 4
join_collapse_limit = 4
UPDATE
I used the following method to adjust work_mem just for this query:
BEGIN;
SET work_mem = '256MB';
-- query --
SET work_mem = default;
COMMIT;
Using NOT IN returned in 5 – 8 seconds (vs. never with work_mem = 96MB).

Using LEFT JOIN returned in 13 – 14 seconds (vs. 24 seconds with work_mem = 96MB).

So it looks like the problem was with work_mem, and using LEFT JOIN was just a workaround. However, the real problem is Postgres going for days with work_mem = 96MB.
With 16 x 15k SAS drives in RAID 10 we have very fast I/O, so even spilling to disk the query should have returned, just a bit slower.
UPDATE 2
Here are the results for EXPLAIN ANALYZE on the LEFT JOIN approach:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=27318.56..351160.97 rows=728325 width=9) (actual time=9553.378..21247.202 rows=7 loops=1)
-> Hash Anti Join (cost=27318.47..176945.69 rows=1501249 width=9) (actual time=511.578..5479.549 rows=1478438 loops=1)
Hash Cond: ((t1.id)::text = (t3.id)::text)
-> Seq Scan on table_one t1 (cost=0.00..143842.21 rows=1593403 width=9) (actual time=0.026..4369.804 rows=1485291 loops=1)
-> Hash (cost=27289.76..27289.76 rows=8203 width=7) (actual time=511.518..511.518 rows=1286 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 51kB
-> Bitmap Heap Scan on table_three t3 (cost=1518.79..27289.76 rows=8203 width=7) (actual time=125.379..510.998 rows=1286 loops=1)
Recheck Cond: (client_id > 0)
Filter: ("timestamp" > (now() - '30 days'::interval))
Rows Removed by Filter: 104626
Heap Blocks: exact=16093
-> Bitmap Index Scan on fki_table_three_client_id (cost=0.00..1518.38 rows=108195 width=0) (actual time=121.633..121.633 rows=122976 loops=1)
Index Cond: (client_id > 0)
-> Index Only Scan using t_table_two_id2_idx on table_two t2 (cost=0.09..0.14 rows=1 width=9) (actual time=0.010..0.010 rows=1 loops=1478438)
Index Cond: ((id = (t1.id)::text) AND (group_number = t1.group_number))
Heap Fetches: 143348
Planning time: 30.527 ms
Execution time: 21247.541 ms
(18 rows)
Time: 23697.256 ms
And here they are for the NOT EXISTS approach:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Anti Join (cost=27318.56..351160.97 rows=728325 width=9) (actual time=5117.110..14061.838 rows=7 loops=1)
-> Hash Anti Join (cost=27318.47..176945.69 rows=1501249 width=9) (actual time=146.779..1254.400 rows=1478439 loops=1)
Hash Cond: ((t1.id)::text = (t3.id)::text)
-> Seq Scan on table_one t1 (cost=0.00..143842.21 rows=1593403 width=9) (actual time=0.007..591.383 rows=1485291 loops=1)
-> Hash (cost=27289.76..27289.76 rows=8203 width=7) (actual time=146.758..146.758 rows=1285 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 51kB
-> Bitmap Heap Scan on table_three t3 (cost=1518.79..27289.76 rows=8203 width=7) (actual time=17.586..146.330 rows=1285 loops=1)
Recheck Cond: (client_id > 0)
Filter: ("timestamp" > (now() - '30 days'::interval))
Rows Removed by Filter: 104627
Heap Blocks: exact=16093
-> Bitmap Index Scan on fki_table_one_client_id (cost=0.00..1518.38 rows=108195 width=0) (actual time=14.415..14.415 rows=122976 loops=1)
Index Cond: (client_id > 0)
-> Index Only Scan using t_table_two_id2_idx on table_two t2 (cost=0.09..0.14 rows=1 width=9) (actual time=0.008..0.008 rows=1 loops=1478439)
Index Cond: ((id = (t1.id)::text) AND (group_number = t1.group_number))
Heap Fetches: 143348
Planning time: 2.155 ms
Execution time: 14062.014 ms
(18 rows)
Time: 14065.573 ms
Best Answer
Assuming you have checked off the usual suspects in the wiki page, as commented by @a_horse.
Also see the spin-off addressing the discussion of bitmap index scans and the size of work_mem.

Query
This rewritten query should be substantially faster:
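A sketch of such a rewrite, replacing both NOT IN subqueries with NOT EXISTS and comparing the columns directly instead of the concatenated string (table aliases are made up):

```sql
SELECT id || ':' || group_number AS uniq_id
FROM   table_one t1
WHERE  NOT EXISTS (
   SELECT 1
   FROM   table_two t2
   WHERE  t2.id = t1.id                  -- compare columns directly,
   AND    t2.group_number = t1.group_number  -- not a concatenated string
   )
AND    NOT EXISTS (
   SELECT 1
   FROM   table_three t3
   WHERE  t3."timestamp" > now() - interval '30 days'
   AND    t3.client_id > 0
   AND    t3.id = t1.id
   );
```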
The most important problem is comparing a concatenated string between table_one and table_two, which is generally more expensive than necessary and specifically not sargable.

Storing integer numbers as strings is expensive nonsense. You seem to be aware of that. Convert to integer if at all possible. If you only have valid integer numbers in the varchar column id, all you need to do is change the column type. And possibly the same for table_two.
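The type change might look like this (a sketch; the USING clause assumes every stored id value parses cleanly as an integer):

```sql
-- Convert the legacy varchar id to integer in place.
-- Assumes all stored values are valid integer literals.
ALTER TABLE table_one ALTER COLUMN id TYPE integer USING id::integer;
```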
NOT IN carries a trap for NULL values on either side. That's why NOT EXISTS is almost always better. (It typically performs better on top of that.)

Indexes
Either way, the key to performance is matching indexes.
Be sure to have a multicolumn index on table_one and on table_two. This possibly allows index-only scans.
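For example (index names are made up; the column list follows the join condition in the query):

```sql
CREATE INDEX table_one_id_gn_idx ON table_one (id, group_number);
CREATE INDEX table_two_id_gn_idx ON table_two (id, group_number);
```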
With integer instead of varchar, these would be smaller and more efficient still.

And I suggest a partial multicolumn index on table_three.
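A sketch of such a partial index (the index name and the cut-off date are placeholders; pick a constant lower bound somewhat in the past, since a predicate with now() would not be allowed in an index):

```sql
CREATE INDEX table_three_id_partial_idx ON table_three (id)
WHERE  client_id > 0
AND    "timestamp" > '2015-07-01 0:0';  -- placeholder constant lower bound
```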
Usefulness deteriorates over time. Recreate the index with an increased lower bound at opportune moments - this takes an exclusive lock on the table, so consider CREATE INDEX CONCURRENTLY.

You need to match the (updated) index condition in your queries. Add the condition even if it seems redundant. Like:
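For the table_three subquery that might look like this (the extra constant condition is logically redundant, but lets the planner use the partial index; the date must match the index predicate):

```sql
AND NOT EXISTS (
   SELECT 1
   FROM   table_three t3
   WHERE  t3."timestamp" > now() - interval '30 days'
   AND    t3."timestamp" > '2015-07-01 0:0'  -- must match the partial index predicate
   AND    t3.client_id > 0
   AND    t3.id = t1.id
   )
```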
You can use a function as a pseudo-constant in the partial index and query, and automate the process.
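A sketch of that approach, using a hypothetical function f_min_ts() as the pseudo-constant (declaring it IMMUTABLE although you intend to change it is the deliberate trick here; redefine the function body and recreate the index to move the bound forward):

```sql
CREATE OR REPLACE FUNCTION f_min_ts()
  RETURNS timestamp
  LANGUAGE sql IMMUTABLE AS
$$SELECT timestamp '2015-07-01 0:0'$$;  -- placeholder lower bound

CREATE INDEX table_three_id_func_idx ON table_three (id)
WHERE  client_id > 0
AND    "timestamp" > f_min_ts();
```

Queries then repeat the same condition "timestamp" > f_min_ts(), so they always match the index predicate.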
SET LOCAL
Like you found yourself, increasing work_mem locally for the query helps if the query needs that much RAM. Consider SET LOCAL:
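SET LOCAL applies the setting for the current transaction only, so there is nothing to reset afterwards (the 256MB value is taken from the update above):

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- reverts automatically at COMMIT / ROLLBACK
-- query --
COMMIT;
```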
With all suggested improvements in place you may not need to increase work_mem any more.