My table (with 140 000 000 rows) :
CREATE TABLE attendees(
attendee_id VARCHAR NOT NULL,
name VARCHAR NOT NULL,
status VARCHAR NOT NULL,
event_id VARCHAR NOT NULL,
PRIMARY KEY (attendee_id, event_id)
);
CREATE INDEX attendees_event_id ON attendees(event_id);
CREATE INDEX attendees_attendee_id ON attendees(attendee_id);
My query :
SELECT event_id FROM attendees
where attendee_id IN (SELECT attendee_id FROM attendees where event_id=$eventId)
This query can take several minutes.
I saw here: Optimizing a Postgres query with a large IN I can change the IN
by IN (VALUES (..)...)
but I don't know how to change this sub query :
SELECT attendee_id FROM attendees where event_id=$eventId
to have (VALUES (..)...)
So, my question is, what is the best way to make this query faster ?
edit (I run the explain (analyze, verbose) on smaller table):
Hash Join (cost=5296.72..14097.38 rows=209116 width=16) (actual time=177.516..362.851 rows=121564 loops=1)
Output: attendees.event_id
Hash Cond: ((attendees.attendee_id)::text = (attendees_1.attendee_id)::text)
-> Seq Scan on public.attendees (cost=0.00..5051.21 rows=230721 width=33) (actual time=0.572..90.302 rows=230721 loops=1)
Output: attendees.event_id, attendees.attendee_id
-> Hash (cost=4664.54..4664.54 rows=50574 width=17) (actual time=176.611..176.611 rows=49650 loops=1)
Output: attendees_1.attendee_id
Buckets: 65536 Batches: 1 Memory Usage: 2917kB
-> Bitmap Heap Scan on public.attendees attendees_1 (cost=1288.37..4664.54 rows=50574 width=17) (actual time=14.141..151.031 rows=49650 loops=1)
Output: attendees_1.attendee_id
Recheck Cond: ((attendees_1.event_id)::text = '1193751294041282'::text)
Heap Blocks: exact=595
-> Bitmap Index Scan on attendees_event_id (cost=0.00..1275.73 rows=50574 width=0) (actual time=13.265..13.265 rows=49650 loops=1)
Index Cond: ((attendees_1.event_id)::text = '1193751294041282'::text)
Planning time: 1.611 ms
Execution time: 371.431 ms
Best Answer
Drop the extra indexes
Reverse the order of the PK to PRIMARY KEY (event_id, attendee_id)