PostgreSQL 9.6 – Optimizing Slow WHERE IN Query on Large Tables

postgresql

My table (with 140 000 000 rows) :

CREATE TABLE attendees(
  attendee_id                 VARCHAR NOT NULL,
  name                        VARCHAR NOT NULL,
  status                      VARCHAR NOT NULL,
  event_id                    VARCHAR NOT NULL,
  PRIMARY KEY (attendee_id, event_id)
);

CREATE INDEX attendees_event_id ON attendees(event_id);
CREATE INDEX attendees_attendee_id ON attendees(attendee_id);

My query :

SELECT event_id FROM attendees 
where attendee_id IN (SELECT attendee_id FROM attendees where event_id=$eventId)

This query can take several minutes.
I saw here: Optimizing a Postgres query with a large IN I can change the IN by IN (VALUES (..)...) but I don't know how to change this sub query :

SELECT attendee_id FROM attendees where event_id=$eventId

to have (VALUES (..)...)

So, my question is, what is the best way to make this query faster ?

edit (I run the explain (analyze, verbose) on smaller table):

Hash Join  (cost=5296.72..14097.38 rows=209116 width=16) (actual time=177.516..362.851 rows=121564 loops=1)
  Output: attendees.event_id
  Hash Cond: ((attendees.attendee_id)::text = (attendees_1.attendee_id)::text)
  ->  Seq Scan on public.attendees  (cost=0.00..5051.21 rows=230721 width=33) (actual time=0.572..90.302 rows=230721 loops=1)
        Output: attendees.event_id, attendees.attendee_id
  ->  Hash  (cost=4664.54..4664.54 rows=50574 width=17) (actual time=176.611..176.611 rows=49650 loops=1)
        Output: attendees_1.attendee_id
        Buckets: 65536  Batches: 1  Memory Usage: 2917kB
        ->  Bitmap Heap Scan on public.attendees attendees_1  (cost=1288.37..4664.54 rows=50574 width=17) (actual time=14.141..151.031 rows=49650 loops=1)
              Output: attendees_1.attendee_id
              Recheck Cond: ((attendees_1.event_id)::text = '1193751294041282'::text)
              Heap Blocks: exact=595
              ->  Bitmap Index Scan on attendees_event_id  (cost=0.00..1275.73 rows=50574 width=0) (actual time=13.265..13.265 rows=49650 loops=1)
                    Index Cond: ((attendees_1.event_id)::text = '1193751294041282'::text)
Planning time: 1.611 ms
Execution time: 371.431 ms

Best Answer

Drop the extra indexes
Reverse the order of the PK to PRIMARY KEY (event_id, attendee_id)

SELECT a1.event_id 
FROM attendees a1
JOIN attendees a2
  ON a2.event_id    = $eventId
 and a2.attendee_id = a1.attendee_id