I have a query of the following form:
SELECT * FROM twitter_personas WHERE twitter_user_id IN ($1, $2, $3, ..., $25000)
The IN query has anywhere from 10 to 25000 values. The query runs for minutes at a time. I have a backlog of nearly 500,000 queries like this to run.
The twitter_user_id column is indexed. Any ideas on how I could speed this up?
# \d twitter_personas
Table "public.twitter_personas"
Column | Type | Modifiers
------------------+------------------------+------------------------------------------------------------
persona_id | uuid | not null
twitter_user_id | bigint |
screen_name | character varying(40) | not null
avatar_url | text |
hashval | integer | not null default nextval('personas_hashval_seq'::regclass)
Indexes:
"twitter_personas_pkey" PRIMARY KEY, btree (persona_id)
"index_twitter_personas_on_screen_name" UNIQUE, btree (screen_name)
"index_twitter_personas_on_screen_name_persona_id" btree (screen_name, persona_id)
"index_twitter_personas_twitter_user_id" btree (twitter_user_id) WHERE twitter_user_id IS NOT NULL
Best Answer
IN() using many parameters will result in many cases in a sequential table scan. That might be slow, depending on table size and speed of your system.
Create a temporary table with all your variables and join on this table:
Use EXPLAIN to see the difference between the queryplans.