Postgresql – Ways to speed up IN queries under PostgreSQL

performancepostgresql

I have a query of the following form:

SELECT * FROM twitter_personas WHERE twitter_user_id IN ($1, $2, $3, ..., $25000)

The IN query has anywhere from 10 to 25000 values. The query runs for minutes at a time. I have a backlog of nearly 500,000 queries like this to run.

The twitter_user_id column is indexed. Any ideas on how I could speed this up?

# \d twitter_personas
                                    Table "public.twitter_personas"
      Column      |          Type          |                         Modifiers                          
------------------+------------------------+------------------------------------------------------------
 persona_id       | uuid                   | not null
 twitter_user_id  | bigint                 | 
 screen_name      | character varying(40)  | not null
 avatar_url       | text                   | 
 hashval          | integer                | not null default nextval('personas_hashval_seq'::regclass)
Indexes:
    "twitter_personas_pkey" PRIMARY KEY, btree (persona_id)
    "index_twitter_personas_on_screen_name" UNIQUE, btree (screen_name)
    "index_twitter_personas_on_screen_name_persona_id" btree (screen_name, persona_id)
    "index_twitter_personas_twitter_user_id" btree (twitter_user_id) WHERE twitter_user_id IS NOT NULL

Best Answer

IN() using many parameters will result in many cases in a sequential table scan. That might be slow, depending on table size and speed of your system.

Create a temporary table with all your variables and join on this table:

CREATE TEMP TABLE t AS 
  SELECT * FROM (VALUES(1),(2),(3)) x(twitter_user_id);

SELECT 
  twitter_personas.* 
FROM twitter_personas 
  JOIN t USING(twitter_user_id);

Use EXPLAIN to see the difference between the queryplans.