PostgreSQL – WHERE IN query to very large table is slow

performance | postgresql | query | query-performance

I currently have a table with around 300,000,000 rows and the following columns: user_id, item_id, item_value, and item_add_date.

I also have a very long list of about 7,000 user_id values, each associated with a date range.

I'm trying to query the really large table to obtain all rows where the user_id is in my list and the item_add_date falls within that user's date range.

My current approach is to programmatically loop through the list of ids and run the following query:

SELECT *
FROM all_items
WHERE user_id IN ([insert_user_id_programmatically])
AND item_add_date >= [insert_start_date]
AND item_add_date <= [insert_end_date]

which returns the results I want, but each query takes about an hour, so 7,000 queries would take almost a year to run. Is there a faster way of achieving this? I'm not familiar with SQL and normally do most of the data manipulation for this project in Python.

EDIT:
Here is the output for EXPLAIN (ANALYZE, BUFFERS, VERBOSE):
https://pastebin.com/GYL8JWk4
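
(For reference, the plan was captured by prefixing one of the per-user queries with EXPLAIN; the id and dates below are placeholders, not the values from the actual run:)

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT *
FROM all_items
WHERE user_id IN (12345)             -- placeholder id
  AND item_add_date >= '2019-01-01'  -- placeholder range
  AND item_add_date <= '2019-12-31';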

Best Answer

Sometimes large IN clauses can be improved by rewriting them as a join against a VALUES clause.

So instead of:

select *
from all_items
where user_id in (1,2,3,4,5)
 ...

you can use:

select ai.*
from all_items ai
   join ( 
       values (1),(2),(3),(4),(5)
   ) t(id) on t.id = ai.user_id
where item_add_date >= ...
  and item_add_date <= ...;
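
If each user_id really has its own date range, as the question suggests, the same pattern extends: carry the bounds in the VALUES rows next to the ids. A sketch, with hypothetical per-user bounds named start_date and end_date:

select ai.*
from all_items ai
   join (
       -- one row per user: (id, that user's start date, that user's end date)
       values (1, date '2019-01-01', date '2019-06-30'),
              (2, date '2018-03-01', date '2018-09-30')
   ) t(id, start_date, end_date) on t.id = ai.user_id
where ai.item_add_date >= t.start_date
  and ai.item_add_date <= t.end_date;

That turns the whole workload into a single query instead of 7,000 round trips.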

An index on (user_id, item_add_date) should improve the performance.
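
For example (the index name here is just illustrative; CONCURRENTLY lets the build proceed without blocking writes on the 300-million-row table, though it cannot run inside a transaction block):

create index concurrently all_items_user_date_idx
    on all_items (user_id, item_add_date);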