Postgresql – Postgres query very slow when using index

indexperformancepostgresqlquery-performance

So I've run into this issue with a slow Postgres query. I've added an index but it doesn't seem to help. It takes 8 minutes to get 100 records from a table with 2822968 records.

Here's the query that I run:

SELECT  "organisations"."id", "organisations"."url"
FROM "organisations" 
WHERE "organisations"."url" IS NOT NULL 
AND "organisations"."domain" IS NULL 
ORDER BY "organisations"."id" ASC
LIMIT 100

And here is the explain analyze

Limit  (cost=17468.48..17468.73 rows=100 width=38) (actual time=517448.061..517449.431 rows=100 loops=1)
  ->  Sort  (cost=17468.48..17482.23 rows=5501 width=38) (actual time=517448.057..517448.791 rows=100 loops=1)
        Sort Key: id
        Sort Method: top-N heapsort  Memory: 38kB
        ->  Bitmap Heap Scan on organisations  (cost=193.31..17258.23 rows=5501 width=38) (actual time=93.370..516988.919 rows=1096205 loops=1)
              Recheck Cond: ((url IS NOT NULL) AND (domain IS NULL))
              Heap Blocks: exact=59205
              ->  Bitmap Index Scan on index_organisations_on_url_and_domain  (cost=0.00..191.93 rows=5501 width=0) (actual time=82.242..82.243 rows=1098765 loops=1)
Planning Time: 0.202 ms
Execution Time: 517453.118 ms

I use rails – here is the partial multi column index from the schema file:

t.index ["url", "domain"], name: "index_organisations_on_url_and_domain", where: "((url IS NOT NULL) AND (domain IS NULL))"

If it helps this is running on Postgres RDS on a small instance. There are no other connections running as I turned off all background jobs, and there are no users connected using the app at the time of running.

I'm not sure how I can get this to return faster.

Best Answer

It seems that the statistics for that index are way off, and PostgreSQL may be faster performing a sequential scan.

Try to

ANALYZE organizations;

and see if that improves matters.

Another idea may be an index on organizations.id (if you don't already have that) with the idea to fetch rows in sort order and discard the ones that don't meet the criterion.

The perfect index would be

CREATE INDEX ON organizations (id)
   WHERE url IS NOT NULL AND domain IS NULL;

You could also include the url column in the index (using the INCLUDE clause in v11) and aim for an index only scan.