Postgresql – How to speed up a Postgres query containing lots of Joins with an ILIKE condition

optimization, performance, postgresql, query-performance

I have an odd problem which I really don't understand. Simply put, I have a query joining four tables, all of which I believe have appropriate indexes, yet it takes a ridiculous amount of time to run unless I remove parts of it.

The greater picture is that there are three types of object, A, B and C, each with its own table, related such that A is the child, B the parent and C the grandparent. In addition, there is a relationship table R, allowing multiple Bs to relate to multiple Cs, and, since each relationship in R has a particular type, an additional table T holding those types.

Now, in the query in question, I'm trying to get a list of records of type A whose parent has a particular type of relationship with a grandparent whose name matches another string via ILIKE.

Table A has ~700k records, table B has ~60k records, table C has ~8k records, table R has ~90k records and table T has ~100 records.

As A contains a field parent_id, which links to B.id, B needn't be directly included in the query.

So the query is something like:

SELECT DISTINCT A.id, A.name
FROM A
JOIN R ON A.parent_id=R.lhs
JOIN T ON R.type=T.id AND T.alias='type-name'
JOIN C ON R.rhs=C.id
WHERE A.flag=1 AND A.strvalue='value' AND C.name ILIKE '%substr%'
ORDER BY A.name ASC
LIMIT 25;

It takes far more than 10 seconds to run the query like that (at first I never let it run to completion, as it takes far too long).

In my actual set-up, the key ID fields are composite types, so the query actually hangs off a field within the type, but the indexes are defined on that field too.

The strange thing is that when I tried taking bits out of the query to identify the part that is taking too long, removing either the T part or the ILIKE part made it perform in a normal timescale.

This is the result of running EXPLAIN on my original query (I've replaced the name fields to suit the example above, but left in the type references, so hopefully it'll make sense; apologies for any typos):

 Limit  (cost=40296.05..40296.10 rows=5 width=358)
   ->  Unique  (cost=40296.05..40296.10 rows=5 width=358)
         ->  Sort  (cost=40296.05..40296.06 rows=5 width=358)
               Sort Key: a.name, t.name, a.id
               ->  Nested Loop  (cost=277.06..40295.99 rows=5 width=358)
                     Join Filter: ((r.lhs).id = (a.parent).id)
                     ->  Nested Loop  (cost=277.06..3279.49 rows=1 width=365)
                           Join Filter: ((r.rhs).id = c.id)
                           ->  Seq Scan on c  (cost=0.00..113.53 rows=1 width=4)
                                 Filter: (name ~~* '%substr%'::text)
                           ->  Nested Loop  (cost=277.06..3150.35 rows=1249 width=456)
                                 ->  Index Scan using t_alias on t  (cost=0.00..8.27 rows=1 width=278)
                                       Index Cond: ((alias)::text = 'type-name'::text)
                                 ->  Bitmap Heap Scan on r  (cost=277.06..2985.96 rows=12490 width=186)
                                       Recheck Cond: (type = t.id)
                                       ->  Bitmap Index Scan on to_related_by  (cost=0.00..273.94 rows=12490 width=0)
                                             Index Cond: (type = t.id)
                     ->  Seq Scan on a  (cost=0.00..36973.25 rows=3460 width=194)
                           Filter: (((flag IS NULL) OR (NOT flag)) AND (((strvalue).id)::text = 'value'::text))
(19 rows)

This might be easier if I create a bunch of simplified example tables and put loads of real data in there, but I'm hoping someone can point to my silly mistake without me needing to do that.

Here's the result of EXPLAIN ANALYSE (this time I left it to run, so you can see how long it actually takes):

 Limit  (cost=40344.50..40344.51 rows=1 width=358) (actual time=2478498.373..2478498.414 rows=7 loops=1)
   ->  Unique  (cost=40344.50..40344.51 rows=1 width=358) (actual time=2478498.369..2478498.394 rows=7 loops=1)
         ->  Sort  (cost=40344.50..40344.51 rows=1 width=358) (actual time=2478498.366..2478498.374 rows=7 loops=1)
               Sort Key: a.name, t.name, a.id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=274.74..40344.49 rows=1 width=358) (actual time=1492181.642..2478498.230 rows=7 loops=1)
                     Join Filter: ((r.rhs).id = (a.parent).id)
                     ->  Nested Loop  (cost=274.74..3327.99 rows=1 width=365) (actual time=96.682..1445.683 rows=4409 loops=1)
                           Join Filter: ((r.rhs).id = c.id)
                           ->  Seq Scan on c (cost=0.00..113.53 rows=1 width=4) (actual time=0.143..10.569 rows=6 loops=1)
                                 Filter: (name ~~* '%substr%'::text)
                           ->  Nested Loop  (cost=274.74..3210.48 rows=319 width=456) (actual time=3.961..185.316 rows=32112 loops=6)
                                 ->  Index Scan using t_alias on t  (cost=0.00..8.27 rows=1 width=278) (actual time=0.020..0.025 rows=1 loops=6)
                                       Index Cond: ((alias)::text = 'type-name'::text)
                                 ->  Bitmap Heap Scan on r  (cost=274.74..3046.09 rows=12490 width=186) (actual time=3.927..93.154 rows=32112 loops=6)
                                       Recheck Cond: (type = id)
                                       Filter: ((lhs_app = 'collectible'::application) AND (rhs_app = 'realthing'::application))
                                       ->  Bitmap Index Scan on type  (cost=0.00..273.94 rows=12490 width=0) (actual time=3.527..3.527 rows=32112 loops=6)
                                             Index Cond: (type = r.id)
                     ->  Seq Scan on a  (cost=0.00..36973.25 rows=3460 width=194) (actual time=220.577..561.750 rows=21 loops=4409)
                           Filter: (((flag IS NULL) OR (NOT flag)) AND (((strvalue).id)::text = 'value'::text))
 Total runtime: 2478498.516 ms
(22 rows)

As I thought it might help, this is a modified version of the \d output for A:

collector=> \d a
Table "public.a"
        Column          |            Type             |                            Modifiers
------------------------+-----------------------------+-----------------------------------------------------------------
    idi                 | integer                     | not null default nextval('idi_seq'::regclass)
    id                  | character(24)               | not null
    parent              | mg_item                     |
    strvalue            | mg_item                     |
    name                | text                        |
    flag                | boolean                     |

Indexes:
    "a_pkey" PRIMARY KEY, btree (idi)
    "a_id_key" UNIQUE CONSTRAINT, btree (id)
    "a_parent" btree (parent)
    "a_flag" btree (flag)
    "a_name" btree (name)
    "a_strvalue" btree (strvalue)

Best Answer

I see a couple of issues.

The biggest one is that PG is using a sequential scan on A when filtering it. I think you need a composite index on (A.flag, A.strvalue). If such an index already exists, PostgreSQL is choosing not to use it for some reason. This step seems to account for about 92% of your cost estimate and is likely what makes the query run for so long.
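Something along these lines (a sketch using the simplified column names from your example; the partial-index variant assumes the flag condition shown in your plan is the one you always filter on):

-- Composite index covering both filter columns on A
CREATE INDEX a_flag_strvalue ON a (flag, strvalue);

-- Alternatively, a partial index matching the plan's filter,
-- which can be smaller and faster still:
CREATE INDEX a_strvalue_unflagged ON a (strvalue)
WHERE flag IS NULL OR NOT flag;

Note that single-column indexes on flag and strvalue separately (which you already have) often won't be combined as cheaply as one index covering the whole predicate.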

As for the ILIKE, PostgreSQL cannot natively use an index when the pattern starts with a wildcard (but see below for a module that can). That's simply a restriction on B-tree indexes with the ILIKE operator. For that reason you are getting a sequential scan, which means every single row is loaded and the C.name column scanned for matching characters. One thing that's odd, though, is that the ILIKE sequential scan doesn't seem to account for much of the cost estimate in this plan. Still, if it is the ILIKE operator causing the slowdown, I would consider rewriting your query so the pattern is anchored at the start, e.g. ILIKE 'value%', or else consider using PostgreSQL's full-text search.
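A rough sketch of the full-text alternative (the index name is illustrative; note that full-text search matches whole words, so it is not a drop-in replacement for arbitrary substring matching):

-- GIN index over a tsvector of C.name
CREATE INDEX c_name_fts ON c USING gin (to_tsvector('simple', name));

-- ...then match words rather than substrings:
SELECT id
FROM c
WHERE to_tsvector('simple', name) @@ to_tsquery('simple', 'substr');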

UPDATED

It turns out the ILIKE operator can use a trigram index, via the pg_trgm module. Superb!
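For example (pg_trgm ships with PostgreSQL's contrib packages; the index name is illustrative):

-- Enable the trigram module
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A GIN trigram index on C.name, which can serve ILIKE '%substr%'
CREATE INDEX c_name_trgm ON c USING gin (name gin_trgm_ops);

With that in place, a leading-wildcard ILIKE on c.name can use a bitmap index scan instead of a sequential scan.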