I've got a simple but slow query:
SELECT DISTINCT title
FROM ja_jobs
WHERE title ILIKE '%RYAN WER%'
AND clientid = 31239
AND time_job > 1457826264
ORDER BY title
LIMIT 10;
Explain analyze:
Limit (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
-> Unique (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
-> Sort (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
Rows Removed by Filter: 791
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 ms
Then, I created a trigram index:
CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);
Explain analyze after adding the index: (Yes, I analyze
)
Limit (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
-> Unique (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
-> Sort (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "ja_jobs" (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
Rows Removed by Index Recheck: 4
-> BitmapAnd (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
-> Bitmap Index Scan on "ix_ja_jobs_trgm_gin" (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")
Total runtime: 3720.653 ms
As you can see, the index did not work.
Table public.ja_jobs
:
CREATE TABLE public.ja_jobs (
id bigint NOT NULL DEFAULT "nextval"('"ja_jobs_id_seq"'::"regclass"),
refnum character varying(100) NOT NULL DEFAULT ''::character varying,
clientid bigint NOT NULL DEFAULT 0,
customerid bigint,
time_job bigint,
priority smallint NOT NULL DEFAULT 0,
status character varying(255) NOT NULL DEFAULT 'active'::"bpchar",
title character varying(100) NOT NULL DEFAULT ''::character varying,
-- some other irrelevant columns
)
Indexes on public.ja_jobs
:
Indexes:
"ja_jobs_pkey" PRIMARY KEY, "btree" ("id")
"ix_bill_customer_jobs" "btree" ("customerid", "bill_customer")
"ix_clientid_jobs" "btree" ("clientid")
"ix_customerid_job" "btree" ("customerid")
"ix_ja_jobs_clientid_modified_date_created_date" "btree" ("clientid", "modified_date", "created_date")
"ix_ja_jobs_gsdi_pk" "btree" (("id"::"text"))
"ix_ja_jobs_trgm_gin" "gin" ("title" "gin_trgm_ops")
"ix_job_customer_recent_jobs_lookaside_bill_customer" "btree" ("bill_customer", "modified_date")
"ix_job_customer_recent_jobs_lookaside_clientid" "btree" ("clientid", "modified_date")
"ix_job_customer_recent_jobs_lookaside_customer" "btree" ("customerid", "modified_date")
"ix_jobs_charges_and_parts_sort" "btree" (("charges_count" + "parts_count"))
"ix_jobs_client_times" "btree" ("clientid", "time_job", "time_arrival")
"ix_jobs_fts_description_en" "gin" ("full_text_universal_cast"("description"))
"ix_jobs_fts_full_address_en" "gin" ((((("full_text_universal_cast"("address"::"text") || "full_text_universal_cast"("suburb"::"text")) || "full_text_universal_cast"("city"::"text")) || "full_text_universal_cast"("stpr"::"text")) || "full_text_universal_cast"("postc
ode"::"text")))
"ix_jobs_fts_job_number_en" "gin" ("full_text_universal_cast"("job_number"::"text"))
"ix_jobs_fts_refnum_en" "gin" ("full_text_universal_cast"("refnum"::"text"))
"ix_jobs_fts_title_en" "gin" ("full_text_universal_cast"("title"::"text"))
"ix_jobs_full_address_street_first" "btree" (((((COALESCE("address"::character varying, ''::character varying)::"text" || COALESCE(' '::"text" || "suburb"::"text", ''::"text")) || COALESCE(' '::"text" || "city"::"text", ''::"text")) || COALESCE(' '::"text" || "postc
ode"::"text", ''::"text")) || COALESCE(' '::"text" || "stpr"::"text", ''::"text")))
"ix_jobs_paying_customers" "btree" ((COALESCE("bill_customer", "customerid")))
"ix_jobs_status_label_ids" "btree" ("status_label_id")
"ix_jobs_top_by_client" "btree" ("id", "clientid")
"ix_mobiuser_jobs" "btree" ("accepted_mobile_user")
"ix_recurrenceid_jobs" "btree" ("recurrenceid")
"ix_timejob_jobs" "btree" ("time_job")
"ja_jobs_client_type" "btree" ("clientid", "jobtype")
"ja_jobs_the_geom_idx" "gist" ("the_geom")
Question:
What can I do to improve the query? Why is the trigram index not working as expected?
UPDATE: Re-ran the explain analyze buffer
Limit (cost=199669.37..199669.39 rows=10 width=20) (actual time=31523.690..31523.691 rows=1 loops=1)
Buffers: shared hit=26947 read=101574 dirtied=438
-> Sort (cost=199669.37..199669.40 rows=11 width=20) (actual time=31523.686..31523.686 rows=1 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=26947 read=101574 dirtied=438
-> Bitmap Heap Scan on "ja_jobs" (cost=4850.60..199669.18 rows=11 width=20) (actual time=11714.504..31523.640 rows=1 loops=1)
Recheck Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))
Filter: (("title")::"text" ~~* '%Hislop%'::"text")
Rows Removed by Filter: 207654
Buffers: shared hit=26942 read=101574 dirtied=438
-> Bitmap Index Scan on "ix_jobs_client_times" (cost=0.00..4850.60 rows=155054 width=0) (actual time=11670.956..11670.956 rows=215142 loops=1)
Index Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))
Buffers: shared hit=121 read=5772
Total runtime: 31524.874 ms
After removing DISTINCT
and the left %
:
explain (analyze, buffers)
SELECT title
FROM ja_jobs
WHERE title ILIKE 'Hislop 13035%'
AND clientid = 2565
AND time_job > 1382496599
ORDER BY title
LIMIT 10;
Limit (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.479..3492.483 rows=1 loops=1)
Buffers: shared hit=4940 read=448
I/O Timings: read=83.285
-> Sort (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.475..3492.477 rows=1 loops=1)
Sort Key: "title"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=4940 read=448
I/O Timings: read=83.285
-> Bitmap Heap Scan on "ja_jobs" (cost=391.62..2275.38 rows=9 width=20) (actual time=3492.460..3492.462 rows=1 loops=1)
Recheck Cond: (("title")::"text" ~~* 'Hislop Street Clinic 2513035%'::"text")
Filter: (("time_job" > 1382496599) AND ("clientid" = 2565))
Buffers: shared hit=4940 read=448
I/O Timings: read=83.285
-> Bitmap Index Scan on "ix_jobs_trgm_gin" (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)
Index Cond: (("title")::"text" ~~* 'Hislop 13035%'::"text")
Buffers: shared hit=4939 read=448
I/O Timings: read=83.285
Total runtime: 3492.531 ms
- As you can see, the query is hitting the new index but it's slower.
Then I removed ORDER BY
but the query is still slow.
Also, I tried to use LIKE
(with is much faster), but LIKE
is case-sensitive, so for that reason I got no rows back. Can't use it.
Best Answer
You have a lot of indexes. I doubt you need all of them. Check whether all of them are in use. Instructions in the manual, chapter Examining Index Usage.
If your system is configured to gather statistics, it will be particularly revealing to study:
These statistics are also displayed in pgAdmin.
Some indexes are particularly odd, like:
"ix_ja_jobs_gsdi_pk" "btree" (("id"::"text"))
- why would anyone cast abigint
id totext
for indexing?Useless indexes are not going to impair read performance (much), but they are a burden on write performance and general maintenance.
The major difficulty for your query is to estimate the selectivity of your various predicates. That's comparatively simple for the
bigint
columnsclientid
andtime_job
, but hard for pattern matching (title ILIKE 'Hislop 13035%'
).In your case (UPDATE 3), Postgres estimates to find 482 rows matching the pattern, but it turns out to be just a single row:
Tuning the query depends on the complete picture: Cardinalities, data distribution, hardware, load, concurrency, query frequencies, priorities ...
It may help to increase the statistics target for involved columns:
Then:
ANALYZE ja_jobs;
But don't expect much. Details:Estimating selectivity of free-floating
LIKE
patterns is hard. Left-anchored is easier - you are mixing these two rather different cases freely:ILIKE '%RYAN WER%'
(with leading wildcard) is much more complex thanILIKE 'Hislop 13035%'
. Overview:There have been improvements for the latest version, but the biggest improvements come with Postgres 9.6 (currently beta) and its new version of the pg_trgm module. Consider the release notes here. Related:
There are various other ways to improve performance, depending on all the information that's not in your question. Maybe a multicolumn index, or a partial index or a CTE in your query.
All the general advice for performance tuning applies as well:
Configuring PostgreSQL for read performance
http://wiki.postgresql.org/wiki/Performance_Optimization
You already removed the expensive
DISTINCT
. If you don't needORDER BY title
, it may change the query plan completely to drop that, too: With both of these removed, Postgres is free to pick the first 10 matches and ignore the rest. Else, all matches have to be found and considered. can be much more expensive. Try:If you actually only deal with left-anchored
LIKE
pattern (trailing wildcards like'Hislop 13035%'
, but not:'%RYAN WER%'
), then you can use a very fastvarchar_pattern_ops
index. Detailed explanation:So:
Index columns in this order. Equality first, range later. Explanation:
You can extend this solution to cover
ILIKE
with a functional elementAnd adapt your query: