Postgresql – Why is this query with WHERE, ORDER BY and LIMIT so slow

database-design, index-tuning, order-by, performance, postgresql, postgresql-performance

Given this table posts_lists:

                      Table "public.posts_lists"
   Column   |          Type          | Collation | Nullable | Default
------------+------------------------+-----------+----------+---------
 id         | character varying(20)  |           | not null |
 user_id    | character varying(20)  |           |          |
 tags       | jsonb                  |           |          |
 score      | integer                |           |          |
 created_at | integer                |           |          |
Indexes:
    "tmp_posts_lists_pkey1" PRIMARY KEY, btree (id)
    "tmp_posts_lists_idx_create_at1532588309" btree (created_at)
    "tmp_posts_lists_idx_score_desc1532588309" btree (score_rank(score, id::text) DESC)
    "tmp_posts_lists_idx_tags1532588309" gin (jsonb_array_lower(tags))
    "tmp_posts_lists_idx_user_id1532588309" btree (user_id)

Getting a list by tag is fast:

EXPLAIN ANALYSE
SELECT * FROM posts_lists
WHERE jsonb_array_lower(tags) ? lower('Qui');
Bitmap Heap Scan on posts_lists  (cost=1397.50..33991.24 rows=10000 width=56) (actual time=0.110..0.132 rows=2 loops=1)
  Recheck Cond: (jsonb_array_lower(tags) ? 'qui'::text)
  Heap Blocks: exact=2
  ->  Bitmap Index Scan on tmp_posts_lists_idx_tags1532588309  (cost=0.00..1395.00 rows=10000 width=0) (actual time=0.010..0.010 rows=2 loops=1)
        Index Cond: (jsonb_array_lower(tags) ? 'qui'::text)
Planning time: 0.297 ms
Execution time: 0.157 ms

Getting a list ordered by score, limit 100 – also fast:

EXPLAIN ANALYSE
SELECT *
FROM posts_lists
ORDER BY score_rank(score, id) DESC
LIMIT 100;
Limit  (cost=0.56..12.03 rows=100 width=88) (actual time=0.074..0.559 rows=100 loops=1)
  ->  Index Scan using tmp_posts_lists_idx_score_desc1532588309 on posts_lists  (cost=0.56..1146999.15 rows=10000473 width=88) (actual time=0.072..0.535 rows=100 loops=1)
Planning time: 0.586 ms
Execution time: 0.714 ms

But combining the above two queries is very slow:

EXPLAIN ANALYSE
SELECT * FROM posts_lists
WHERE jsonb_array_lower(tags) ? lower('Qui')
ORDER BY score_rank(score, id) DESC
LIMIT 100;
Limit  (cost=0.56..33724.60 rows=100 width=88) (actual time=2696.965..493476.142 rows=2 loops=1)
  ->  Index Scan using tmp_posts_lists_idx_score_desc1532588309 on posts_lists  (cost=0.56..3372404.39 rows=10000 width=88) (actual time=2696.964..493476.139 rows=2 loops=1)
        Filter: (jsonb_array_lower(tags) ? 'qui'::text)
        Rows Removed by Filter: 9999998
Planning time: 0.426 ms
Execution time: 493476.190 ms

Why? And how can I improve the efficiency of the query?

Definition of the two functions used above:

create or replace function score_rank(score integer, id text)
  returns text as $$
select case when score < 0
  then '0' || lpad((100000000 + score) :: text, 8, '0') || id
       else '1' || lpad(score :: text, 8, '0') || id
       end
$$
language sql immutable;


create or replace function jsonb_array_lower(arr jsonb)
  returns jsonb as $$
SELECT jsonb_agg(lower(elem))
FROM jsonb_array_elements_text(arr) elem
$$
language sql immutable;

Best Answer

Sorting and paging

Your function score_rank() produces a text value from the integer score with the PK id appended. That's not helpful for sorting. Replace it completely; I suspect you do not need it at all. Instead, use the two columns score and id directly for sorting:

SELECT *
FROM   posts_lists
ORDER  BY score DESC, id DESC
LIMIT  100;

Replace your index tmp_posts_lists_idx_score_desc1532588309 with a smaller, faster, cheaper to maintain, more versatile index on (score DESC, id DESC).
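
A sketch of the replacement index; the name is just an example:

CREATE INDEX posts_lists_score_id_idx ON posts_lists (score DESC, id DESC);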

You can also base pagination on this multicolumn index efficiently with row value comparison ("keyset pagination").
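
A minimal sketch of such a next-page query; the literal values are placeholders standing in for score and id of the last row from the previous page:

SELECT *
FROM   posts_lists
WHERE  (score, id) < (9000, 'abc123')  -- row value comparison, placeholder values
ORDER  BY score DESC, id DESC
LIMIT  100;

The row value comparison matches the sort order, so Postgres can continue scanning the index on (score DESC, id DESC) right where the previous page ended.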

You later mentioned a new function concatenating strings with base256 etc. All that smart trickery is not going to increase performance. Sorting on an integer is faster than sorting on strings in Postgres. Using integer (or bigint) instead of varchar(20) would actually help in multiple ways.
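
If (and only if) all existing id values happen to be numeric strings, the switch could look like this minimal sketch; it also assumes no foreign keys reference posts_lists.id:

ALTER TABLE posts_lists
  ALTER COLUMN id TYPE bigint USING id::bigint;  -- rebuilds dependent indexes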

Statistics and query plan (a.k.a.: Why?)

The main issue is the lack of statistics for values nested in the jsonb column. Consequently, Postgres sometimes misjudges the selectivity of the predicate jsonb_array_lower(tags) ? lower('Qui') and chooses a bad query plan. For your query with LIMIT 100, the logic of the query planner can be illustrated like this - let's call it "Plan 1":

Only 100 rows with the highest scores? Let's scan the index tmp_posts_lists_idx_score_desc1532588309, starting with the highest scores. With any luck we'll have the result in no time!

It's a reasonable plan for most cases with at least moderately common tags. But the tag 'qui' turns out to be very rare, and its few rows have low scores, too. The worst case. Postgres ends up scanning almost the entire table (close to 10 million rows), just to keep 2. A colossal waste of time:

Rows Removed by Filter: 9999998

If the query planner had any idea how rare that tag actually is, it would start with the other index tmp_posts_lists_idx_tags1532588309, like we see in your first, fast example - let's call this "Plan 2":

Find matching rows, then sort by score and take the top N.

Plan 1 is more favorable the smaller the LIMIT and the more frequent the tag. (And if qualifying rows happen to sort on top.)
Plan 2 is more favorable the bigger the LIMIT and the less frequent the tag.

Postgres currently has no statistics about nested values in document types like jsonb, and no combined frequencies at all.

Update: "Combined statistics" became possible with CREATE STATISTICS in Postgres 10.

Whatever else you do, be sure to run the latest version of Postgres. The planner gets smarter with every release.

Alternatives

1. One idea might be to use a Postgres array (text[]) and array operators instead of the jsonb column, so that Postgres gathers some statistics for the most common elements: see the columns most_common_elems, most_common_elem_freqs, and elem_count_histogram in the system view pg_stats.

That helps Postgres generate better query plans for some constellations, but it's no silver bullet. For starters, only the most common elements are stored; Postgres still doesn't know anything about the rarest elements. A sketch of this variant follows below.
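
The column name tags_arr and the index name are made up for the example:

ALTER TABLE posts_lists ADD COLUMN tags_arr text[];

UPDATE posts_lists
SET    tags_arr = ARRAY(SELECT lower(elem)
                        FROM   jsonb_array_elements_text(tags) elem);

CREATE INDEX posts_lists_tags_arr_idx ON posts_lists USING gin (tags_arr);

ANALYZE posts_lists;

-- element statistics now show up in pg_stats:
SELECT most_common_elems, most_common_elem_freqs, elem_count_histogram
FROM   pg_stats
WHERE  tablename = 'posts_lists'
AND    attname = 'tags_arr';

-- the query then uses the array containment operator:
SELECT * FROM posts_lists
WHERE  tags_arr @> '{qui}'
ORDER  BY score DESC, id DESC
LIMIT  100;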

2. Normalize your db design and move tags to a separate 1:n table with a single tag per row. That increases the disk footprint because of the added row overhead per tag. (But changing tags becomes much cheaper, with less table bloat.) If your tags are stable, consider a full n:m relationship between posts_lists and a new table tags. That's a bit smaller for lots of common tags, too. And it's the "clean" way: you get more detailed statistics and should see fewer bad query plans. See the sketch below.
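
A sketch of the 1:n variant; table and index names are invented for the example:

CREATE TABLE post_tags (
  post_id varchar(20) NOT NULL REFERENCES posts_lists(id),
  tag     text NOT NULL,
  PRIMARY KEY (post_id, tag)
);

CREATE INDEX post_tags_tag_idx ON post_tags (tag);

-- find matching posts via the tag table, then sort and limit:
SELECT p.*
FROM   posts_lists p
JOIN   post_tags t ON t.post_id = p.id
WHERE  t.tag = 'qui'
ORDER  BY p.score DESC, p.id DESC
LIMIT  100;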

3. Since Postgres 10 there is a variant of to_tsvector() that processes json(b) values, so it's now simple to create a text search index and work with text search operators.

Index:

CREATE INDEX posts_lists_idx_tags_fts ON posts_lists USING gin (to_tsvector('simple', tags));

Query:

SELECT * FROM posts_lists
WHERE  to_tsvector('simple', tags) @@ to_tsquery('simple', 'qui') -- text search is case insensitive
ORDER  BY score DESC, id DESC
LIMIT  100;

Be sure to use the simple text search configuration: you don't want the stemming that is built into most other configurations.

Text search functions produce lower-case output, so everything works case-insensitively by design. There is no need for extra processing like in your original function jsonb_array_lower().
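
A quick sanity check of that claim (assuming Postgres 10+ for the jsonb variant of to_tsvector()):

-- 'Qui' in the document matches the lower-case query term 'qui':
SELECT to_tsvector('simple', '["Qui"]'::jsonb) @@ to_tsquery('simple', 'qui');  -- returns true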

4. While sticking with a jsonb index, try the more specialized jsonb_path_ops operator class:

CREATE INDEX ON posts_lists USING gin (jsonb_array_lower(tags) jsonb_path_ops);

Query with:

WHERE  jsonb_array_lower(tags) @> '["qui"]'
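
Put together, a query using this index might look like this (same sort as above):

SELECT * FROM posts_lists
WHERE  jsonb_array_lower(tags) @> '["qui"]'
ORDER  BY score DESC, id DESC
LIMIT  100;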

The manual:

Although the jsonb_path_ops operator class supports only queries with the @> operator, it has notable performance advantages over the default operator class jsonb_ops. A jsonb_path_ops index is usually much smaller than a jsonb_ops index over the same data, and the specificity of searches is better, particularly when queries contain keys that appear frequently in the data. Therefore search operations typically perform better than with the default operator class.

But I do not expect much for your particular case.

5. Use a regime of "granulated" indexes, combined with a procedural solution.

A db<>fiddle with a number of tests accompanies this answer.