Postgresql – EXECUTE within function not using index

dynamic-sqlindex-tuningperformancepostgresqlpostgresql-9.2postgresql-performance

I have an accounts table with ~200k rows and an index for these columns:

  account_type_id BIGINT
, member_id BIGINT
, external_id VARCHAR(64)

CREATE INDEX account_full_qualifiers_idx
ON normal_object.account (account_type_id, member_id, external_id) TABLESPACE index_tbsp;

And I have a function that is performing some ETL work with the following query:

EXECUTE '
   SELECT * FROM normal_object.account
   WHERE account_type_id = $1
   AND member_id = $2
   AND external_id = $3'
INTO e_row
USING r_row.account_type_id, r_row.member_id, r_row.external_id;

Yet, the EXECUTE command is NOT using the index and I'm not sure why. My only guess is that the data types are not lining up. Yet r_row.account_type_id is a BIGINT, r_row.member_id is a BIGINT, and r_row.external_id is a VARCHAR(64).

Any suggestions as to why it is not using the index?
How can I get it to use the index? (I've already tried setting enable_seqscan off.)

Best Answer

Queries executed with EXECUTE are re-planned with the actual parameter values passed to it every time. Since you are using Postgres 9.2, you may not even need EXECUTE:

PostgreSQL Stored Procedure Performance

Either way, upgrading to Postgres 9.3 (or the upcoming 9.4) might help some more.

So, unless we have a type mismatch (like you already suspected) or a collation mismatch, your index should be used, if the parameter values are selective enough (retrieving less than ~ 5% of the table), which most probably is the case.

You write:

r_row.account_type_id is a BIGINT, r_row.member_id is a BIGINT, and r_row.external_id is a VARCHAR(64).

But how do you know that? If r_row is declared as record, data types of columns are defined in the assignment of an actual row ...
We need to see the complete plpgsql function with header and footer. And the exact table definition.

`COLLATE "C"` for your char-type index

An index on an integer type column is far more efficient than one on varchar(64) in any case. If you have to use the character type but you don't need to sort the column according to your collation setting (as I presume) it would be more efficient to use COLLATE "C" for index and query.

CREATE INDEX account_full_qualifiers_idx
ON normal_object.account (account_type_id, member_id, external_id COLLATE "C")
TABLESPACE index_tbsp;

Your query would then be:

EXECUTE '
   SELECT *
   FROM   normal_object.account
   WHERE  account_type_id = $1
   AND    member_id = $2
   AND    external_id COLLATE "C" = $3
   LIMIT  1'
INTO  e_row
USING r_row.account_type_id, r_row.member_id, r_row.external_id;

LIMIT 1 may or may not be needed. Your index is not UNIQUE, though ...

Or you define the table column with COLLATE "C" to begin with. Index and query will default to the collation of the column.

This may or may not solve the question at hand - it's a good idea for your situation in any case.

Details in this related answer (chapters "Strings and collation" and "Index":

Supported operators and workaround

The default operator class for GIN indexes on json columns jsonb_ops only supports these operators (per documentaion):

Name        Indexed Data Type   Indexable Operators
...
jsonb_ops   jsonb               ? ?& ?| @>

You can achieve this the other way round: Create a simple IMMUTABLE SQL function using the ? operator, which can be inlined and will use the index just like the operator itself:

CREATE OR REPLACE FUNCTION jb_contains(jsonb, text)
  RETURNS bool AS
'SELECT $1 ? $2' LANGUAGE sql IMMUTABLE;

This works, I tested in Postgres 9.4 ...

Misunderstandings

However, you have been asking the wrong question. There are two basic misconceptions in your question.

The jsonb operator ? cannot be used to search for values. Only for keys or array elements. The manual:

Description:
Does the string exist as a top-level key within the JSON value?

You got the wrong operator, the WHERE condition cannot work: ~~WHERE d->'name' ? 'First'~~
The expression index you have doesn't makes sense either way
```
CREATE INDEX idx_jsonthings_name ON jsonthings USING GIN ((d->'name'));
```
- The expression d->'name' returns a jsonb value. You would need d->>'name' to get the value as text.
- But that would still be pointless. Since the value of the name key is a simple string, a GIN index (while possible) makes no sense to begin with.

Solutions

You don't need the operator ? - so no workaround either.
Here are two ways that would actually work:

Plain GIN index on d and use the "contains" operator @>:
```
CREATE INDEX idx_jsonthings_d_gin ON jsonthings USING GIN (d);

SELECT d FROM jsonthings WHERE d @> '{"name":"First"}'
```
You could even use the more specialized operator class jsonb_path_ops. See:
- How to get particular object from jsonb array in PostgreSQL?

B-tree expression index on d->>'email' and test with good old =:

CREATE INDEX idx_jsonthings_d_email ON jsonthings ((d->>'email'));

SELECT d FROM jsonthings WHERE d->>'email' = 'First';

The second index would be considerably smaller and the query faster.

Postgresql – Postgres not using the index even when rows returned is 5% of the table

From your query plans, it looks like you're comparing ints to ints in the first query plan, and int to numeric in the second plan.

Your first compare:

Index Cond: (("timestamp" >= 1431100800) AND ("timestamp" <= 1431108000))

and

timestamp >= 1431100800 and timestamp <= 1431108000

In the second query, it's numeric values:

Filter: ((numvalues[1] IS NOT NULL) AND (("timestamp")::numeric >= 1431100800.00) AND (("timestamp")::numeric <= 1431108000.00))

and

timestamp >= 1431093600.00 and timestamp <= 1431100800.00

Casting to numeric causes the index to be ignored in favor of a sequential scan.

You can see this with a very simple example, set up below:

CREATE TABLE t2 (a int);
CREATE INDEX t2_a_idx ON t2(a);
INSERT INTO t2 (a) SELECT i FROM generate_series(1,1000000) AS i;
VACUUM ANALYZE VERBOSE t2;

My first query plan looks like this:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000;

Index Only Scan using t2_a_idx on t2 (cost=0.42..7134.65 rows=250413 width=4) 
(actual time=0.019..29.926 rows=250000 loops=1)
Index Cond: (a > 750000)
Heap Fetches: 0
Planning time: 0.137 ms
Execution time: 39.114 ms
(5 rows)
Time: 39.540 ms

While a second query using numerics looks like this:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE a > 750000.00;

Seq Scan on t2  (cost=0.00..19425.00 rows=333333 width=4) (actual time=122.803..175.326 rows=250000 loops=1)
Filter: ((a)::numeric > 750000.00)
Rows Removed by Filter: 750000
Planning time: 0.058 ms
Execution time: 184.194 ms
(5 rows)
Time: 184.487 ms

In the second instance here, the index is ignored in favor of a sequential scan because of the cast to a numeric value, which looks like exactly what's happening in your two examples.

One last aside, you might be able to speed your query up via a SET query before executing it as well:

SET work_mem = 2GB;

If your server can handle it, because your sorts are spilling to disk, as noted in this line from your query plan:

Sort Method: external merge  Disk: 1387704kB

Hope this helps. =)

Best Answer

COLLATE "C" for your char-type index

Related Solutions

PostgreSQL operator uses index but underlying function does not

Supported operators and workaround

Misunderstandings

Solutions

Postgresql – Postgres not using the index even when rows returned is 5% of the table

Related Question

`COLLATE "C"` for your char-type index