Using Partial Index with WHERE NOT NULL in JDBC Prepared Statements

indexpostgresqlprepared-statement

Suppose we create a partial index in Postgres that avoids useless NULL data:

 CREATE INDEX my_ix ON my (col1) WHERE col1 IS NOT NULL;

Should I write the SELECT query as:

SELECT * FROM my WHERE col1 = 'abc';

or as:

SELECT * FROM my WHERE col1 = 'abc' AND col1 IS NOT NULL;

In case of JDBC connection libraries it may look like this:

SELECT * FROM my WHERE col1 = ?;

and I am not sure if Postgres can infer index applicability without:

 AND col1 IS NOT NULL

Also it is interesting if Postgres can infer transitivity, for index:

 CREATE INDEX my_ix ON my (col1) WHERE col1 > 0;

I expect I don't need to write:

SELECT * FROM my WHERE col1 > ? AND col1 > 0;

if parameters are 10, 23, etc (> 0).

Best Answer

You can easily try it out with the following script:

CREATE TABLE my (col1) AS
   SELECT * FROM generate_series(1, 100000);

ANALYZE my;

CREATE INDEX my_null ON my (col1) WHERE col1 IS NOT NULL;

/* use a prepared statement similar to JDBC */
PREPARE stmt1(integer) AS SELECT * FROM my WHERE col1 = $1;

/* the first five executions use custom plans */
EXPLAIN (COSTS off) EXECUTE stmt1(1);
EXPLAIN (COSTS off) EXECUTE stmt1(2);
EXPLAIN (COSTS off) EXECUTE stmt1(NULL);
EXPLAIN (COSTS off) EXECUTE stmt1(4);
EXPLAIN (COSTS off) EXECUTE stmt1(5);
/* from the sixth execution on, a generic plan is used */
EXPLAIN (COSTS off) EXECUTE stmt1(6);
EXPLAIN (COSTS off) EXECUTE stmt1(NULL);

DROP INDEX my_null;

CREATE INDEX my_pos ON my (col1) WHERE col1 > 0;

/* use a prepared statement similar to JDBC */
PREPARE stmt2(integer) AS SELECT * FROM my WHERE col1 > $1;

/* the first five executions use custom plans */
EXPLAIN (COSTS off) EXECUTE stmt2(99997);
EXPLAIN (COSTS off) EXECUTE stmt2(99997);
EXPLAIN (COSTS off) EXECUTE stmt2(1);
EXPLAIN (COSTS off) EXECUTE stmt2(NULL);
EXPLAIN (COSTS off) EXECUTE stmt2(99998);
/* PostgreSQL continues to use a custom plan! */
EXPLAIN (COSTS off) EXECUTE stmt2(0);
EXPLAIN (COSTS off) EXECUTE stmt2(NULL);
EXPLAIN (COSTS off) EXECUTE stmt2(99997);

DROP TABLE my;
DEALLOCATE ALL;

Here is an SQL fiddle that tuns the code.

Explanations:

With a prepared statement, PostgreSQL can cache query plans for the database session. Using such a generic plan that stays the same regardless of the parameter values has the advantage that plan time is saved.

But PostgreSQL doesn't always use a generic plan, because sometimes generating a custom plan that respects the parameter value can be better.

To decide what to do, PostgreSQL uses the following heuristic:

  • The first five executions will always use a custom plan.

  • If the estimated cost of the custom plans is not cheaper than the estimated cost of the generic plan, PostgreSQL will use the generic plan from the sixth execution on.

From PostgreSQL v12 on, you can configure the behavior with the plan_cache_mode configuration parameter.

Why does the first statement use a generic plan?

For the first statement, an index scan can always be used, even if the argument is NULL (because nothing has to be done in this case) . So PostgreSQL will use the generic plan after the fifth execution.

You can recognize the generic plan from the $1 in the EXPLAIN output.

Why does the second statement continue to use a custom plan?

With a generic plan, PostgreSQL can only use a sequential scan, because the index can only be used for some parameter values. For some of the first five executions the custom plan is significantly cheaper than the generic plan, so PostgreSQL continues using custom plans.

To answer your question:

If it is necessary to add an extra WHERE condition that makes sure that PostgreSQL knows it can use an index depends on the individual case.

You can add the extra condition without a disadvantage, it might help the optimizer and cannot do harm.

But even with the extra condition you cannot be certain that the index will be used.