Postgresql – Use array_position() function to get the most frequent value for column from pg_stats

arraydatatypesperformancepostgresqlpostgresql-performance

I'm trying to execute this simple query in order to check whether some value (1000) belongs to MCV list used by Postgres query optimizer:

SELECT array_position(most_common_vals, 1000) 
FROM pg_stats 
WHERE tablename = 'tenk1' 
AND attname = 'unique1';

But receive the following error message:

ERROR:  function array_position(anyarray, integer) does not exist

How can it be fixed?

array_position() is the standard function as described here and the following statement returns 2 as expected:

SELECT array_position('{1,2,3}', 2);

Best Answer

Solution

Assuming the data type of your column tenk1.unique1 is integer:

SELECT array_position(most_common_vals::text::int[], 1000) 
FROM   pg_stats 
WHERE  tablename = 'tenk1' 
AND    attname = 'unique1';

Use your actual column type and the corresponding array type.
You get the position, or NULL if the value is not in the MCV list.

The solution is short - unlike the ...

Explanation

The function array_position() is defined to take (anyarray, anyelement) (or (anyarray, anyelement, integer) for the second variant).

The column pg_stats.most_common_vals has the polymorphic data type anyarray to be able to hold arrays of any data type - for obvious reasons.

anyarray and anyelement are not allowed as data types for user-created tables. For users, both are polymorphic pseudo-types. (But Postgres can use them in system tables.)

Multiple polymorphic variables in the same function have to resolve to the same (or corresponding) data types. The manual:

Furthermore, if there are positions declared anyarray and others declared anyelement, the actual array type in the anyarray positions must be an array whose elements are the same type appearing in the anyelement positions.

And:

Thus, when more than one argument position is declared with a polymorphic type, the net effect is that only certain combinations of actual argument types are allowed.

Bold emphasis mine.

You have found a corner case where function type resolution fails for the combination of the polymorphic anyarray with an integer - or any non-polymorphic type in second position.

1000 in your expression array_position(most_common_vals, 1000) is a numeric constant resolving to integer. These would fail in similar fashion:

array_position(most_common_vals, '1000')  -- untyped string literal
ERROR:  function array_position(anyarray, unknown) does not exist
array_position(most_common_vals, '1000'::text)
ERROR:  function array_position(anyarray, text) does not exist

Furthermore, there are no casts defined for anyarray, being a pseudo-type in user-land:

SELECT * FROM pg_cast WHERE castsource = 'anyarray'::regtype;  -- nothing found

The workaround is to cast to text as stepping stone, since any type can be cast to text. Then cast to integer[], arriving at the solution above.

In closing, I think this is a shortcoming in function type resolution that could be resolved (easily?). But since the data type anyarray is not supposed to be used like this in user-land to begin with I doubt that any developer will spend time on it ...