PostgreSQL ORDER BY Error – Function array_position(text[], character varying) Does Not Exist

arrayenumorder-bypostgresql

I have a pretty basic categorical column in a Postgres database that is currently stored as VARCHAR. I can select a count of each with:

I though adding an ORDER BY array_position() would do it:

SELECT color, count(*)
FROM research
GROUP BY color 
ORDER BY array_position(ARRAY['Red','Orange','Yellow','Green','Blue'], color);

But I'm seeing a type error:

ERROR: function array_position(text[], character varying) does not exist
SQL state: 42883
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Character: 98

What do I need to cast color as so I can use the array_position function to order it?

Best Answer

Sample data

You have a table like,

CREATE TABLE research(colors)
  AS VALUES ('Blue'), ('Orange'), ('Yellow');

`ENUM` type

You have an enumerated list of colors. So the easy thing here would be to use an ENUM type

CREATE TYPE colors AS ENUM ('Red','Orange','Yellow','Green','Blue');

Then

ALTER TABLE research
  ALTER COLUMN colors      -- myColorsColumn
  SET DATA TYPE colors
  USING (colors::colors);  -- myColorsColumn::NewType

Now it's faster, more efficient, and cleaner. MORE WIN. MORE JOY. Etc. ENUM are stored as 4-byte internally.

ORDER BY colors

Is all you'll need.

Array Sort

But, you've got a God-given right to treat this like any other database that doesn't support ENUM types,

ORDER BY array_position(ARRAY['Red','Orange','Yellow','Green','Blue'], color);

I think this is perfectly fine, but I believe there is a limitation here in the implementation,

ARRAY['foo', 'bar', 'baz']

Is essentially

ARRAY['foo', 'bar', 'baz']::text

This means you need to either (speed doesn't matter as both options perform the same)

Make the color column of the native text. This can be done in the call, or you can actually modify the table and should. In PostgreSQL, nothing should be varchar without a limit (since that's just a parallel type for text), and very few things should be varchar with a limit (since there is no advantage)
```
color::text            -- PostgreSQL sexy sexy cast
CAST(color AS TEXT))   -- ANSI SQL standardized vanilla and boring
```

Or, you can construct the array itself as type varchar[]

array_position(ARRAY['Red','Orange','Yellow','Green','Blue']::varchar[], color);

Further notes

In PostgreSQL, we wouldn't use varchar for colors. Even if you insist on not using an ENUM here (though I would), that should be text

The array_position is a shorthand, I expect it to be substantially slower than a similar operation though

CASE
  WHEN color='Red' THEN 1::smallint
  WHEN color='Orange' THEN 2::smallint
  WHEN color='Yellow' THEN 3::smallint
  ... etc
END;

Awooga! Awooga! Key exposure risk, extreme admin caution required!

BTW, please think carefully about whether PgCrypto is really the right choice. Keys in your queries can be revealed in pg_stat_activity and the system logs via log_statement or via crypto statements that fail with an error. IMO it's frequently better to do crypto in the application.

Witness this session, with client_min_messages enabled so you can see what'd appear in the logs:

regress# SET client_min_messages = 'DEBUG'; SET log_statement = 'all'; 
regress=# select decrypt(pw, 'key', 'aes') from demo;
LOG:  statement: select decrypt(pw, 'key', 'aes') from demo;
LOG:  duration: 0.710 ms
  decrypt   
------------
 \x64617461
(1 row)

Whoops, key possibly exposed in the logs if log_min_messages is low enough. It's now on the server's storage, along with the encrypted data. Fail. Same issue without log_statement if an error occurs to cause the statement to get logged, or possibly if auto_explain is enabled.

Exposure via pg_stat_activity is also possible.. Open two sessions, and:

S1: BEGIN;
S1: LOCK TABLE demo;
S2: select decrypt(pw, 'key', 'aes') from demo;
S1: select * from pg_stat_activity where current_query ILIKE '%decrypt%' AND procpid <> pg_backend_pid();

Whoops! There goes the key again. It can be reproduced without the LOCK TABLE by an unprivileged attacker, it's just harder to time it right. The attack via pg_stat_activity can be avoided by revoking access to pg_stat_activity from public, but it just goes to show that it might not be best to send your key to the DB unless you know your app is the only thing ever accessing it. Even then, I don't like to.

If it's passwords, should you store them at all?

Furthermore, if you're storing passwords, don't two-way encrypt them; if at all possible salt passwords then hash them and store the result. You usually don't need to be able to recover the password cleartext, only confirm that the stored hash matches the password the user sends you to log in when it's hashed with the same salt.

If it's auth, let someone else do it for you

Even better, don't store the password at all, authenticate against LDAP, SASL, Active Directory, an OAuth or OpenID provider, or some other external system that's already designed and working.

Resources

and lots more.

PostgreSQL function param json_populate_recordset ARRAY problem

Confusion?

From the docs json_populate_recordset(base anyelement, from_json json) does not return an ARRAY, it returns a setof anyelement. That's fundamentally different from an array.

SELECT * FROM (SELECT ARRAY[1,2]) AS gs(x);   -- array (int[])
SELECT * FROM (VALUES (1), (2)  ) AS gs(x);   -- setof anyelement (setof int)

I am of the opinion that you do not need an array, nor a setof anyelement.

What you need to do things your way

For comparison this will turn json into foo[],

CREATE TYPE foo AS ( id int );
SELECT pg_typeof(
  ARRAY(
    SELECT *
    FROM json_populate_recordset( NULL::foo, '[{"id":1},{"id":2}]' ) )
  )
);

Which you can call like this

SELECT test_gps(
  ARRAY(
    SELECT *
    FROM json_populate_recordset( NULL::foo, '[{"id":1},{"id":2}]' ) )
  )
);

A better way!

But the better way is to simply not process the whole result set but only a single json value.

CREATE OR REPLACE FUNCTION test_gps
  (
    gps_points API_GPS_POINT
  )
  RETURNS BOOLEAN
LANGUAGE plpgsql
AS $$
DECLARE
  api_gps_points API_GPS_POINT;
BEGIN

    RAISE NOTICE 'API_GPS_POINT : %', API_GPS_POINT;

END;
$$;

And then call it like this..

SELECT test_gps(myjson)
FROM json_populate_recordset( null::foo, '[{"id":1},{"id":2}]' )
  AS myjson;

Which you can see returns one type of foo.

SELECT pg_typeof(myjson)
FROM json_populate_recordset( null::foo, '[{"id":1},{"id":2}]' )
  AS myjson;