Postgresql – ny way in Postgres to parameterise a procedure for sort order column, when the columns are of different types

datatypesfunctionsorder-byparameterpostgresql

I am trying to write a function which does (some complex stuff) and returns the results in different orders based on a parameter.

A simplified version would look something like this:

CREATE OR REPLACE FUNCTION test(order_column text)
    RETURNS TABLE(thing1 bigint,thing2 text, thing3 timestamp without time zone)
    LANGUAGE 'plpgsql'

AS $BODY$
BEGIN
RETURN QUERY
    SELECT thing1, thing2::text, thing3 FROM some_table
    ORDER BY 
        CASE WHEN order_column='id' THEN thing1
        ELSE thing3
        END
    DESC;
    
END;
$BODY$;

Unfortunately, thing1 is a bigint and thing3 is a timestamp, and when I try to run the function I get an error saying bigint and timestamp types can't be matched, which I interpret as saying that the types returned from the case need to be the same (or at least compatible). I can't cast them both to text, because the range of values don't sort correctly then.

I've tried returning the column numbers instead of the column names – this at least executes, but it ignores the column order (in the function or just executing as a simple statement). For example,

SELECT * FROM some_table ORDER BY 1;

works correctly but

SELECT * FROM some_table ORDER BY CASE WHEN TRUE THEN 1 ELSE 2 END;

does not order by column 1

My work-around would be to do

if column_order='first' then
  (masses of complex stuff)
  SELECT ... ORDER BY thing1
else
  (masses of complex stuff, duplicated)
  SELECT ... ORDER BY thing3
end if;

but that's horrible, and I'm really hoping there's some other way around this, and that I'm currently missing something.

Is there any way to do what I'm trying to do?

Best Answer

Be careful with conditional ordering, it can create bad query plans sometimes forcing table scanning. If the filtering and joining clauses, or just the size of the actual data, mean that you have a small number of rows to sort at the end then this is not an issue and something like this will work:

ORDER BY CASE WHEN ordering_column = 'id'        THEN id        ELSE NULL END
       , CASE WHEN ordering_column = 'timestamp' THEN timestamp ELSE NULL END

In fact it will work anyway, it might just be inefficient for a large amount of data.

For larger outputs your workaround may be more efficient as it may be able to make better use of indexes for the sorting. Another alternative is to have two procedures, one for each sort, and either call each as needed or have your main procedure call the others depending on the sort order it is passed in the parameter. Depending on how postgres handles cached query plans for procedures this may^[†] avoid issues of a cached plan for one case being used for another where it is vastly less efficient.

^{[†] I'm no expert at all on pg's internals, but single "kitchen sink" procedures and queries with conditional sorts etc. can be a performance killer in SQL Server for this sort of reason.}

Awooga! Awooga! Key exposure risk, extreme admin caution required!

BTW, please think carefully about whether PgCrypto is really the right choice. Keys in your queries can be revealed in pg_stat_activity and the system logs via log_statement or via crypto statements that fail with an error. IMO it's frequently better to do crypto in the application.

Witness this session, with client_min_messages enabled so you can see what'd appear in the logs:

regress# SET client_min_messages = 'DEBUG'; SET log_statement = 'all'; 
regress=# select decrypt(pw, 'key', 'aes') from demo;
LOG:  statement: select decrypt(pw, 'key', 'aes') from demo;
LOG:  duration: 0.710 ms
  decrypt   
------------
 \x64617461
(1 row)

Whoops, key possibly exposed in the logs if log_min_messages is low enough. It's now on the server's storage, along with the encrypted data. Fail. Same issue without log_statement if an error occurs to cause the statement to get logged, or possibly if auto_explain is enabled.

Exposure via pg_stat_activity is also possible.. Open two sessions, and:

S1: BEGIN;
S1: LOCK TABLE demo;
S2: select decrypt(pw, 'key', 'aes') from demo;
S1: select * from pg_stat_activity where current_query ILIKE '%decrypt%' AND procpid <> pg_backend_pid();

Whoops! There goes the key again. It can be reproduced without the LOCK TABLE by an unprivileged attacker, it's just harder to time it right. The attack via pg_stat_activity can be avoided by revoking access to pg_stat_activity from public, but it just goes to show that it might not be best to send your key to the DB unless you know your app is the only thing ever accessing it. Even then, I don't like to.

If it's passwords, should you store them at all?

Furthermore, if you're storing passwords, don't two-way encrypt them; if at all possible salt passwords then hash them and store the result. You usually don't need to be able to recover the password cleartext, only confirm that the stored hash matches the password the user sends you to log in when it's hashed with the same salt.

If it's auth, let someone else do it for you

Even better, don't store the password at all, authenticate against LDAP, SASL, Active Directory, an OAuth or OpenID provider, or some other external system that's already designed and working.

Resources

and lots more.

Postgresql – Detailed optimal column order with large text and bytea

Does that mean that columns should be ordered from most space occupation to least?

No, not necessarily. You can play "column tetris" to minimize padding and thereby save some space. The rule of thumb I gave and you quoted is one simple strategy for basic types that require alignment.

As I mentioned in the quoted answer, you can test the actual storage size (excluding item identifier) with pg_column_size() on the whole row.

text and related varchar and char types do not require padding, so there is nothing to gain. The same is true for your bytea columns.

Concerning storage size for:

bytea columns that always have constant 16-byte, 32-byte, or 64-byte lengths

The manual page on bytea tells us :

Storage Size
1 or 4 bytes plus the actual binary string

That means, the actual space required for a bytea column of 16-byte, 32-byte, or 64-byte length is 17 or 20 byte, 33 or 36 byte etc. respectively.

As demonstrated in this SQL Fiddle, a bytea variable always has an overhead of 4 bytes. When stored in a column, however, it starts out with just 1 byte of overhead and switches to 4 bytes for values of 127 bytes length or more.
24 bytes of overhead are added for the row type.
Another 4 bytes are needed for the item identifier per tuple in the data page. Details in this related answer:

Configuring PostgreSQL for read performance

As for alignment requirements of bytea, per documentation:

Values with single-byte headers aren't aligned on any particular boundary, either.

I would suggest you read that whole chapter - probably a couple of times, it's a tough read.

Best Answer

Related Solutions

Postgresql – How to use aes-encryption in PostgreSQL

Awooga! Awooga! Key exposure risk, extreme admin caution required!

If it's passwords, should you store them at all?

If it's auth, let someone else do it for you

Resources

Postgresql – Detailed optimal column order with large text and bytea

Related Question