Postgresql – Make custom aggregate function easier to use (accept more input types without creating variants)

Recently I wrote a custom aggregate function in postgres that would return a specific column for the row that matches the max/min aggregate using a different column.

While the code in itself works great it is somewhat bothersome to create custom data type for every possible input combination that I might need.

Here is the code I use

CREATE TYPE agg_tuple_text AS
(
    exists boolean,
    value numeric,
    text text
);

--------------------------------------------------------------------------------

CREATE FUNCTION valued_min(old_tuple agg_tuple_text, new_value numeric, new_text text)
    RETURNS agg_tuple_text
    LANGUAGE plpgsql
AS $$
    BEGIN
        IF (old_tuple).exists = false THEN
            RETURN (true, new_value, new_text);
        ELSIF (old_tuple).value > new_value THEN
            RETURN (true, new_value, new_text);
        ELSE
            RETURN old_tuple;
        END IF;
    END;
$$;
--------------------------------------------------------------------------------

CREATE FUNCTION unpack_agg_tuple_text(value agg_tuple_text)
    RETURNS text
    LANGUAGE plpgsql
AS $$
BEGIN
    IF (value).exists = false THEN
        RETURN NULL;
    ELSE
        RETURN (value).text;
    END IF;
END
$$;

--------------------------------------------------------------------------------

CREATE AGGREGATE valued_min(numeric, text)
(
        INITCOND = '(false, 0, null)',
        STYPE = agg_tuple_text,
        SFUNC = valued_min,
        FINALFUNC = unpack_agg_tuple_text
);

--------------------------------------------------------------------------------

-- Example
SELECT min(value) as min_value, valued_min(value, name) as min_name, max..., avg... FROM kv;
-- Output:
-- min_value | min_name           | ...
-- ----------+--------------------+----
--     11.11 | this is the lowest | ...

EDIT: My goal is drawing a min/max/avg chart for a TSDB and displaying the name of the min and max entries each.

Is there a way to achieve this without creating all of these for every possible combination? (Maybe some kind of generic parameter that are present in Java or alike)

Value column types
- Various Date/Time types
- Numeric types
- Maybe text
- (any comparable type)
data column types
- anytype

It would be sufficient if I only could use it for the data value since it isn't used in any calculation inside that code. Unfortunately the anyelement type isn't allowed in custom data types.

I already considered using the json type as input, but that feels somewhat wrong, because it looses the type information (especially for date/time types).

I use Postgres 10 without extensions, but if this is possible using postgres 1x or using a special extension I'm willing to try.

I also considered joining the values, but then I get isues with performance and potential duplicates/rows that have the same value.

Best Answer

Before going into details - are you sure you are not re-inventing a wheel? This might burn down to the very popular topic of greatest-n-per-group.

Your query:

SELECT min(value) as value, valued_min(value, name) as name FROM kv;

Can be rewritten with stock Postgres as:

SELECT value, name
FROM   kv
ORDER  BY value
LIMIT  1;

Which also can use a simple btree index on (value) or (value, name) for an index or index-only scan - much faster.

I am pretty sure any other example can be solved with built-in functionality as well. To get one row per group, your query would be:

SELECT grp_col, min(value) AS value, valued_min(value, name) AS name
FROM   kv
GROUP  BY grp_col;

Replace with:

SELECT DISTINCT ON (grp_col)
       grp_col, value, name
FROM   kv
ORDER  BY grp_col, value;

Again, faster. And much more versatile. Detailed explanation:

Select first row in each GROUP BY group?

Best Answer

Related Solutions

Postgresql – changing float->numeric casts from assignment to implicit, dangerous

PostgreSQL stored procedure which can return one of two custom rowtypes

Related Question