PostgreSQL – How to Create Aggregate Function with Implicit DISTINCT?

aggregatedistinctpostgresql

We have an ERP system which allows aggregates to be used (eg SUM(foo)) but not with DISTINCT (eg SUM(DISTINCT foo).

Is it possible to create an aggregate function (SUM_DISTINCT), that returns the same result as as SUM(DISTINCT foo), so SUM_DISTINCT(foo) = SUM(DISTINCT foo)?

Best Answer

Is it possible to create an aggregate function (SUM_DISTINCT), that returns the same result as as SUM(DISTINCT foo), so SUM_DISTINCT(foo) = SUM(DISTINCT foo)?

Yes, it is possible — you need a User-defined Aggregate, such as this:

create or replace function f_sum_distinct (numeric[], numeric) returns numeric[]
    language sql as $$
select $1||$2;
$$;

create or replace function f_sum_distinct_final (numeric[]) returns numeric 
    language sql as $$
select sum(v) from (select distinct unnest($1) v) z;
$$;

create aggregate sum_distinct(numeric)
( sfunc     = f_sum_distinct
 ,stype     = numeric[]
 ,finalfunc = f_sum_distinct_final
);

with w(v) as (select 2 union all select 2 union all select 3)
select sum(v) "Plain SUM", sum(distinct v) "SUM(DISTINCT)", sum_distinct(v) "SUM_DISTINCT" from w;

/*
|Plain SUM|SUM(DISTINCT)|SUM_DISTINCT|
|--------:|------------:|-----------:|
|        7|            5|           5|
*/

dbfiddle here

Note though (thanks @Erwin), that performance is going to be very substantially worse than the built-in aggregates. If this matters you will have to consider writing the helper functions in C, which is much more of an undertaking.