PostgreSQL – How to Create Aggregate Function with Implicit DISTINCT?

aggregatedistinctpostgresql

We have an ERP system which allows aggregates to be used (eg SUM(foo)) but not with DISTINCT (eg SUM(DISTINCT foo).

Is it possible to create an aggregate function (SUM_DISTINCT), that returns the same result as as SUM(DISTINCT foo), so SUM_DISTINCT(foo) = SUM(DISTINCT foo)?

Best Answer

Is it possible to create an aggregate function (SUM_DISTINCT), that returns the same result as as SUM(DISTINCT foo), so SUM_DISTINCT(foo) = SUM(DISTINCT foo)?

Yes, it is possible — you need a User-defined Aggregate, such as this:

create or replace function f_sum_distinct (numeric[], numeric) returns numeric[]
    language sql as $$
select $1||$2;
$$;

create or replace function f_sum_distinct_final (numeric[]) returns numeric 
    language sql as $$
select sum(v) from (select distinct unnest($1) v) z;
$$;

create aggregate sum_distinct(numeric)
( sfunc     = f_sum_distinct
 ,stype     = numeric[]
 ,finalfunc = f_sum_distinct_final
);

with w(v) as (select 2 union all select 2 union all select 3)
select sum(v) "Plain SUM", sum(distinct v) "SUM(DISTINCT)", sum_distinct(v) "SUM_DISTINCT" from w;

/*
|Plain SUM|SUM(DISTINCT)|SUM_DISTINCT|
|--------:|------------:|-----------:|
|        7|            5|           5|
*/

dbfiddle here

Note though (thanks @Erwin), that performance is going to be very substantially worse than the built-in aggregates. If this matters you will have to consider writing the helper functions in C, which is much more of an undertaking.

`DISTINCT ON()`

Just as a side note, this is precisely what DISTINCT ON() does (not to be confused with DISTINCT)

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example

So if you were to write,

SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;

It's effectively

SELECT DISTINCT ON(x,y) z
FROM foo;
-- ORDER BY z;

In that it takes the first z. There are two important differences,

You can also select other columns at no cost of further aggregation..

SELECT DISTINCT ON(x,y) z, k, r, t, v
FROM foo;
-- ORDER BY z, k, r, t, v;

Because there is no GROUP BY you can not use (real) aggregates with it.

CREATE TABLE foo AS
SELECT * FROM ( VALUES
  (1,2,3),
  (1,2,4),
  (1,2,5)
) AS t(x,y,z);

SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- fails, as you should expect.
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- would not otherwise fail.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;

Don't forget `ORDER BY`

Also, while I didn't bold it then I will now

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example

Always use an ORDER BY with DISTINCT ON

Using an Ordered-Set Aggregate Function

I imagine a lot of people are looking for first_value, Ordered-Set Aggregate Functions. Just wanted to throw that out there. It would look like this, if the function existed:

SELECT a, b, first_value() WITHIN GROUP (ORDER BY z)    
FROM foo
GROUP BY a,b;

But, alas you can do this.

SELECT a, b, percentile_disc(0) WITHIN GROUP (ORDER BY z)   
FROM foo
GROUP BY a,b;

MYSQL Select Distinct with SUM

Here's your query:

    SELECT 
        concat("fund_key: ",stock_key,", units: ",sum(units),", price: ",round(sum(units*price)/sum(units),2)) as result
    FROM PRUEBA.new_table2
    GROUP BY stock_key,user_key;

Result:

enter image description here

Best Answer

Related Solutions

PostgreSQL – Is There a Type-Safe first() Aggregate Function?

DISTINCT ON()

Don't forget ORDER BY

Using an Ordered-Set Aggregate Function

MYSQL Select Distinct with SUM

Related Question

`DISTINCT ON()`

Don't forget `ORDER BY`