Postgresql – How to remove duplicates from any array and preserve ordering in PostgreSQL

arrayduplicationorder-bypostgresql

I'm looking for a way to eliminate duplicates from a PostgreSQL array
while preserving the ordering of its elements. What I currently
have are the following functions:

create function array_unique( anyarray ) 
  returns anyarray immutable strict language sql as $$
  select array( select distinct unnest( $1 ) ); $$;

create function array_unique_sorted( anyarray ) 
  returns anyarray immutable strict language sql as $$
  select array( select distinct unnest( $1 ) order by 1 ); $$;

/* ### TAINT there ought to be a simpler, declarative solution */
create function array_unique_stable( text[] )
  returns text[] immutable strict parallel safe language plpgsql as $$
  declare
    R         text[] = '{}';
    ¶element  text;
  begin
    foreach ¶element in array $1 loop
      if not array[ ¶element ] && R then
        R :=  R || array[ ¶element ];
        end if;
      end loop;
    return R; end; $$;

In the above, array_unique takes an array of any type and returns a copy
with all duplicates removed; their relative ordering is arbitrary.
array_unique_sorted is like array_unique, but the elements are sorted
relative to each other; this is sometimes useful as all arrays with the
same set of of distinct elements will compare equal after being
normalized by this function.

array_unique_stable already does what I'm looking for: given an array
(which in this exampe must be a text[] array), it scans elements from
left to right; whenever it encounters a previously unseen element, it adds
that one to the result. Thus, only the first occurrence of each value
is kept.

However, the implementation has some drawbacks: First, there seems to be no
way to write it so it accepts the pseudo-type anyarray.

Second, while
the first two functions are written in SQL, they may presumably be inlined,
array_unique_stable is written in PL/pgSQL, and so it cannot be inlined.

Third, it bugs me that I couldn't come up with a solution in pure SQL…

Best Answer

This can indeed be done using pure SQL:

create function array_unique_stable(p_input anyarray)
  returns anyarray immutable strict parallel safe 
  language sql
as 
$$
select array_agg(t order by x)
from (
  select distinct on (t) t,x
  from unnest(p_input) with ordinality as p(t,x)
  order by t,x
) t2;
$$

The unnest(p_input) with ordinality will return the original index of the element in the array which is then used to aggregate them back in the outer query.

select array_unique_stable(array['a','x','x','b']) as text_array, 
       array_unique_stable(array[10,1,1,5,8,8]) as int_array

returns

text_array | int_array 
-----------+-----------
{a,x,b}    | {10,1,5,8}

and