Postgresql – Strip longest common suffix and concat in Postgres

aggregatepostgresqlpostgresql-9.1string manipulationwindow functions

Given a table t:

id | name
------------
1  | abcfug
1  | deffug
1  | hijfug
2  | etc

How can I do something like:

select string_agg(strip_lcs(name), ', ') from t where id = 1

returning:

abc, def, hij

NB I wrote an aggregate function to return lcs if that helps:

CREATE FUNCTION lcs_iterate(_state TEXT, _value TEXT)
RETURNS TEXT
AS
$$
        SELECT  RIGHT($2, s - 1)
        FROM    generate_series(1, LEAST(LENGTH($1), LENGTH($2))) s
        WHERE   RIGHT($1, s) <> RIGHT($2, s)
        UNION ALL
        SELECT  LEAST($1, $2)
        LIMIT 1;
$$
LANGUAGE 'sql';

CREATE AGGREGATE lcs(TEXT) (SFUNC = lcs_iterate, STYPE = TEXT);

Best Answer

Your aggregate function is smart and fast, but there is a bug. If one string matches the tail of another completely, the UNION ALL part kicks in to return LEAST($1, $2). That must instead be something like CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END. Test with 'match' and 'aa_match'. (See fiddle below.)

Plus, make the function IMMUTABLE STRICT:

CREATE OR REPLACE FUNCTION lcs_iterate(_state text, _value text)
  RETURNS text AS
$func$
SELECT  right($2, s - 1)
FROM    generate_series(1, least(length($1), length($2))) s
WHERE   right($1, s) <> right($2, s)

UNION   ALL
SELECT  CASE WHEN length($1) > length($2) THEN $2 ELSE $1 END  -- !
LIMIT  1;
$func$ LANGUAGE sql IMMUTABLE STRICT;  -- !

NULL values are ignored and empty strings lead to zero-length common suffix. You may want to treat these special cases differently ...

While we only need the length of the common suffix, a very simple FINALFUNC returns just that:

CREATE AGGREGATE lcs_len(text) (
   SFUNC = lcs_iterate
 , STYPE = text
 , FINALFUNC = length()  -- !
   );

Then your query can look like:

SELECT string_agg(trunc, ', ') AS truncated_names
FROM  (
   SELECT left(name, -lcs_len(name) OVER ()) AS trunc
   FROM   tbl
   WHERE  id = 1
   ) sub;

.. using the custom aggregate as window function.

db<>fiddle here

I also tested with Postgres 9.4, and it should work with your outdated Postgres 9.1, but that's too old for me to test. Consider upgrading to a current version.

Related: