Compare Words in String Without Considering Positions in PostgreSQL

postgresqlpostgresql-9.6string manipulation

In Postgres 9.6 I want to test whether two strings like these are considered the same:

'this is a test number 01', 'number this is 01 a test'

So I have created this function:

CREATE OR REPLACE FUNCTION sort_text(a text) RETURNS text AS $$
    declare t1 text;
    BEGIN
      select(array_to_string (
        array(
          select * from unnest(string_to_array(a, ' ')) order by 1), ' ')) into t1;
      RETURN t1;
    END;

$$ LANGUAGE plpgsql;

select (sort_text('this is a test number 01') = sort_text('number this is 01 a test'));

which actually looks to be working correctly.

I was wondering, is there any better way to do this?

'this and this' and 'and this' are considered to be different.

All the strings are already stripped out (spaces and punctuation) and duplication is not a problem. String length 50 characters as max estimation.

Best Answer

I suggest a single SELECT in a plain SQL function:

CREATE OR REPLACE FUNCTION strings_equivalent(a text, b text)
  RETURNS bool AS
$func$
   SELECT a1 = b1
   FROM  (
      SELECT string_agg(w, ' ') AS a1
      FROM  (
         SELECT w
         FROM   unnest(string_to_array(a, ' ')) w
         ORDER  BY w
         ) a1
      ) a2
   , (
      SELECT string_agg(w, ' ') AS b1
      FROM  (
         SELECT w
         FROM   unnest(string_to_array(b, ' ')) w
         ORDER  BY w
         ) b1
      ) b2
   WHERE  length(a) = length(b)
   UNION ALL
   SELECT FALSE
   LIMIT 1;  -- for clarity, not needed
$func$ LANGUAGE sql IMMUTABLE;

Call:

SELECT strings_equivalent('this is a test number 01', 'number this is 01 a test');

This is assuming:

Duplicate words count like any other words
Separator is a single space
No leading or trailing spaces

The function returns NULL for any empty string or NULL input.

The UNION ALL construct is a shortcut to return FALSE immediately if the input strings don't have the same length and avoid more expensive processing. Related:

LIMIT 1 is not needed because the function only returns the first column of the first row anyway and ignores the rest if there are more rows.

IMMUTABLE (since the result never changes for the same input) helps performance with repeated evaluation and allows indexes on functional expressions.

You could use regexp_split_to_table(a, ' ') instead of unnest(string_to_array(a, ' ')), but regular expression functions are typically more expensive. (You can cover more sophisticated separator characters with the regex, though, like '\s+' for any white space). Related:

How to preserve the original order of elements in an unnested array?

BTW, your simple function sort_text() looks good. But use string_agg() instead of array_to_string(ARRAY(...)) in a simple SQL function. No variable assignment needed:

CREATE OR REPLACE FUNCTION sort_text(a text)
  RETURNS text AS
$$
   SELECT string_agg(w, ' ')   
   FROM  (
      SELECT w
      FROM   unnest(string_to_array(a, ' ')) w
      ORDER  BY 1
      ) sub
$$ LANGUAGE sql IMMUTABLE;

Related Solutions

Postgresql – Get a Table from XML in postgresql

There is nothing wrong with using xpath_table().

Even though some of the functions in the xml2 extension have been replaced with in-core functions, xpath_table is not one of them (I think this will happen for Postgres 10).

Until then, xpath_table() seems to be your only option:

SELECT * 
FROM xpath_table('id',
            'documents',
            'xmldata',
            '//test',
            'true') AS t(doc_id integer, data text);

Returns:

doc_id | data    
-------+---------
     1 | Red text
     1 | green

In order for this to work you first need to create the extension (as a superuser):

create extension xml2;

Otherwise the function isn't available.

Postgresql – Selecting union of results of dataset-producing function when applied element-wise to an array

This can be solved via unnest or by modifying the original array-returning function to return a table/setof instead.

The function application can then be done within a join.

Best Answer

Related Solutions

Postgresql – Get a Table from XML in postgresql

Postgresql – Selecting union of results of dataset-producing function when applied element-wise to an array

Related Question