In Postgres 9.6 I want to test whether two strings like these are considered the same:
'this is a test number 01' and 'number this is 01 a test'
So I have created this function:
CREATE OR REPLACE FUNCTION sort_text(a text) RETURNS text AS $$
DECLARE
   t1 text;
BEGIN
   SELECT array_to_string(
             ARRAY(SELECT * FROM unnest(string_to_array(a, ' ')) ORDER BY 1)
           , ' ')
   INTO   t1;
   RETURN t1;
END;
$$ LANGUAGE plpgsql;
select (sort_text('this is a test number 01') = sort_text('number this is 01 a test'));
which actually looks to be working correctly.
I was wondering, is there any better way to do this?
Note that 'this and this' and 'and this' are considered to be different.
All the strings have already been stripped (spaces and punctuation), and duplication is not a problem. Estimated maximum string length is 50 characters.
Best Answer
I suggest a single SELECT in a plain SQL function:
Call:
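A minimal sketch of what such a function and its call could look like, written to match the notes in this answer. The function name same_words and the column alias w are hypothetical, and it assumes words are separated by exactly one space:

```sql
-- Hypothetical name; assumes single-space separators and pre-cleaned input.
CREATE OR REPLACE FUNCTION same_words(a text, b text)
  RETURNS boolean
  LANGUAGE sql IMMUTABLE AS
$func$
-- Cheap shortcut: different lengths can never contain the same words.
SELECT false
WHERE  length(a) <> length(b)

UNION ALL
-- Otherwise compare the words of each string after sorting them.
SELECT (SELECT string_agg(w, ' ' ORDER BY w)
        FROM   unnest(string_to_array(a, ' ')) w)
     = (SELECT string_agg(w, ' ' ORDER BY w)
        FROM   unnest(string_to_array(b, ' ')) w)
$func$;

-- Call:
SELECT same_words('this is a test number 01', 'number this is 01 a test');  -- true
SELECT same_words('this and this', 'and this');                             -- false
```

In this sketch, duplicates count, so 'this and this' and 'and this' differ. It yields NULL if either input is NULL or if both are empty strings; an empty string compared with a non-empty one gives false because the lengths differ.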
This is assuming:
The function returns NULL for any empty string or NULL input.
The UNION ALL construct is a shortcut to return FALSE immediately if the input strings don't have the same length, avoiding more expensive processing.
LIMIT 1 is not needed because the function only returns the first column of the first row anyway and ignores the rest if there are more rows.
IMMUTABLE (since the result never changes for the same input) helps performance with repeated evaluation and allows indexes on functional expressions.
You could use regexp_split_to_table(a, ' ') instead of unnest(string_to_array(a, ' ')), but regular expression functions are typically more expensive. (You can cover more sophisticated separator characters with the regex, though, like '\s+' for any white space.)
BTW, your simple function sort_text() looks good. But use string_agg() instead of array_to_string(ARRAY(...)) in a simple SQL function. No variable assignment needed:
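A sketch of sort_text() rewritten along those lines, as a plain SQL function using string_agg() with an ORDER BY inside the aggregate. The alias w is only illustrative, and marking the function IMMUTABLE is an assumption carried over from the note above:

```sql
CREATE OR REPLACE FUNCTION sort_text(a text)
  RETURNS text
  LANGUAGE sql IMMUTABLE AS
$func$
-- Split on single spaces, sort the words, and join them back together.
SELECT string_agg(w, ' ' ORDER BY w)
FROM   unnest(string_to_array(a, ' ')) w
$func$;

SELECT sort_text('this is a test number 01')
     = sort_text('number this is 01 a test');  -- true
```

One behavioral difference to be aware of: for an empty string this version returns NULL, whereas the array_to_string(ARRAY(...)) version returns an empty string.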