If I have a string in a column on a row in a table like this
1 2 2 2 2 2 2
How would I count the occurrence of a substring 2
inside the string. Assume nothing other than a space-delimiter of " "
.
For the purposes of this, let's treat the numbers as substrings
Sample data
CREATE TABLE foo
AS
SELECT 1 AS id, '1 2 2 2 2 2 2'::text AS data;
TABLE foo
id | data
----+---------------
1 | 1 2 2 2 2 2 2
Best Answer
You can solve this with a
FAST A string function, such as one the pattern explained below
Possible solutions
String
Using
length
andregexp_replace
Most RDBMS's provide some method to calculate substring occurrences like this,
This method isn't applicable here because without an anchor, we can't be sure if we're replacing the substring inside of something-space-delimited. As an example, the above replaces the
2
in329
. We can remedy that by usingregexp_replace
to anchor the substring.Because we're not splitting on simple spaces (' ') though we could with more complexity, we may also want to accommodate sub-strings of different lengths like in this question. This is why we explicitly include
/ length('2')
. That reduces to a no-op, but if we we're search for something that was longer than one character, it'd be required.Using ARRAY[]
Splitting into an
ARRAY[]
Here we have to subtract one match splits a string into two fragments, and thus the occurrences is one less than the fragment counts: this
xyx
split ony
, produces{'x', 'x'}
and we want the length to be1
corresponding to the occurrences ofy
.Alternatively, we can use
string_to_array
to separate something that's space-delimited and then count the matches,Using a TABLE
Splitting into a table with
regexp_split_to_table
Here we split the regex into a table.. In this method you're using
GROUP BY
andcount()
.And, from there you can run regular SQL on it.
Using
regex_matches
Here we get away from splitting and instead use the
\m
, and\M
anchors for word-boundaries.Procedural Languages
Perl
This method turned out to the be the fastest overall,
Summary and Performance Implications
Following the same format of data, performance implications can be obtained with
Under these constraints, I found that the procedural method with plperl was the fastest. Next I found the following as being the fastest native method,
Keep in mind the tried and true method of string replacement outside of needing to anchor the string is still the fastest and most efficient native method however clunky it may be,
That said, the
ARRAY[]
method is massively faster than splitting to a table.