PostgreSQL – Select All Substrings (n-grams) of Length n

postgresql

How could I select all n-grams, ie. substrings of length n from a string using SQL? For example, the 3-grams of string example are exa, xam, amp, mpl, ple.

I'm using PostgreSQL to be more precise.

Best Answer

This does the trick:

SELECT SUBSTRING('example' FROM n FOR 3) 
FROM GENERATE_SERIES(1, LENGTH( 'example' )-2, 1) n;
exa
xam
amp
mpl
ple

Here it is in a function:

CREATE OR REPLACE FUNCTION ngrams(varchar, integer) RETURNS SETOF TEXT AS $$
SELECT SUBSTRING($1 FROM n FOR $2)::TEXT
FROM GENERATE_SERIES(1, LENGTH($1)-($2-1), 1) n;
$$ LANGUAGE SQL;

Use it like:

SELECT ngrams('example', 3)