Sql-server – Can SQL Server full-text search tokenize words within words

full-text-searchsql serversql-server-2012

Suppose I have the following schema and data in SQL Server 2012:

create table Exclamations
(
    ID int not null primary key identity(1,1),
    Exclamation nvarchar(150) not null
);

insert into Exclamations(Exclamation) values
    ('Ohmygosh!'),
    ('Ohmygoodness!'),
    ('It''sabird!'),
    ('It''saplane!'),
    ('Whattheheck!'),
    ('Lookoutbelow!'),
    ('Whatintheworld?'),
    ('Isheforreal?'),
    ('Takeahike!');

create fulltext index on Exclamations(Exclamation) key index ID;

And suppose I want to do full-text queries against my exclamations, but I want to be able to query for words within words using a high speed index. Is that possible? Perhaps with a custom dictionary, or custom word-breaking/boundary behavior?

For example, I want to be able to query for the word "out" and have it quickly locate row #6 ('Lookoutbelow!') without having to scan through all the rows using a LIKE-ish operator. This also implies some false positives, for example, suppose I want to query for the word "he" and find row #8 ('Isheforreal?'); the query would also return row #5 ('Whattheheck!') since it contains the same "he" substring. That's OK.

Note that I do not need the tokenizer to index every possible combination of characters. I don't mind if nonsense words like 'sapl' within row #4 are not indexed. I only need it to tokenize the words it already recognizes (and possibly some additional words that I would add, but that is only a nice-to-have.)

It is important to specify that for the purposes of this question I cannot pre-parse the exclamations to insert obvious tokenizing boundaries such as spaces, hyphens, or periods, before inserting into SQL Server. I want to know if SQL Server can somehow do that itself.

EDIT: After some more reading, it appears that what I want is generally called N-gram tokenizing. But I want it limited to tokens that are already present as, or variants from, words in the dictionary.

Best Answer

Looks like there is no such feature in SQL Server.