Consider these simplified tables:
CREATE TABLE dbo.words
(
id bigint NOT NULL IDENTITY (1, 1),
word varchar(32) NOT NULL,
hits int NULL
)
CREATE TABLE dbo.items
(
id bigint NOT NULL IDENTITY (1, 1),
body varchar(256) NOT NULL,
)
The words
table holds about 9000 records, each holding a single word ('phone','sofa','house','dog', …)
The items
table holds around 12000 records, each with a body text of no more then 256 characters.
Now, I need to update the words
table, counting how many records there are in the items
table that hold (at least once) the text in the word field. I need to account for partial words, so all these 4 records should be counted for the word dog:
'This is my dog'
'I really like the movie dogma'
'my cousin has sheepdogs'
'dog dog dog doggerdy dog dog'
That last example should count as just one record (contains the term 'dog' at least once).
I can use this query:
UPDATE dbo.words
SET hits = (SELECT COUNT(*) FROM dbo.items WHERE body like '%' + word + '%')
But, this is extremely slow, this will take over 10 minutes to complete on the not to heavy server I have for it.
AFAIK indexes won't help, I'm doing LIKE searches.
I also think full-text won't help me, since I'm looking for words starting, ending or containing my search term. I could be wrong here.
Any advise on how to speed this up?
Best Answer
The best way I have found to speed up leading wildcard
LIKE
searches is to use n-grams. I describe the technique and provide a sample implementation in Trigram Wildcard String Search in SQL Server.It may be suitable for your needs, but be aware:
Test
I ran a quick test using the Complete Works of Shakespeare to populate the
body
column of theitems
table with 15,838 rows. I loaded thewords
table with 7,669 unique words from the same text.The trigram structures built in around 2 seconds and the following update statement completed in 5 seconds on my mid-range laptop:
A selection of the updated words table:
The modified trigram scripts from my article are below:
The only other change was to add a clustered index to the
items
table: