Sql-server – Autocomplete too slow: possible optimizations

performancequery-performancesql serversql-server-2008-r2string-searching

The autocomplete search functionality of my website searches a varchar field which contains the model number of sale items.

This field can contains strings varying from 1 to 75 characters and the table contains 400 000 rows. I have come up with a query that searchs only from the start of the string and that takes about 150-250ms to execute which is acceptable but now my manager wants the query to search for any substrings which makes the query 3-10 times slower (around 1000-2000ms).

I have built a JS fiddle to give you an example of what the data looks like and what the two query are like.

http://sqlfiddle.com/#!6/9efa3/2/0

There already are a few indexes on the table. What would be the best practice to speed-up this auto-complete search field ?
(database version is SQLSERVER 2008R2)

Here is a brief example of the data I'm dealing with :

CREATE TABLE [Products](
    [productid] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [model] [nvarchar](75) NOT NULL 
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED 
(
    [productid] ASC
));

insert into products values ('UMPX1AA0011 danish e-315 woot');
insert into products values ('P27y719VC');
insert into products values ('VG2y439m-LED');
insert into products values ('UMUyX165AAB01');
insert into products values ('U28y79VF');
insert into products values ('U52417HJ');
insert into products values ('VA25746M-LED WITH FLYING CORNERS');
insert into products values ('S19F350HNN 1pc california storage');
insert into products values ('VA211917A');
insert into products values ('PM2500X2');
insert into products values ('E22470SWHE');
insert into products values ('V22465WLYDP');
insert into products values ('I129LMH1HKC');
insert into products values ('OM5EN X 35 the new version');
insert into products values ('DLS3060WDB');
insert into products values ('PVW');
insert into products values ('LI23721S');
insert into products values ('V173516LBM');
insert into products values ('VX2376-SMHD-A');
insert into products values ('GUM5FX1AA1001');
insert into products values ('GPM300X11');
insert into products values ('GUM-WH6AA002');
insert into products values ('2435V5LSB');
insert into products values ('P2418HZ');
insert into products values ('Stylish sectional one of a kind y-5151');

and these are the two queries I'm comparing

--runs acceptably fast, about 100-250ms
select * from products where model like 'y-5151'+'%';
--takes too long, around 1000-2500ms
select * from products where model like '%' + 'y-5151' +'%'

Best Answer

The solution I went with is to build a "half-triangram" table which contains a pre-processsed version of all the model # substrings as suggested by Aaron Bertrand in the two following blog posts of his :

https://sqlperformance.com/2017/02/sql-indexes/seek-leading-wildcard-sql-server

https://sqlperformance.com/2017/02/sql-performance/follow-up-1-leading-wildcard-seeks

The solution for me was to create a new table where I look for models starting with the search string. Each model is listed as such, let's say the product_ID was 51 and the model was 7500 Twin Bed

ID|Model
----------------
51|7500 Twin Bed
51|500 Twin Bed
51|00 Twin Bed
51|0 Twin Bed
51| Twin Bed
51|Twin Bed
51|win Bed
51|in Bed
51|n Bed
51| Bed
51|Bed
51|ed
51|d

This way you never need to do a full wildcard search, a simple select distinct id_product from products_dictionary where model like 'Twin%' will return the required results. The query now takes less than 100ms.

Here is the code I used to create the table and populate it. Again, this is all described properly on Aaron's blog post:

CREATE TABLE [dbo].[products_dictionary](
    [Id_product] [int],
    [model] [nvarchar](75) NOT NULL
)
insert into [products_dictionary] 
select p.id_product,f.fragment from products p 
cross apply dbo.CreateStringFragments(p.model) AS f;

create clustered index index_idprod_substrmodel on [Products_dictionary]([model],[Id_product])