Sql-server – Autocomplete too slow: possible optimizations

performancequery-performancesql serversql-server-2008-r2string-searching

The autocomplete search functionality of my website searches a varchar field which contains the model number of sale items.

This field can contains strings varying from 1 to 75 characters and the table contains 400 000 rows. I have come up with a query that searchs only from the start of the string and that takes about 150-250ms to execute which is acceptable but now my manager wants the query to search for any substrings which makes the query 3-10 times slower (around 1000-2000ms).

I have built a JS fiddle to give you an example of what the data looks like and what the two query are like.

http://sqlfiddle.com/#!6/9efa3/2/0

There already are a few indexes on the table. What would be the best practice to speed-up this auto-complete search field ?
(database version is SQLSERVER 2008R2)

Here is a brief example of the data I'm dealing with :

CREATE TABLE [Products](
    [productid] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
    [model] [nvarchar](75) NOT NULL 
 CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED 
(
    [productid] ASC
));

insert into products values ('UMPX1AA0011 danish e-315 woot');
insert into products values ('P27y719VC');
insert into products values ('VG2y439m-LED');
insert into products values ('UMUyX165AAB01');
insert into products values ('U28y79VF');
insert into products values ('U52417HJ');
insert into products values ('VA25746M-LED WITH FLYING CORNERS');
insert into products values ('S19F350HNN 1pc california storage');
insert into products values ('VA211917A');
insert into products values ('PM2500X2');
insert into products values ('E22470SWHE');
insert into products values ('V22465WLYDP');
insert into products values ('I129LMH1HKC');
insert into products values ('OM5EN X 35 the new version');
insert into products values ('DLS3060WDB');
insert into products values ('PVW');
insert into products values ('LI23721S');
insert into products values ('V173516LBM');
insert into products values ('VX2376-SMHD-A');
insert into products values ('GUM5FX1AA1001');
insert into products values ('GPM300X11');
insert into products values ('GUM-WH6AA002');
insert into products values ('2435V5LSB');
insert into products values ('P2418HZ');
insert into products values ('Stylish sectional one of a kind y-5151');

and these are the two queries I'm comparing

--runs acceptably fast, about 100-250ms
select * from products where model like 'y-5151'+'%';
--takes too long, around 1000-2500ms
select * from products where model like '%' + 'y-5151' +'%'

Best Answer

The solution I went with is to build a "half-triangram" table which contains a pre-processsed version of all the model # substrings as suggested by Aaron Bertrand in the two following blog posts of his :

https://sqlperformance.com/2017/02/sql-indexes/seek-leading-wildcard-sql-server

https://sqlperformance.com/2017/02/sql-performance/follow-up-1-leading-wildcard-seeks

The solution for me was to create a new table where I look for models starting with the search string. Each model is listed as such, let's say the product_ID was 51 and the model was 7500 Twin Bed

ID|Model
----------------
51|7500 Twin Bed
51|500 Twin Bed
51|00 Twin Bed
51|0 Twin Bed
51| Twin Bed
51|Twin Bed
51|win Bed
51|in Bed
51|n Bed
51| Bed
51|Bed
51|ed
51|d

This way you never need to do a full wildcard search, a simple select distinct id_product from products_dictionary where model like 'Twin%' will return the required results. The query now takes less than 100ms.

Here is the code I used to create the table and populate it. Again, this is all described properly on Aaron's blog post:

CREATE TABLE [dbo].[products_dictionary](
    [Id_product] [int],
    [model] [nvarchar](75) NOT NULL
)
insert into [products_dictionary] 
select p.id_product,f.fragment from products p 
cross apply dbo.CreateStringFragments(p.model) AS f;

create clustered index index_idprod_substrmodel on [Products_dictionary]([model],[Id_product])

Related Solutions

Mysql – Slow performance of MySQL Join Query

Please provide SHOW CREATE TABLE; the explain is useless without it.

OR is a performance killer in many contexts.

( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR ( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ...

Turn that into

JOIN ( SELECT id FROM p WHERE 
( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR 
( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ... ) x ON x.id = foo.id

Also needed (on p):

INDEX(pricelist, iln, id)

(With the CREATEs, I could be more specific.)

The idea behind this "trick" is to move the costly work of the OR into a subquery that returns the necessary ids. Plus the INDEX makes it so that it can do all that work in the INDEX.

Sql-server – Optimizations Possible – SQLServer

First of all be aware though that when you are querying on tables that large you have to expect it to take some time.

That being said assuming you are using SQL 2008 or later you could try a filtered index on CLLab, however that makes several assumptions.

The number of rows where the Dept = "Lab" and Observe_Id = "Reviewed" is a relatively small percentage of the total number of rows.
This is an important enough query to warrant the extra write time to CLLAB for an additional index just for it.

If the answers to both of those are Yes (and of course SQL 2008 or higher) then try this.

CREATE NONCLUSTERED INDEX "IX_CLLAB_REVIEWED_LABS"
ON CLLAB (SSNO, COMPANY, DATEBACK)
WHERE DEPT = "LAB"
  AND OBSERVE_ID = "REVIEWED";

Really I would rather get rid of the subquery all together if possible but that would require more knowledge of the data and what you are trying to get out of the query.

Last but not least if you comment out the small subquery in the field list does it run significantly faster? If so that part of the query could be incorporated into the main query as a LEFT OUTER JOIN and might save you some time.

It would look like this.

-- In the field list
FACILITY = CASE WHEN B.FACILITY = '' THEN D.FACILITY ELSE B.FACILITY END

-- The FROM & WHERE Clauses
FROM CLLAB A
JOIN CLORDER B
    ON A.COMPANY = B.COMPANY
    AND A.ORDER_ID = B.ID
JOIN CLMASTER C
    ON B.ACCOUNT = C.ACCOUNT
LEFT OUTER JOIN CLDoctor D
    ON  B.PROVCODE = D.DOCTOR  
    AND B.FACILITY = ''          -- optionally. May be more efficient
                                 -- with different indexing
WHERE B.DEPT = 'LAB'
  AND A.DATEBACK NOT IN (.....)

Best Answer

Related Solutions

Mysql – Slow performance of MySQL Join Query

Sql-server – Optimizations Possible – SQLServer

Related Question