SQL Server Query Performance – Slow Query with ‘Contains’ and ‘=’ in WHERE Clause

full-text-searchsql serversql-server-2008-r2

The following query takes about 10 seconds to finish on a table with 12k records

select top (5) *
from "Physician"
where "id" = 1 or contains("lastName", '"a*"')

But if I change the where clause to either

where "id" = 1

or

where contains("lastName", '"a*"')

It'll return instantly.

Both columns are indexed and the lastName column is also full-text indexed.

CREATE TABLE Physician
(
   id         int identity    NOT NULL,
   firstName  nvarchar(100)   NOT NULL,
   lastName   nvarchar(100)   NOT NULL
);

ALTER TABLE Physician
  ADD CONSTRAINT Physician_PK
  PRIMARY KEY CLUSTERED (id);

CREATE NONCLUSTERED INDEX Physician_IX2
   ON Physician (firstName ASC);

CREATE NONCLUSTERED INDEX Physician_IX3
   ON Physician (lastName ASC);

CREATE FULLTEXT INDEX
    ON "Physician" ("firstName" LANGUAGE 0x0, "lastName" LANGUAGE 0x0)
    KEY INDEX "Physician_PK"
    ON "the_catalog"
    WITH stoplist = off;

Here is the Execution Plan

What could be the problem?

Best Answer

Your execution plan

When looking at the query plan, we can see that one index is touched to serve two filter operations.

enter image description here

Very simply put, due to the TOP operator, a row goal was set. Much more information & prerequisites on row goals can be found here

From that same source:

A row goal strategy generally means favouring non-blocking navigational operations (for example, nested loops joins, index seeks, and lookups) over blocking, set-based operations like sorting and hashing. This can be useful whenever the client can benefit from a quick start-up and steady stream of rows (with perhaps a longer overall execution time – see Rob Farley's post above). There are also the more obvious and traditional uses e.g. in presenting results a page at a time.

The entire table gets probed into the filters with the use of a left semi join that has a row goal set, hoping to return the 5 rows as fast and efficient as possible.

This does not happen, resulting in many iterations over the .Fulltextmatch TVF.

enter image description here


Recreating

Based on your plan, I was able to somewhat recreate your problem:

CREATE TABLE dbo.Person(id int not null,lastname varchar(max));

CREATE UNIQUE INDEX ui_id ON  dbo.Person(id)
CREATE FULLTEXT CATALOG ft AS DEFAULT;  
CREATE FULLTEXT INDEX ON dbo.Person(lastname)   
   KEY INDEX ui_id   
   WITH STOPLIST = SYSTEM;  
GO  

INSERT INTO dbo.Person(id,lastname)
SELECT top(12000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
REPLICATE(CAST('A' as nvarchar(max)),80000)+ CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as varchar(10))
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;
CREATE CLUSTERED INDEX cx_Id on dbo.Person(id);

Running the query

SELECT TOP (5) *
FROM dbo.Person
WHERE "id" = 1 OR contains("lastName", '"B*"');

Results into a query plan comparable to yours:

enter image description here

In the above example, B does not exist in the fulltext index. As a result it depends on the parameter & data how efficient the query plan can be.

A better explanation of this can be found in Row Goals, Part 2: Semi Joins by Paul White

...In other words, on each iteration of an apply, we can stop looking at input B as soon as the first match is found, using the pushed-down join predicate. This is exactly the sort of thing a row goal is good for: generating part of a plan optimized to return the first n matching rows quickly (where n = 1 here).

For example, changing the predicate so the results are found way sooner (at the beginning of the scan).

select top (5) *
from dbo.Person
where "id" = 124 
or contains("lastName", '"A*"');

enter image description here

the where "id" = 124 gets eliminated due to the fulltext index predicate already returning 5 rows, satisfying the TOP() predicate.

The results show this as well

id lastname 
1  'AAA...'   
2  'AAA...'
3  'AAA...'
4  'AAA...'
5  'AAA...'

And the TVF executions:

enter image description here

Inserting some new rows

INSERT INTO dbo.Person
SELECT 12001, REPLICATE(CAST('B' as nvarchar(max)),80000);
INSERT INTO dbo.Person
SELECT 12002, REPLICATE(CAST('B' as nvarchar(max)),80000);

Running the query to find these previous inserted rows

SELECT TOP (2) *
from dbo.Person
where "id" = 1
or contains("lastName", '"B*"');

This again results in too many iterations over almost all of the rows to return the last but one value found.

enter image description here

enter image description here

id   lastname
1     'AAA...'
12001 'BBB...'

Resolving

When removing the row goal by using traceflag 4138

SELECT TOP (5) *
FROM dbo.Person
WHERE "id" = 124 
OR contains("lastName", '"B*"')
OPTION(QUERYTRACEON 4138 );

The optimizer uses a join pattern closer to implementing a UNION, in our case this is favourable as it pushes the predicates down to their respective clustered index seeks, and does not use the row goaled left semi join operator.

enter image description here

Another way to write this, without using the above mentioned traceflag:

SELECT top (5) *
FROM
(
SELECT * 
FROM dbo.Person
WHERE "id" = 1 
UNION
SELECT * 
FROM dbo.Person
WHERE contains("lastName", '"B*"')
 ) as A;

With the resulting query plan:

enter image description here

where the fulltext function is applied directly

enter image description here

As a sidenote, for op, the query optimizer hotfix traceflag 4199 resolved his problem. He implemented this by adding OPTION(QUERYTRACEON(4199)) to the query. I was not able to reproduce that behaviour on my end. This hotfix does contain a semi join optimization:

Trace Flag: 4102 Function: SQL 9 - Query performance is slow if the execution plan of the query contains semi join operators Typically, semi join operators are generated when the query contains the IN keyword or the EXISTS keyword. Enable flag 4102 and 4118 to overcome this.

Source


Extra

During cost based optimization, the optimizer could also add an index spool to the execution plan, implemented by LogOp_Spool Index on fly Eager (or the physical counterpart)

It does this with my dataset for TOP(3) but not for TOP(2)

SELECT TOP (3) *
from dbo.Physician
where "id" = 1
or contains("lastName", '"B*"')  

enter image description here

On the first execution, an eager spool reads and stores the entire input before returning the subset of rows that is requested by the Predicate Later executions read and return the same or a different subset of rows from the worktable, without ever having to execute the child nodes again.

Source

With the seek predicate applied to this index eager spool:

enter image description here