The following query takes about 10 seconds to finish on a table with 12k records
select top (5) *
from "Physician"
where "id" = 1 or contains("lastName", '"a*"')
But if I change the where clause to either
where "id" = 1
or
where contains("lastName", '"a*"')
It'll return instantly.
Both columns are indexed and the lastName column is also full-text indexed.
CREATE TABLE Physician
(
id int identity NOT NULL,
firstName nvarchar(100) NOT NULL,
lastName nvarchar(100) NOT NULL
);
ALTER TABLE Physician
ADD CONSTRAINT Physician_PK
PRIMARY KEY CLUSTERED (id);
CREATE NONCLUSTERED INDEX Physician_IX2
ON Physician (firstName ASC);
CREATE NONCLUSTERED INDEX Physician_IX3
ON Physician (lastName ASC);
CREATE FULLTEXT INDEX
ON "Physician" ("firstName" LANGUAGE 0x0, "lastName" LANGUAGE 0x0)
KEY INDEX "Physician_PK"
ON "the_catalog"
WITH stoplist = off;
Here is the Execution Plan
What could be the problem?
Best Answer
Your execution plan
When looking at the query plan, we can see that one index is touched to serve two filter operations.
Very simply put, due to the TOP operator, a row goal was set. Much more information & prerequisites on row goals can be found here
From that same source:
The entire table gets probed into the filters with the use of a left semi join that has a row goal set, hoping to return the 5 rows as fast and efficient as possible.
This does not happen, resulting in many iterations over the .Fulltextmatch TVF.
Recreating
Based on your plan, I was able to somewhat recreate your problem:
Running the query
Results into a query plan comparable to yours:
In the above example, B does not exist in the fulltext index. As a result it depends on the parameter & data how efficient the query plan can be.
A better explanation of this can be found in Row Goals, Part 2: Semi Joins by Paul White
For example, changing the predicate so the results are found way sooner (at the beginning of the scan).
the
where "id" = 124
gets eliminated due to the fulltext index predicate already returning 5 rows, satisfying theTOP()
predicate.The results show this as well
And the TVF executions:
Inserting some new rows
Running the query to find these previous inserted rows
This again results in too many iterations over almost all of the rows to return the last but one value found.
Resolving
When removing the row goal by using traceflag 4138
The optimizer uses a join pattern closer to implementing a
UNION
, in our case this is favourable as it pushes the predicates down to their respective clustered index seeks, and does not use the row goaled left semi join operator.Another way to write this, without using the above mentioned traceflag:
With the resulting query plan:
where the fulltext function is applied directly
As a sidenote, for op, the query optimizer hotfix traceflag 4199 resolved his problem. He implemented this by adding
OPTION(QUERYTRACEON(4199))
to the query. I was not able to reproduce that behaviour on my end. This hotfix does contain a semi join optimization:Source
Extra
During cost based optimization, the optimizer could also add an index spool to the execution plan, implemented by
LogOp_Spool Index on fly Eager
(or the physical counterpart)It does this with my dataset for
TOP(3)
but not forTOP(2)
Source
With the seek predicate applied to this index eager spool: