Sql-server – Row estimates always too low

execution-plansql serversql-server-2016

I have a query which involves a full-text search like this:

SELECT TOP 30 PersonId,
              PersonParentId,
              PersonName,
              PersonPostCode
FROM dbo.People
WHERE PersonDeletionDate IS NULL
      AND PersonCustomerId = 24
      AND CONTAINS(ContactFullText, '"mr" AND "ch*"')
      AND PersonGroupId IN(197, 206, 186, 198)
ORDER BY PersonParentId,
         PersonName;

This generates two main plans, one is very fast in all cases, the other is very slow in most cases.

I have experimented with this query such that the FT search is not included and what I found is that the row estimates are always way lower than they should be.

If I run update statistics...with fullscan I still see extremely inaccurate row estimates from NC index seek operations in the execution plan.

When the row estimates are low enough, a loop join is selected, which is normally very slow (30+ seconds). Higher estimates seem to produce a good plan involving a merge join instead of a loop join.

Why is SQL Server still not estimating the rowcounts despite still having up to date statistics?

The plan: https://www.brentozar.com/pastetheplan/?id=rkXtE0jzX

When I remove the CONTAINS part, thereby omitting the FullText search, the query is fast, but the row estimate for the index seek is still 1 estimated, 2195 actual.

On @Kin's advice, I used CONTAINSTABLE, which ran instantly and produced the following plan: https://www.brentozar.com/pastetheplan/?id=S1hKainzQ
Interesting that there is no Full Text search operator.

Containstable requires RANK to produce the same result set in this case I've used AND RANK > 0 in the WHERE to produce the results I want, which produces this plan: https://www.brentozar.com/pastetheplan/?id=B1U7AA2zm

My only question now is about why row estimates are still inaccurate but I care less now that my FT queries seem significantly faster and more reliable. Very pleased!
https://www.brentozar.com/pastetheplan/?id=B1U7AA2zm

@EvanCarroll stats histogram here: https://pastebin.com/p7s0NvX5

Some follow up info – before/after execution plans for some typical FT search queries for the application being supported

A

  1. Before: https://www.brentozar.com/pastetheplan/?id=SJlAAAN7X (5 seconds)
  2. After: https://www.brentozar.com/pastetheplan/?id=H1ltkkSmm (<1
    second)

B

  1. Before: https://www.brentozar.com/pastetheplan/?id=Sy-gxJBQm (40
    seconds)
  2. After: https://www.brentozar.com/pastetheplan/?id=Sy2VxJrm7 (1
    second)

C

  1. Before: https://www.brentozar.com/pastetheplan/?id=r1z5e1rQ7 (2
    seconds)
  2. After: https://www.brentozar.com/pastetheplan/?id=r1oplkSQm (<1
    second)

D

  1. Before: https://www.brentozar.com/pastetheplan/?id=B1kHf1BQQ (2
    minutes 20 seconds)
  2. After: https://www.brentozar.com/pastetheplan/?id=r1D5z1SQm (11
    seconds)

Best Answer

(summarizing my comments and putting as answer)

A query rewrite will solve the issue of getting low row estimates. As Joe Chang explains in his blog post Query Optimizer Gone Wild - Full-Text

CONTAINS is "a predicte used in a WHERE clause" per Microsoft documentation, while CONTAINSTABLE acts as a table.

You get a much better plan (merge join) using CONTAINSTABLE vs the actual plan using contains uses a nested loop join with low row estimates.

You can rewrite the query as :

SELECT TOP 30 p.PersonId,
              p.PersonParentId,
              p.PersonName,
              p.PersonPostCode
FROM dbo.People p
left join containstable (ContactFullText, '"mr" AND "ch*"') cf on cf.[yourKey] = p.PersonId
WHERE p.PersonDeletionDate IS NULL
      AND p.PersonCustomerId = 24
      --AND CONTAINS(ContactFullText, '"mr" AND "ch*"')
      AND p.PersonGroupId IN(197, 206, 186, 198)
      AND [RANK] > 0
ORDER BY p.PersonParentId,
         p.PersonName;