Sql-server – SQL Server Fulltext search against big amount of search terms over some period of time

full-text-searchsql-server-2008

I've got a task where I must do a fulltext search in SQL Server 2008 against big amount of search terms (30-60 search terms) combined with OR. Some search terms may have partial match (asterisks at the end). So the fulltext query is something like:

'"term one" OR "term two" OR "term*" OR .... OR "term last"'

The query is pretty straightforward, I just select primary keys (which are integers) into a temporary table (for further processing):

insert into @processed_rules(rule_id, story_id)
select @rule_id, Stories.id
from Stories
inner join CONTAINSTABLE(Stories, (header, body), @formatted_keywords) as s 
   on s.[KEY] > @oldest_id AND s.[KEY] = Stories.id

The main problems is a performance. The table contains several millions of records, each containing 2 fields which are subject to search: "header" with content of a size ~600-700 bytes and body with content of a size of several kilobytes.

The search shown above executes in ~1.5 seconds. I need to search against several search term sets in my stored procedure, so the overall time becomes something like ~10 seconds for me, which is unacceptable.

Despite the fact that the table contains records for past several years, I KNOW that I need results only from past 1-2 weeks maximum.

As you can see in the query I am trying to limit the output with some recent date/time range (s.[KEY] > @oldest_id where @oldest_id is a previously calculated record id for some moment in the past), but it doesn't seem to help much. Select TOP N also doesn't affect the execution time. As far as I understand that is because fulltext search first produces results for the whole table no matter what constraints are set in the query and only after that filters it against given constraints.

So I have several questions.

1) Is any way to restrict fulltext search only to recent weeks' records, in order to boost the search performance? I was thinking about partitioning table, indexed views and stuff like that but unfortunately didn't find in the Internet a recipe how to use them for my problem. Any advice in this direction?

2) The search for several search terms works pretty quickly for me (matter of several milliseconds), is it normal that when I give 30-60 search terms combined with OR the performance degrades so much?

P.S. The server has a lot of memory and disk space installed and it's multiprocessor, it has very high performance itself so it's hardly the bottleneck.

Best Answer

Execution plan XML would be useful here but as a shot in the dark:

  1. Daft as it may sound, try pushing either the [id] or date predicate into the CONTAINSTABLE query. See SQL Server 2005 Full-Text Queries on Large Catalogs: Lessons Learned - Consider embedding filter conditions as keywords in the indexed text.
  2. I don't remember where but I recall reading an article or blog post some time ago that flagged big OR full-text queries as problematic. IIRC the suggested hack/workaround was to issue a UNION query instead.

For your example, the UNION for 2) would be along the lines of:

INSERT
    @processed_rules
    (
    rule_id
    , story_id
    )
SELECT
    @rule_id
  , s.[key]
FROM
   CONTAINSTABLE(Stories, (header, body), '"term one"')

UNION

SELECT
    @rule_id
  , s.[key]
FROM
   CONTAINSTABLE(Stories, (header, body), '"term two"')

UNION

  SELECT
    @rule_id
  , s.[key]
FROM
   CONTAINSTABLE(Stories, (header, body), '"term three"')