Sql-server – SQL Server Full-text population deadlock

deadlockfull-text-searchsql server

I've created a full-text index on a non-persisted computed column in a table (details here).

However, every couple days, the population gets deadlocked and stopped. Specifically:

DESCRIPTION: A fatal error occurred during a full-text population and
caused the population to be cancelled. Population type is: AUTO;
database name is XYZ (id: 7); catalog name is UsersCatalog (id:
8); table name Users (id: 863342140). Fix the errors that are
logged in the full-text crawl log. Then, resume the population. The
basic Transact-SQL syntax for this is: ALTER FULLTEXT INDEX ON
table_name RESUME POPULATION.

The deadlock report is as follows:

<deadlock>
 <victim-list>
  <victimProcess id="process28fdc561088" />
 </victim-list>
 <process-list>
  <process id="process28fdc561088" taskpriority="20" logused="0" waitresource="PAGE: 7:1:6079614 " waittime="1171" ownerId="2020759813" transactionname="IFTSAutoNested" 
           lasttranstarted="2019-09-16T00:16:49.027" XDES="0x28f61f40408" lockMode="S" schedulerid="8" kpid="8000" status="background" spid="35" sbid="0" ecid="0" priority="0" trancount="3">
   <executionStack>
    <frame procname="adhoc" line="1" stmtstart="60" stmtend="814" sqlhandle="0x020000008bfb2728b1fbd70b14e785d0a8727ca1f478b5690000000000000000000000000000000000000000">
unknown    </frame>
   </executionStack>
   <inputbuf>
   </inputbuf>
  </process>
  <process id="process2966e0368c8" taskpriority="20" logused="20752" waitresource="PAGE: 7:1:6079661 " waittime="1171" ownerId="2020759849" transactionname="IFTSEndOfBatch" 
           lasttranstarted="2019-09-16T00:16:49.047" XDES="0x29230f2e408" lockMode="IX" schedulerid="4" kpid="12296" status="background" spid="33" sbid="0" ecid="0" priority="0" trancount="0">
   <executionStack />
   <inputbuf>
   </inputbuf>
  </process>
 </process-list>
 <resource-list>
  <pagelock fileid="1" pageid="6079614" dbid="7" subresource="FULL" objectname="XYZ.sys.fulltext_index_docidstatus_863342140" id="lock2897baa0d00" mode="IX" associatedObjectId="72057606237061120">
   <owner-list>
    <owner id="process2966e0368c8" mode="IX" />
   </owner-list>
   <waiter-list>
    <waiter id="process28fdc561088" mode="S" requestType="wait" />
   </waiter-list>
  </pagelock>
  <pagelock fileid="1" pageid="6079661" dbid="7" subresource="FULL" objectname="XYZ.sys.fulltext_index_docidstatus_863342140" id="lock294a6477a00" mode="SIU" associatedObjectId="72057606237061120">
   <owner-list>
    <owner id="process28fdc561088" mode="S" />
   </owner-list>
   <waiter-list>
    <waiter id="process2966e0368c8" mode="IX" requestType="convert" />
   </waiter-list>
  </pagelock>
 </resource-list>
</deadlock>

In other cases it's deadlocking with a bulk process that copies data from one table to that one. Something like this:

-- Select a batch
UPDATE  TOP (@BatchSize)
        [import].[TempTable]
SET     BatchStatusFlag = 1
OUTPUT  inserted.Id, inserted.ServerId  INTO @UpdatedIds (SourceId, TargetId)
WHERE   BatchStatusFlag = 0

-- Move/update info for that batch
UPDATE  u
SET     ...
FROM    dbo.Users u
INNER JOIN [import].[TempTable] s ON s.Id = u.Id
INNER JOIN @UpdatedIds ids ON ids.Id = u.Id

In that case, the second UPDATE (the one that updates dbo.Users) is the one that deadlocks with the full-text service.

2 questions:

  • Why is it deadlocking/how can I avoid it?
  • When that happens, does it retry and reindexes those changes anyway? (even if with a longer delay)

SQL Server version: Microsoft SQL Server 2016 (RTM-GDR) (KB4019088) – 13.0.1742.0 (X64) Jul 5 2017 23:41:17 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows Server 2016 Datacenter 6.3 (Build 14393: ) (Hypervisor)

Best Answer

Lets answer question 2 first. As you see in your first error message, it tells you to issue an:

ALTER FULLTEXT INDEX ON <table_name> RESUME POPULATION

If the reindex started automatically, this would not be needed.

Now, as to the first issue, let's work this through.

The Fulltext index build needs to ensure that the text of the fields used in the nonpersisted computed column does not get updated while the index is being rebuilt. Otherwise, the index would be corrupted. This will probably be done by holding shared page locks on the data while the persisted data is being built, then by holding exclusive locks on the data while the index is being updated.

On the other hand, the Bulk Copy process is attempting to add rows, which will update pages in the table. Since the process is Atomic (e.g. will ensure that all of the data is inserted correctly before releasing the implicit transaction), the process will check for the correct pages to insert (possibly pseudo-random sequence) data into, then lock the page to insert that data, and hold the lock until all of the data is in place. This can result in the deadlocks you are experiencing.

For the bulk update process, while updating, the JOINs will be executed, using Shared locks to identify the rows to be updated, then exclusive locks (on either the rows or the pages, depending) will be taken to perform the update. Again, if the sequence of rows/pages is pseudo-random compared to the sequence of pages checked by the FT index rebuild, some pages could be locked that the rebuild needs, while others could be locked by the rebuild that the update needs.

How could this be fixed? That is a more difficult question. It would be less likely to happen if you took out the overhead of calculating the non-persisted data in the index build (by persisting the field), but this would expand your data usage. You could also implement some form of a flag that the other processes check, turning it on before starting the rebuild and off after the rebuild completes, that would wait those processes until the rebuild is done.

There may be other possible solutions, but those are all I can think of at this moment.