SQL Server – SQLCLR Assembly Error with Multiple Simultaneous Queries

csql serversql-clrsql-server-2012

I have been using SimMetrics to implement the QGramsDistance search in SQL Server. The SimMetrics is used as a SQL CLR assembly. The search results are promising and everything was well, until of late, the database has grown and the number of records have crossed a few hundred thousand with multiple users performing searches. Now, I encounter this error.

Msg 6522, Level 16, State 2, Procedure SchoolNameAutoCompletion, Line 42
A .NET Framework error occurred during execution of user-defined routine or aggregate "QGramsDistance": 
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
System.InvalidOperationException: 
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at System.Collections.Generic.List`1.Enumerator.MoveNext()
   at SimMetricsMetricUtilities.QGramsDistance.GetActualSimilarity(Collection`1 firstTokens, Collection`1 secondTokens)
   at SimMetricsMetricUtilities.QGramsDistance.GetSimilarity(String firstWord, String secondWord)
   at MyAssembly.DbTextFunctions.StringMetrics.QGramsDistance(SqlString firstWord, SqlString secondWord)

I have tried using the latest version of the SimMetrics in SQL Server 2012 using the .NET framework v4.0.30319 but has not made a difference. It's pretty clear that this error occurs only when more than a single search is performed.
Has anyone come across such an issue and have it resolved?

Best Answer

The type of error that you are getting (i.e. Collection was modified;) is most likely caused by sharing the collection across multiple threads. If the variable holding the collection is defined as static, then that variable is not thread-safe since all sessions share the same App Domain for all Assemblies in a particular Database that are owned by the same User.

And, judging by the class name of StringMetrics, the namespace of SimMetricsMetricUtilities, and the input parameter names, I am guessing that you are using the SQLCLR port recommended in the "News" section of the SimMetrics repository (please note: I am being intentionally vague in not linking to this "News" page or mentioning the title of the blog post containing this particular SQLCLR port since, in its current form, it should not be used due to incorrect usage of the SQLCLR API in a few areas). The main problem with that code (given just a quick review) is that it instantiates static classes for each algorithm. Hence you are sharing those collections across sessions, and even with the algorithm being quick, you are now calling the methods frequently enough to have those sessions (i.e. threads) stepping on each other. Lucky for you that Generic Lists throw these errors.

You might be able to fix this by:

  1. getting rid of the static class constructor
  2. getting rid of the static readonly variables
  3. adding the following line just above the return in the QGramsDistance method in the StringMetrics class:

    QGramsDistance _QGramsDistance = new QGramsDistance();
    

    Yes, it needs to be instantiated with each execution of the SQLCLR UDF.

  4. In the SqlFunction attribute, change IsPrecise to be false. Using the FLOAT datatype is specifically imprecise ;-).

Changes #1 and #2 only need to be done once to cover the functions/methods for all of the algorithms. Change #3 should be done across the rest of them using the class name that is appropriate for each particular method. And Change #4 should be done across all of those SqlFunction methods, and is the exact same change across all of them.