Sql-server – How to make Indexed Views work for SQLCLR

materialized-viewsql serversql-clr

In SQL Server, the indexed view is a hellscape of limitations. But I need one. I have a formatter SQLCLR function that creates a prettified version of a domain key – users want to be able to search on substrings of that prettified version. So I need a full-text index on a persisted calculated column or a materialized view.

However, the formatter depends on data stored in multiple tables.

So this works fine for a view, but not for a persisted computed column because those can't query from multiple tables.

My SQLCLR method is precise and deterministic, so it should be eligible for use in indexed views, but indexed views can't have SQLCLR in their index keys.

I could re-implement my formatter using the T-SQL FORMAT function… but FORMAT is also incompatible with indexed views.

Can indexed views do anything? Ever?

I'm open to the "roll your own materialized view with triggers" approach if anybody can refer me to a good document on best practices for doing that. Last time I tried, it spiraled out of control and there was tremendous duplication between the code for insert and update and the initialization.

Is there some way I'm missing to get performant text searches on calculated data on a few million rows, without using a materialized/indexed view or a persisted calculated column?

The formatter does not perform data access. However, the data I need to feed into the formatter for it to be useful will be coming from multiple tables (with a nice regular join), hence I can't solve this problem with persisted computed columns. I can't key the SQLCLR column, so I can't use it in a fulltext index.

I thought this would be simple. Persisted columns and indexed views are designed to perform calculations on write, and to properly implement the observer pattern so that changes to their dependencies are reflected on the calculated value.

Best Answer

Can indexed views do anything? Ever?

It actually is kind of miserable when you think of an indexed view as a solution, only to find your use case hits on one of the many limitations of indexed views (I'm looking at you, LEFT JOIN).

You're right that you can't include the SQLCLR in the index keys of the indexed view. However, you can include it in the SELECT list of the view definition, which will persist the value to disk. So you can at least avoid the cost of calculating the value on the fly when reading from the table.

In the AdventureWorks2014 sample database, I created this view on the delightfully named Person.Person table:

CREATE OR ALTER VIEW dbo.PersonWithHashForSomeReason
WITH SCHEMABINDING
AS
SELECT 
    BusinessEntityID,
    PersonType,
    dbo.SpookyHash(CONVERT(binary(50), FirstName)) AS FirstNameHash
FROM Person.Person
GO

Note: I was too lazy to write my own CLR function, so this one is from this Q&A.

I can make that an indexed view by clustering it on BusinessEntityID:

CREATE UNIQUE CLUSTERED INDEX CX_BusinessEntityID 
ON dbo.PersonWithHashForSomeReason (BusinessEntityID);
GO

Looking for a specific set of rows in the table produces an index scan (execution plan link). Notice the lack of a Compute Scalar operator, which would normally be used to produce the hash value. Since the hash is persisted to disk in the indexed view, that's not necessary:

SELECT BusinessEntityID, FirstNameHash 
FROM dbo.PersonWithHashForSomeReason WITH (NOEXPAND)
WHERE FirstNameHash = 0x910C426C533F2C0AAF350158331E3B01;

Screenshot of execution plan in Plan Explorer

I had to use a NOEXPAND hint to get it to use the view.

You'll notice the warnings in the plan are due to the fact that the whole table was scanned in order to find those values, since there is no index keyed on FirstNameHash.

Unfortunately, trying to create a nonclustered index on this persisted value fails:

CREATE NONCLUSTERED INDEX IX_FirstNameHash 
ON dbo.PersonWithHashForSomeReason (FirstNameHash);

Msg 1976, Level 16, State 1, Line 48
Cannot create index or statistics 'IX_FirstNameHash' on view 'dbo.PersonWithHashForSomeReason' because cannot verify key column 'FirstNameHash' is precise and deterministic. Consider removing column from index or statistics key, marking column persisted in base table if it is computed, or using non-CLR-derived column in key.

This is because SQL Server doesn't trust us:

Why don’t we trust the user? What can really go wrong here??

Assume if the user incorrectly sets the custom attribute IsDeterminsitic to true for a non-deterministic function and assume he is able to create an index on a computed column that invoked this function without persisting it. [...] This might lead to index corruption as the function might return different values for the same input as the function is non-deterministic.

So for the sake of user’s safety, Sql Server in this release REQUIRES the user to persist the computed columns (unlike the tsql case) to actually index the computed columns.

I imagine the same limitation exists for fulltext indexes, although I haven't tried it.


The takeaway from all this, as you mentioned in your own post, is that you're kind of stuck here. Really your only option is to:

  • accept the table scan (not ideal, and probably not even practical based on your description), or
  • use triggers to "manually" keep this value up to date in a normal column