SQL Server – Skipping some data from indexing in SQL Server

nonclustered-index, sql-server

I have a situation where SQL Server is giving an error that the index key size exceeds 900 bytes. It's happening because one of the columns in the index definition is varchar(2000), and whenever its length goes over 800 characters, the error occurs.

So, my question is: is there any way, in the index definition, to include only the first 800 characters of the column?

Some people on blogs have suggested adding an additional column to the table that holds the first 800 characters of the original column, and indexing this new column instead of the original one. But I think that will hurt overall performance, because read queries still query the original column.

Please suggest if there is some other way to tackle this problem.

Best Answer

I wrote a post about this kind of situation here, which goes into more depth than this answer.

is there any way, in the index definition, to include only the first 800 characters of the column?

This can be done by creating a computed column as LEFT(MyColumn, 800), and then indexing that column. Note that the column doesn't have to be materialized in the table using the PERSISTED clause; creating an index on the column materializes the values in the index only. Use caution with the computed column approach, though, because it may not behave as you might expect in some cases (these are issues with computed columns in general, not specific to this situation).
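A minimal sketch of that setup (the table and column names here are hypothetical, standing in for your actual schema):

```sql
-- Hypothetical table with a varchar(2000) column that is too wide
-- to be an index key on its own.
CREATE TABLE dbo.Documents
(
    DocumentId  int IDENTITY PRIMARY KEY,
    Title       varchar(2000) NOT NULL,
    -- Computed column exposing only the first 800 characters.
    -- Deliberately NOT PERSISTED: the index below materializes
    -- the values in the index alone, not in the base table.
    TitlePrefix AS LEFT(Title, 800)
);

CREATE NONCLUSTERED INDEX IX_Documents_TitlePrefix
    ON dbo.Documents (TitlePrefix);
```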

This approach allows you to do both exact matching and LIKE 'xyz%' directly as long as the search term is shorter than the indexed length.
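For instance, a prefix search can use the index directly, because any pattern shorter than the indexed length is fully contained in the prefix (hypothetical names again):

```sql
-- The search pattern is far shorter than 800 characters, so a row
-- matches on Title exactly when it matches on TitlePrefix; no
-- comparison against the full column is needed for LIKE 'xyz%'.
SELECT DocumentId
FROM dbo.Documents
WHERE TitlePrefix LIKE 'xyz%';
```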

You can use CHECKSUM to generate a small hash

The result of the CHECKSUM function may be small, but it's not a very good hash function. Use HASHBYTES or a CLR function instead.

Same thing applies as above regarding computed columns. Or, this value is small enough (16 bytes) that it can be materialized in the table using a trigger or application code and indexed from there.
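One way to sketch the hash variant with HASHBYTES (MD5 is used here only because its 16-byte output matches the size mentioned above; the names are hypothetical, and a stronger algorithm or CLR hash could be substituted):

```sql
-- Computed column holding a fixed-size hash of the wide column.
-- HASHBYTES is deterministic, so the computed column is indexable.
ALTER TABLE dbo.Documents
    ADD TitleHash AS CAST(HASHBYTES('MD5', Title) AS binary(16));

CREATE NONCLUSTERED INDEX IX_Documents_TitleHash
    ON dbo.Documents (TitleHash);
```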

The downsides are that this method only allows exact matching, since there's no way to match on a partial input, and of course the change management required to add a new non-NULL column to an existing table.

read queries still query the original column.

You'll need to change how queries access the table to make one of these solutions work, so if that's not possible, you're out of luck.

Because we're talking about lossy compression (hashing or truncation), queries always have to compare the original value against the search term as well, unless the index is made unique -- which may open you up to users reporting weird errors when the first 800 characters of two values are identical, or an unlikely hash collision occurs. If the index isn't unique and the original values aren't compared, incorrect results could be returned (note: this could be a security hole).
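For the hash version, that double comparison looks something like this (hypothetical names; @SearchTerm stands for a varchar parameter):

```sql
-- The first predicate seeks on the small hash index; the second
-- re-checks the full original value to guard against collisions.
SELECT DocumentId
FROM dbo.Documents
WHERE TitleHash = CAST(HASHBYTES('MD5', @SearchTerm) AS binary(16))
  AND Title = @SearchTerm;
```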

The additional comparison will likely affect the query plans, and I'm not getting into that without a sample query, but you can find some small examples in the blog post I linked to above. Using your existing table schema and queries, you should be able to make an educated guess about what's going to happen to the plans.