Sql-server – SQL Server: Covering indexes including all columns

azure-sql-databasesql serversql server 2014sql-server-2012sql-server-2016

Our team has inherited an application and associated database. The previous developers appear to have enforced a rule where every index, on every table, has an INCLUDE clause to always add every column that isn't otherwise part of the key. These tables have on average anywhere from two to five indexes or unique constraints as well as foreign keys.

The intent looks to be to improve SELECT performance regardless of what query is thrown at the database, as access is via an ORM that by default (but not always) retrieves all columns. We expect that the side effects of this are increased storage requirements (possibly significantly so) and additional overhead time for INSERT/UPDATE/DELETE.

The question is, is this a sensible strategy? Our team has history with SQL Server but no members who would consider themselves experts on its internal behaviour (though the question has been raised that if this strategy was optimal, wouldn't it be the default by now?). What other side effects (database server CPU/memory/TempDB usage, etc) should we be expecting, or are some of our assumptions above incorrect?

Additionally, the application can be installed into both SQL Server on-premise (versions since 2012), as well as Azure SQL — should we be prepared for any differences between the two, or additional side effects on Azure, as a result of this approach?

Best Answer

I've done this on specific indexes before now, to aid oft-run heavy queries. Effectively what they have done is create multiple clustered indexes: when any of those indexes is used to find rows no extra work is needed looking up the rest of the data in the real clustered index (or the heap if there is no real clustered index).

is this a sensible strategy?

For some indexes where needed to support certain query patterns, certainly yes.

But to do this with all indexes, I would just as certainly say no.

It is going to be wasteful of space to do where not actually needed, and will slow inserts/updates significantly. It may slow down as many read queries as it helps too, because each index page holds less records so any query needing to reference a chunk of the index for filtering but not using all other columns will have to access more pages. This will make your database more memory-hungry: those pages will need to be loaded into the buffer pool, potentially ejecting other useful pages if memory is low. If compression is used on those indexes to try mitigate the effect on storage and memory requirements, then it will be pushing extra load to the CPUs instead.

as access is via an ORM that by default (but not always) retrieves all columns

This is a common pattern with poorly optimised use of an ORM (or just naive ORMs) and in these cases I've seen SQL Server's index adviser (and similar 3rd party tools) suggest indexes with many INCLUDEd columns, so I would agree with your suggestion that this is why the indexes have been created this way.

But while it may make all such queries slightly faster and some of them significantly faster, I suspect that in many cases any benefit is so small as to not be worth the extra memory footprint required by your common working set, the space on disk, and the IO between disk and memory.

Also remember that the ORM might not be selecting out all columns of all tables tha a query touches, so that benefit might only hold for the main target of the current request, and the larger indexes may penalise the query when other objects are used for filtering but not returning data (SELECT * FROM table1 WHERE id IN (SELECT someID FROM table2 WHERE someColumn='DesiredValue') perhaps).

Another consideration for the excess space used, particularly if the data is large, is that it will have an impact on your backup strategy: storage and transfer costs for those backups, potential restore times, and so forth.

should we be prepared for any differences between the two [on-prem & AzureSQL]

Generally I think the considerations here are going to be the same in each case, though any excess memory/IO cost imposed by the large indexes may be more directly visible in Azure where you can tweak the service tier and therefore infrastructure cost more easily rather than having a relatively fixed set of hardware resource. If using standard/premium tiers instead of vcore based pricing then you'll be affected more by the IO cost in standard as premium includes significantly more IO per DTU. If you are using multi-region backups or redundancy or other non-local features in Azure than there might be a bandwidth cost associated with the extra space taken by uneccesarily wide indexes.