Sql-server – Redundant indexes in SQL Server

indexindex-tuningsql servert-sql

I am currently going through all indexes in our data warehouse as a performance tuning exercise. Several of these indexes are not used in the system stats, and I am considering removing them to reduce overhead during batch.

Some of these does not make sense, due to PK columns being used in the middle of other indexes, etc. However, I was wondering how SQL Server handles multi column indexes (with overlapping column).

Examples:

NONCLUSTERED INDEX Index1 (A, B, C)

NONCLUSTERED INDEX Index2 (A, B)

Would SQL Server be able to use Index1 for the same rows as Index2 and Index2 is not needed? I know the rowID are contained in the leaf level node of the B+tree, however all records for C would be valid in Index1.

Can Index2 be removed without impacting performance?
As far as I have understood, indexes can also help with sorts in execution plan. Can Index1 be used for these as well?

Best Answer

In general you should be able to remove Index2 without any major issues. Index1 can cover any queries that Index2 would currently be handling (assuming there aren't any included column differences between the two).

If column C in Index1 is large, you might end up reading slightly more data into memory with Index1 for queries that would have used Index2 since that third column is taking up more space on each page, but in most scenarios I don't think you would notice any perceptible differences in performance.

Related Solutions

Sql-server – What’s the anatothe of a columnstore index

Columnstore Structure

Columnstore data is physically stored in one or more segments (regular LOB allocation units) per column, and may also be partitioned in the usual way. Each segment contains roughly one million rows of highly-compressed values or value references (several compression techniques are available). A value reference links to an entry in one of up to two hash dictionaries.

Dictionaries are pinned in memory during query execution, with data value IDs from the segment being looked in the dictionary whenever execution requires the actual data value (this lookup is deferred for as long as possible for performance reasons).

Segments also have a header record containing metadata such as the minimum and maximum values stored in the segment. Information from the header can often be used to eliminate complete partitions from processing at execution time. Header record information is stored in the usual LOB data root structure, so eliminating a segment means the Storage Engine can skip reading the LOB data pages from physical storage entirely. Maximizing the potential for elimination can require careful design, including a dependency on the clustered index order at the time the Columnstore index is built.

Specific Plan Operators

SQL Server 2012 introduces a new execution mode called Batch Mode. In this mode, packets of roughly 1000 rows are passed between operators, significantly improving processor utilization efficiency. Within each packet, columnar data is represented as a vector. Not all plan operators support batch mode operation, but examples of those that do include Columnstore Index Scan, Hash Inner Join, Batch Hash Table Build, Bitmap Filter, Hash Aggregate (not scalar aggregates), Filter, and Compute Scalar (for projection and expression evaluation). Query execution plans have been enhanced to show estimated and actual execution mode.

Anti-Patterns

There are a large number of restrictions in the first release, including constraints on allowable data types. Most common types are supported; unsupported data types include DECIMAL with a precision greater than 18 digits, (N)VARCHAR(MAX), UNIQUEIDENTIFIER, CLR types, and (VAR)BINARY.

Use of string types, OUTER JOIN, IN, EXISTS, NOT IN, OR, UNION ALL may result in significantly reduced performance (Row Mode execution), unless workarounds are employed that typically involve unusual syntax rewrites as shown in the linked articles in this section.

More Information

Remus Rusanu has blogged a great overview here.

Sql-server – Logical reads amount difference with almost identical indexes

It all depends on the definitions and the key (and non-key) columns defined in the nonclustered index. The clustered index is the actual table data. Therefore it contains all of the data in the data pages, whereas the nonclustered index is only containing columns' data as defined in the index creation DDL.

Let's set up a test scenario:

use testdb;
go

if exists (select 1 from sys.tables where name = 'TestTable')
begin
    drop table TestTable;
end
create table dbo.TestTable
(
    id int identity(1, 1) not null
        constraint PK_TestTable_Id primary key clustered,
    some_int int not null,
    some_string char(128) not null,
    some_bigint bigint not null
);
go

create unique index IX_TestTable_SomeInt
on dbo.TestTable(some_int);
go

declare @i int;
set @i = 0;

while @i < 1000
begin
    insert into dbo.TestTable(some_int, some_string, some_bigint)
    values(@i, 'hello', @i * 1000);

    set @i = @i + 1;
end

So we've got a table loaded with 1000 rows, and a clustered index (PK_TestTable_Id) and a nonclustered index (IX_TestTable_SomeInt). As you've seen in your testing, but just for thoroughness:

set statistics io on;
set statistics time on;

select some_int
from dbo.TestTable -- with(index(PK_TestTable_Id));

set statistics io off;
set statistics time off;
-- nonclustered index scan (IX_TestTable_SomeInt)
-- logical reads: 4

Here we have a nonclustered index scan on the IX_TestTable_SomeInt index. We have 4 logical reads for this operation. Now let's force the clustered index to be used.

set statistics io on;
set statistics time on;

select some_int
from dbo.TestTable with(index(PK_TestTable_Id));

set statistics io off;
set statistics time off;
-- clustered index scan (PK_TestTable_Id)
-- logical reads: 22

Here with the clustered index scan we have 22 logical reads. Why? Here's why. It all matters on how many pages that SQL Server has to read in order to grab the entire result set. Get the average row count per page:

select
    object_name(i.object_id) as object_name,
    i.name as index_name,
    i.type_desc,
    ips.page_count,
    ips.record_count,
    ips.record_count / ips.page_count as avg_rows_per_page
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.TestTable'), null, null, 'detailed') ips
inner join sys.indexes i
on ips.object_id = i.object_id
and ips.index_id = i.index_id
where ips.index_level = 0;

Take a look at my result set of the above query:

enter image description here

As we can see here, there are an average of 50 rows per page on the leaf pages for the clustered index, and an average of 500 rows per page on the leaf pages for the nonclustered index. Therefore, in order to satisfy the query more pages need to be read from the clustered index.

Best Answer

Related Solutions

Sql-server – What’s the anatothe of a columnstore index

Sql-server – Logical reads amount difference with almost identical indexes

Related Question