Columnstore Structure
Columnstore data is physically stored in one or more segments (regular LOB allocation units) per column, and may also be partitioned in the usual way. Each segment contains roughly one million rows of highly-compressed values or value references (several compression techniques are available). A value reference links to an entry in one of up to two hash dictionaries.
Dictionaries are pinned in memory during query execution, with data value IDs from the segment being looked in the dictionary whenever execution requires the actual data value (this lookup is deferred for as long as possible for performance reasons).
Segments also have a header record containing metadata such as the minimum and maximum values stored in the segment. Information from the header can often be used to eliminate complete partitions from processing at execution time. Header record information is stored in the usual LOB data root structure, so eliminating a segment means the Storage Engine can skip reading the LOB data pages from physical storage entirely. Maximizing the potential for elimination can require careful design, including a dependency on the clustered index order at the time the Columnstore index is built.
Specific Plan Operators
SQL Server 2012 introduces a new execution mode called Batch Mode. In this mode, packets of roughly 1000 rows are passed between operators, significantly improving processor utilization efficiency. Within each packet, columnar data is represented as a vector. Not all plan operators support batch mode operation, but examples of those that do include Columnstore Index Scan, Hash Inner Join, Batch Hash Table Build, Bitmap Filter, Hash Aggregate (not scalar aggregates), Filter, and Compute Scalar (for projection and expression evaluation). Query execution plans have been enhanced to show estimated and actual execution mode.
Anti-Patterns
There are a large number of restrictions in the first release, including constraints on allowable data types. Most common types are supported; unsupported data types include DECIMAL
with a precision greater than 18 digits, (N)VARCHAR(MAX)
, UNIQUEIDENTIFIER
, CLR types, and (VAR)BINARY
.
Use of string types, OUTER JOIN
, IN
, EXISTS
, NOT IN
, OR
, UNION ALL
may result in significantly reduced performance (Row Mode execution), unless workarounds are employed that typically involve unusual syntax rewrites as shown in the linked articles in this section.
More Information
Remus Rusanu has blogged a great overview here.
It all depends on the definitions and the key (and non-key) columns defined in the nonclustered index. The clustered index is the actual table data. Therefore it contains all of the data in the data pages, whereas the nonclustered index is only containing columns' data as defined in the index creation DDL.
Let's set up a test scenario:
use testdb;
go
if exists (select 1 from sys.tables where name = 'TestTable')
begin
drop table TestTable;
end
create table dbo.TestTable
(
id int identity(1, 1) not null
constraint PK_TestTable_Id primary key clustered,
some_int int not null,
some_string char(128) not null,
some_bigint bigint not null
);
go
create unique index IX_TestTable_SomeInt
on dbo.TestTable(some_int);
go
declare @i int;
set @i = 0;
while @i < 1000
begin
insert into dbo.TestTable(some_int, some_string, some_bigint)
values(@i, 'hello', @i * 1000);
set @i = @i + 1;
end
So we've got a table loaded with 1000 rows, and a clustered index (PK_TestTable_Id
) and a nonclustered index (IX_TestTable_SomeInt
). As you've seen in your testing, but just for thoroughness:
set statistics io on;
set statistics time on;
select some_int
from dbo.TestTable -- with(index(PK_TestTable_Id));
set statistics io off;
set statistics time off;
-- nonclustered index scan (IX_TestTable_SomeInt)
-- logical reads: 4
Here we have a nonclustered index scan on the IX_TestTable_SomeInt
index. We have 4 logical reads for this operation. Now let's force the clustered index to be used.
set statistics io on;
set statistics time on;
select some_int
from dbo.TestTable with(index(PK_TestTable_Id));
set statistics io off;
set statistics time off;
-- clustered index scan (PK_TestTable_Id)
-- logical reads: 22
Here with the clustered index scan we have 22 logical reads. Why? Here's why. It all matters on how many pages that SQL Server has to read in order to grab the entire result set. Get the average row count per page:
select
object_name(i.object_id) as object_name,
i.name as index_name,
i.type_desc,
ips.page_count,
ips.record_count,
ips.record_count / ips.page_count as avg_rows_per_page
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.TestTable'), null, null, 'detailed') ips
inner join sys.indexes i
on ips.object_id = i.object_id
and ips.index_id = i.index_id
where ips.index_level = 0;
Take a look at my result set of the above query:
As we can see here, there are an average of 50 rows per page on the leaf pages for the clustered index, and an average of 500 rows per page on the leaf pages for the nonclustered index. Therefore, in order to satisfy the query more pages need to be read from the clustered index.
Best Answer
In general you should be able to remove Index2 without any major issues. Index1 can cover any queries that Index2 would currently be handling (assuming there aren't any included column differences between the two).
If column C in Index1 is large, you might end up reading slightly more data into memory with Index1 for queries that would have used Index2 since that third column is taking up more space on each page, but in most scenarios I don't think you would notice any perceptible differences in performance.