Sql-server – STATISTICS IO difference for heap and clustered index

sql serversql server 2014sql-server-2008-r2

Assume we have two tables with data:

create table heap (value int);
create table clust (value int primary key);
insert into heap values (1);
insert into clust values (1);

By inspecting their storage statistics

select obj.name, st.alloc_unit_type_desc, st.index_level, st.page_count
from (values ('heap'), ('clust')) obj(name)
    join sys.indexes ix on ix.object_id = object_id(obj.name)
    cross apply sys.dm_db_index_physical_stats(
        db_id(), ix.object_id, ix.index_id, NULL, 'DETAILED') st;

select obj.name, au.total_pages, au.used_pages, au.data_pages
from (values ('heap'), ('clust')) obj(name)
    join sys.indexes ix on ix.object_id = object_id(obj.name)
    join sys.partitions p on p.object_id = ix.object_id and p.index_id = ix.index_id
    join sys.allocation_units au on au.container_id = p.partition_id;

one can see that both occupy the same number of pages:

name  alloc_unit_type_desc index_level page_count
----- -------------------- ----------- ----------
heap  IN_ROW_DATA          0           1
clust IN_ROW_DATA          0           1

name  total_pages  used_pages  data_pages
----- ------------ ----------- -----------
heap  2            2           1
clust 2            2           1

In this case why statistics io

set nocount on;
set statistics io on;
declare @cnt int;
select @cnt = count(1) from heap;
select @cnt = count(1) from clust;
set statistics io off;

show that scanning the clustered index compared to scanning the heap takes one extra logical read?

Table 'heap'. Scan count 1, logical reads 1...
Table 'clust'. Scan count 1, logical reads 2...

What this extra logical read is?

P.S.
My question is not about "clustered index vs heap performance" or query tuning. I'm trying to better understand things involved into STATISTICS IO reported for clustered index scan.


The example is for SqlServer 2014

Microsoft SQL Server 2014 (SP2) (KB3171021) – 12.0.5000.0 (X64)

I also tried it on SqlServer 2008 R2

Microsoft SQL Server 2008 R2 (SP3) – 10.50.6220.0 (X64)

and result is the same (though one can not use sys.dm_db_index_physical_stats on 2008 R2 this way).

Best Answer

Accessing either a Clustered Index or Non-Clustered Index requires traversing that b-tree structure. For some reason, on scan operations there seems to be one extra logical read. In your case you have a single page, which is the 2 logical reads. If you had enough rows to fill up enough data pages that would in turn require another level within the b-tree index structure, then you would see an additional logical read.

Heaps, by definition, have no index (b-tree) structure. Since you have one data page, the operations only need that one logical read. In such a simplistic example it would appear to be less work than the Clustered Index approach, but as soon as you get a few more data pages then you will start to see a difference since the b-tree structure will allow for going directly to appropriate data pages while the Heap still has to check all of the pages.

For example, I have a test table with a structure of:

CREATE TABLE [dbo].[GuidPkAsUI](
    [ID] [uniqueidentifier] NOT NULL CONSTRAINT [PK_GuidPkAsUI] PRIMARY KEY CLUSTERED,
    [InsertTime] [datetime] NOT NULL
                            CONSTRAINT [DF_GuidPkAsUI_InsertTime]  DEFAULT (getdate()),
);

It has 767,968 rows in it via:

INSERT INTO [dbo].[GuidPkAsUI] ([ID])
  SELECT TOP (767968) NEWID()
  FROM   master.sys.all_columns ac1
  CROSS JOIN master.sys.objects so1;

I copied it to a new table that is the same structure and data, but missing the two constraints (i.e. no Clustered Index) using the following:

SELECT *
INTO dbo.GuidPkAsUIheap
FROM dbo.GuidPkAsUI;

The following queries:

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.GuidPkAsUIheap'),
                                             0, NULL, 'DETAILED');

SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.GuidPkAsUI'),
                                             1, NULL, 'DETAILED');

show that GuidPkAsUIheap has 1 level (with 3135 data pages) and GuidPkAsUI has 3 levels (with 3135 data pages, 20 index pages on one level -- intermediate, and 1 index page on another level -- root, totaling 3156 pages).

The following queries:

SET STATISTICS IO ON;
SELECT COUNT(1) FROM dbo.GuidPkAsUIheap;
SET STATISTICS IO OFF;
-- 3135

SET STATISTICS IO ON;
SELECT COUNT(1) FROM dbo.GuidPkAsUI;
SET STATISTICS IO OFF;
-- 3157

shows that GuidPkAsUIheap requires 3135 logical reads (the number of data pages) while GuidPkAsUI requires 3157 logical reads (the number of data and index pages plus one). So here the logical reads for the Clustered Index are still higher than for the Heap.

I then rebuilt the tables via:

ALTER TABLE [dbo].[GuidPkAsUIheap] REBUILD;
ALTER TABLE [dbo].[GuidPkAsUI] REBUILD WITH (FILLFACTOR = 100);

Running the SELECT * FROM sys.dm_db_index_physical_stats queries above again shows the Heap to be the same but the Clustered Index now has only 10 intermediate index pages instead of 20.

Running the SELECT COUNT(1) queries above again shows the same 3135 logical reads for the Heap and 3147 logical reads for the Clustered Index (all data and index pages plus one).

Now, let's find one specific row:

SET STATISTICS IO ON;
SELECT * FROM dbo.GuidPkAsUIheap WHERE [ID] = '93359759-193F-4CBF-B9F6-738475F8488E';
SET STATISTICS IO OFF;
-- 3135


SET STATISTICS IO ON;
SELECT * FROM dbo.GuidPkAsUI WHERE [ID] = '93359759-193F-4CBF-B9F6-738475F8488E';
SET STATISTICS IO OFF;
-- 3

The Heap still takes 3135 logical reads. But the Clustered Index takes a mere 3 logical reads: 1 for the root index page, 1 for the next level index page, and 1 for the leaf level / data page.

Now let's force a scan as we look for a single row:

SET STATISTICS IO ON;
SELECT * FROM dbo.GuidPkAsUIheap WHERE CONVERT(CHAR(36), [ID]) = '98331062-8BF3-4FAE-98B4-204D0DE06FE1';
SET STATISTICS IO OFF;
-- 3135


SET STATISTICS IO ON;
SELECT * FROM dbo.GuidPkAsUI WHERE CONVERT(CHAR(36), [ID]) = '98331062-8BF3-4FAE-98B4-204D0DE06FE1';
SET STATISTICS IO OFF;
-- 3147

Here we get the same logical reads that the COUNT(1) queries get.