SQL Server – Index vs Statistic

indexsql serverstatistics

What are the differences between CREATE INDEX and CREATE STATISTICS and when should I use each?

Best Answer

Indexes store actual data (data pages or index pages depending on the type of index we are talking about), and Statistics store data distribution. Therefore, CREATE INDEX will be the DDL to create an index (clustered, nonclustered, etc.) and CREATE STATISTICS is the DDL to create the statistics on columns within the table.

I recommend you read about these aspects of relational data. Below are a couple of beginner, introductory articles. These are very broad topics, and therefore the information on them can go very wide and very deep. Read up on the general idea of them below, and ask more specific questions when they arise.

BOL reference on Table and Index Organization
BOL reference on Clustered Index Structure
BOL reference on Nonclustered Index Structures
SQL Server Central on the Introduction to Indexes
BOL reference on Statistics

Here is a working example to see these two parts in action (commented to explain):

use testdb;
go

create table MyTable1
(
    id int identity(1, 1) not null,
    my_int_col int not null
);
go

insert into MyTable1(my_int_col)
values(1);
go 100

-- this statement will create a clustered index
-- on MyTable1.  The index key is the id field
-- but due to the nature of a clustered index
-- it will contain all of the table data
create clustered index MyTable1_CI
on MyTable1(id);
go


-- by default, SQL Server will create a statistics
-- on this index.  Here is proof.  We see a stat created
-- with the name of the index, and the consisting stat 
-- column of the index key column
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
inner join sys.columns c
on sc.object_id = c.object_id
and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');


-- here is a standalone statistics on a single column
create statistics MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- now look at the statistics that exist on the table.
-- we have the additional statistics that's not necessarily
-- corresponding to an index
select
    s.name as stats_name,
    c.name as column_name
from sys.stats s
inner join sys.stats_columns sc
on s.object_id = sc.object_id
and s.stats_id = sc.stats_id
inner join sys.columns c
on sc.object_id = c.object_id
and sc.column_id = c.column_id
where s.object_id = object_id('MyTable1');


-- what is a stat look like?  run DBCC SHOW_STATISTICS
-- to get a better idea of what is stored
dbcc show_statistics('MyTable1', 'MyTable1_CI');
go

Here is what a test sample of statistics can look like:

enter image description here

Notice that Statistics are the containment of the data distribution. They help SQL Server determine an optimal plan. A good example of this is, imagine you are going to life a heavy object. If you knew how much that weight because there was a weight marking on it, you'd determine the best way to lift and with what muscles. That's sort of what SQL Server does with statistics.

-- create a nonclustered index
-- with the key column as my_int_col
create index IX_MyTable1_MyIntCol
on MyTable1(my_int_col);
go

-- let's look at this index
select
    object_name(object_id) as object_name,
    name as index_name,
    index_id,
    type_desc,
    is_unique,
    fill_factor
from sys.indexes
where name = 'IX_MyTable1_MyIntCol';

-- now let's see some physical aspects
-- of this particular index
-- (I retrieved index_id from the above query)
select *
from sys.dm_db_index_physical_stats
(
    db_id('TestDB'),
    object_id('MyTable1'),
    4,
    null,
    'detailed'
);

We can see from the example above that the index actually contains data (depending on the type of index, leaf pages will be different).

This post has only shown a very very very brief overview of these two large aspects SQL Server. Both of these could take up chapters, and books. Read some of the references, and then you will have a better grasp.

Best Answer

Related Solutions

Sql-server – Advisability of using STATISTICS_NORECOMPUTE

Sql-server – When is it better to create STATISTICS instead of creating an Index

Related Question