SQL Server Indexing – Reasons to Have Both Index with Included Columns and One Without

indexnonclustered-indexsql server

I am looking through a few slow running queries on a database that has under gone significant development (under a number of different developers) over the last couple years.
One of the most commonly used tables has a couple indexes with the same indexed columns, but one will have included columns and the other will not.

For Example:
One Index:

 CREATE NONCLUSTERED INDEX [IDX_tblTable_IndexedColumnId_Inc] ON [dbo].[tblTable]
 (
[IndexedColumnId] ASC
 )

And then the other:

CREATE NONCLUSTERED INDEX [IDX_tblTable_IndexedColumnId_Inc] ON [dbo].[tblTable]
(
[IndexedColumnId] ASC
)
INCLUDE (   [Field1Id],
[Field2],
[Field3],
[Field4],
[Field5],
[Field6])

I would assume this is just an oversight and is just using up disk space and additional overhead to have the index without the included columns (the included columns are known to be needed).

Is there any benefit of having two indexes on the same column, one with included columns and one without?

Best Answer

To expand a little on what everyone's favorite n-dimensional analogue commented, there may be some benefit to having both.

Here's a small (but not inadequate) demo.

The necessary:

USE StackOverflow;
SET NOCOUNT ON; 

CREATE TABLE dbo.IndexCrap
(
    Id INT IDENTITY PRIMARY KEY CLUSTERED,
    Whatever DATETIME,
    Nonsense VARCHAR(50),
    Etc UNIQUEIDENTIFIER,
    SoForth BIT
);

INSERT dbo.IndexCrap WITH (TABLOCK)
        (Whatever, Nonsense, Etc, SoForth )
SELECT TOP 1000 DATEADD(DAY, x.n, GETDATE()),
                REPLICATE('A', x.n % 50),
                NEWID(),
                CASE WHEN x.n % 15 = 0 THEN 1 ELSE 0 END
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m ) AS x;

The indexes:

CREATE INDEX ix_tinydancer ON dbo.IndexCrap (Whatever);

CREATE INDEX ix_largemarge ON dbo.IndexCrap (Whatever) INCLUDE (Nonsense, Etc, SoForth);

So like your question, on index on one column, one index on the same column with some includes.

The queries:

SELECT COUNT(*)
FROM dbo.IndexCrap AS ic;

SELECT ic.Nonsense, ic.Etc, ic.SoForth
FROM dbo.IndexCrap AS ic
WHERE ic.Whatever >= DATEADD(DAY, 500, GETDATE());

The execution plans:

For the COUNT(*) query, the optimizer chooses our smallest single column index over the wider nonclustered index and the slightly wider clustered index, since it's not filtered, and all we need is a row count.

For the wider query, the optimzer chooses the covering index so it doesn't have to a) scan the clustered index or b) access the narrower index and do Key Lookups back to the clustered index.

Of course, you may not know every single query that runs on your system. That's why the company I work for writes sp_BlitzIndex, a free tool to analyze your indexes.

If we run EXEC master.dbo.sp_BlitzIndex @DatabaseName = 'StackOverflow', @Mode = 4;, here are some results we're interested in:

The two indexes I added to IndexCrap (not counting the PK/CX) get flagged as having duplicate key columns. It also gives their definitions, usage, size, and a whole lot more not shown in the screencap.

The usage counters aren't perfect. They'll reset if you add or drop indexes, in some versions of SQL Server they'll reset if you rebuild indexes, and of course they reset if you restart SQL Server.

Hope this helps!

Example

CREATE TABLE dbo.Test
(
    pk integer PRIMARY KEY NONCLUSTERED,
    c1 integer NOT NULL,
    c2 integer NOT NULL
);

-- Not unique on pk, c1
CREATE NONCLUSTERED INDEX ic1 ON dbo.Test (pk, c1);

-- Unique on pk, c2
CREATE UNIQUE NONCLUSTERED INDEX ic2 ON dbo.Test (pk, c2);

Uniqueness

-- Neither plan has an aggregate
SELECT DISTINCT T.c1 FROM dbo.Test AS T WHERE T.pk = 1;
SELECT DISTINCT T.c2 FROM dbo.Test AS T WHERE T.pk = 1;

Split, Sort, Collapse

-- No split, sort, collapse
UPDATE dbo.Test SET c1 = CHECKSUM(NEWID());

-- Split, sort, collapse updating unique key
UPDATE dbo.Test SET c2 = CHECKSUM(NEWID());

Note the split-sort-collapse plan is also a wide (per-index) update.

Uniqueness is a huge topic though. I would normally mark something that is unique as unique, unless there is a good reason not to. Some further reading from my blog:

To anticipate comments about heap tables: Most tables benefit from being clustered. You need good reasons to choose a heap structure, especially from a space management point of view, if the table ever experiences deletes. Updates can also introduce performance impacts if columns expand beyond the space available on the original page (forwarded records).

Best Answer

Related Solutions

Sql-server – SQL Server : Included columns in an Index: is there something like “too many”

Sql-server – Will a nonclustered index with a unique column always address all queries filtering that column first

Example

Uniqueness

Split, Sort, Collapse

Related Question