Sql-server – Why would an index covering only half of primary key be useful

database-tuning-advisorsql server

We have a primitive junction table to store a many-to-many relation between two other tables, their primary keys being bigint:

CREATE TABLE [dbo].[JunctionTable] (
    [ID-1] [bigint] NOT NULL,
    [ID-2] [bigint] NOT NULL,
    CONSTRAINT [PK_JunctionTable] PRIMARY KEY CLUSTERED ([ID-1] ASC, [ID-2] ASC)
)

When the Tuning Advisor was run on the database, it suggested (among many others, obviously) the following index to be added:

CREATE NONCLUSTERED INDEX [_dta_index_JunctionTable_8_296569782__K1] ON [dbo].[JunctionTable] 
(
    [ID-1] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

And I wonder: is this index good for anything? AFAICT it is the left half of a clustered primary key and the space reduction coming from omitting a single bigint cannot be that important, I guess. Or, is there any scenario when this index would help? And if not, why would the DTA suggest it? (The Advisor suggested many new indexes which I consider silly, but this one seemed the silliest, so I thought I ask…)

Best Answer

The Database Tuning Advisor does not always make sensible suggestions. It is always a good idea to review any suggestions made (by any tool more that matter).

In general, the cost of an index seek is directly related to the depth of the B+Tree that makes up that index. Seeking in an index that is 8 levels deep is 4 times as expensive as seeking in an index that is only 2 levels deep. The depth of the index is strongly correlated with the length of the index key(s). Other important factors are uniqueness of the index, the length of the clustered index key and wether there are any included columns.

If you create an index on one column you generally get a index less deep than if you create an index on that same column and an additional column. However, in your case, as the index on ID-1 is not declared unique (it can't be), the clustered index key automatically becomes part of the key. Assuming there are no other columns, that additional index will be an exact copy of the clustered index. If you however have additional columns in the table, you might end up with a small difference in index depth. But even then it is usually not worth creating the index.

The difference becomes bigger if you have a lot of additional columns and you are looking at queries that would scan the new index like this one:

SELECT [ID-1],COUNT(1) AS Cnt
  FROM [dbo].[JunctionTable] 
 GROUP BY [ID-1];

As all those additional column are not part of the index the scan has to go through a lot less pages. If you for example had two additional DATETIME columns in that table, the index scan speed would probably double with that new index. But you would have to run this query very often for the index to be worth the additional maintenance overhead.

Related Solutions

Sql-server – Best solution to fixing database design with GUID as primary key

You certainly need NOT to cluster on the GUID. If you have something which would allow you to uniquely identify records other than that GUID, I'd suggest you look at building an unique index on that other field & making that index clustered. If not, you are free to cluster on other fields, even using nonunique indexes. The approach there would be to cluster however best facilitates splitting your data up and querying - so, if you've got a "region" field, or something, that might be a candidate for your clustering scheme.

The problem with changing to a BIGINT would be additions to the data from other databases & integrating their database into the central store. If this is not a consideration - and will never be a consideration - then, yes, the BIGINT would solve the index rebalancing issue nicely.

Behind the scenes, if you don't specify a clustered index, SQL Server does much the same thing: it creates a row ID field & maps all other indexes into that. So, by doing it yourself, you're solving it just as SQL would solve it.

SQL Server Hierarchical Order – Parent-Child Tree Hierarchical ORDER BY

OK, enough brain cells are dead.

SQL Fiddle

WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];

Best Answer

Related Solutions

Sql-server – Best solution to fixing database design with GUID as primary key

SQL Server Hierarchical Order – Parent-Child Tree Hierarchical ORDER BY

Related Question