Sql-server – MSSQL – Primary Key Clustered specifying multiple columns

sql servert-sql

I've inherited a database (MSSQL 2008R2) with a lot of tables that either have NO primary key or primary key's that look like this:

 ALTER TABLE [dbo].[Distribution_Batch] ADD  CONSTRAINT 
 [PK_Distribution_Batch_1__23] PRIMARY KEY CLUSTERED 
 (
    [BATCH_ID] ASC,
    [DATE_CREATED] ASC,
    [CONTACT_ID] ASC,
    [LAB_CODE] ASC,
    [DISTRIBUTION_TYPE] ASC,
    [CREATOR] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Here is the table itself, these are the columns:

 SELECT TOP (1000) [BATCH_ID]  --- this is a varchar(100) field
       ,[LAB_CODE]
       ,[DISTRIBUTION_TYPE]
       ,[CONTACT_ID]
       ,[CREATOR]
       ,[DESCRIPTION]
       ,[DATE_CREATED]
       ,[DATE_COMPLETED]
       ,[DATE_DEADLINE]
       ,[DATE_CONFIRMED]
       ,[ERROR_CODE]
       ,[CONTENT]
       ,[ADDED_BY]
       ,[SOURCE]
       ,[STATUS]
       ,[DELIVERY_METHOD]
       ,[FILE_SIZE]
       ,[DISTRIBUTION_FORMAT]
       ,[CONTACT_ID_ORIGINAL]
   FROM [dbo].[Distribution_Batch]

All this table does is queue up distribution jobs to be run by our application. Nothing fancy.

They have indexes on numerous columns:

 [DistributionBatch-ContactId-DistributionType-20161108-121538]
 [idx_date_completed]
 [idx_date_created]
 [idx_DistribBatch_BatchId]
 [idx_Distribution_Contacts]
 [IX_Distribution_Batch]
 [IX_Distribution_Batch_LAB_CODE_DISTRIBUTION_TYPE_STATUS_DATE_COMPLETED]
 [IX_Distribution_Batch_LAB_CODE_STATUS]
 [IX_Distribution_Batch_STATUS]
 [IX_Distribution_Batch_STATUS_DELIVERY_METHOD]
 [PK_Distribution_Batch_1__23]

The system overall is slow and the admins response has always been to add more indexes to each table. This table spits out data sequentially based on whether it is completed or not. I can't understand why they would have so many indexes on columns that are never sorted on. Am I missing something?

My question is 2 parts:

Part 1) Does that primary key make any sense? Shouldn't the primary key just be an ID (int), starting at 1 with identity spec and auto incrementing?

Part 2) I need confirmation that none of those indexes make sense and are not necessary. There are a lot more tables that have this same issue.

This database has 190 GB of data and 101 GB of indexes.
All comments and opinions greatly appreciated.

Best Answer

Primay key doesn't take any extra storage, its just the way data is stored in that sort order. Whether Primary key(clustered index) should be id(identity column) or something else depends on the database designer and the requirement of application. I would suggest you to start with non-clustered index and check whether they are utilized or not by your select queries. Indexes are good only for select however bad for all DML operations(mostly) i.e. Insert/Update/Delete.

Non-clustered indexes are the ones which takes separate storage, since more than 50% of your database is stored as index, you may need to check their usage(reads). It is possible that they are simply creating overhead in writing and not very useful for scan/seek operation. Just to add, having 50% or more storage for index is not always bad, it just depends whether you have more read/write operation - Dataware house/reporting/OLTP.

You may download first responder kit provided by Mr. Brent Ozar from here and can run sp_blitzindex for your database and it would check everything regarding your index - whether they are too much, too less or some other issue.

In a normal scenario, there is rule of 5 non-clustered index per table however again depends on so many other factors. Kindly provide more details on the issue you are facing due to these indexes. for your two questions - I have tried to answer them.

Kindly let us know if that helped.

Related Solutions

Sql-server – Clustered Index vs Primary Key Performance

A clustered index will dictate the order of the data in the pages of your table. When your temp table is initially populated, the inserted data will be sorted or, if you're doing multiple inserts to populate, it's possible you will end up with some page fragmentation as the pages must split to accommodate additional data being written.

Assuming your temp table is populated in a single insert, and you have no page fragmentation, the next issue that may or may not affect your performance is the presence of a 4 byte "uniqueifier." Because the clustered key is used as a pointer to each row in the table, it must be unique for that purpose. So if you're using a non-unique clustered index, every nonclustered index (and your clustered index) is going to be 4 bytes wider than you've defined which will affect page density and IO required to perform the query. This will likely only be of very marginal impact as the uniquifier will only be added to duplicates.

The other factor that you need to consider is, how are you joining to this temp table. SQL Server supports 3 physical join operators; Merge, Loop, and Hash. Merge can yield the best performance in many situations.

Unlike the nested loops join where the total cost may be proportional to the product of the number of rows in the input tables, with a merge join each table is read at most once and the total cost is proportional to the sum of the number of rows in the inputs. Thus, merge join is often a better choice for larger inputs.

It is unlikely SQL will chose a merge operator unless both datasets being joined are sorted by the join criteria. There are other factors as well, of course, but if you can optimize your table for specific, common usage patters, all the better.

The second thing to consider is any other indexes you are going to place on that table since each nonclustered index will contain the keys specified (in the order specified) and the pointer to the clustered index to facilitate the lookup and retrieval of any fields not covered by the index. Again, a narrow clustered key will yield better density in your non-clustered indexes and thus better i/o performance for index scans and seeks. If your nonclustered indexes require many lookups to the clustered index either consider including those columns in your index (sacrificing page density for reduced lookups) or consider clustering on that criteria (if this is your primary usage pattern)

The biggest factor that would steer me towards using a primary key over a unique (or non-unique clustered index) would be the ability to create the index inline in the create table statement. Historically, I have found that the optimizer is less apt to use indexes that I created inside my stored procedure after the create table statement. It's easier to create a primary key inline with the create table statement. I'm not sure if it's possible to do this with a non-constraining index but you can create a unique clustered index inline.

Sql-server – Parent-Child Tree Hierarchical ORDER

OK, enough brain cells are dead.

SQL Fiddle

WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];

Best Answer

Related Solutions

Sql-server – Clustered Index vs Primary Key Performance

Sql-server – Parent-Child Tree Hierarchical ORDER

Related Question