Sql-server – MSSQL – Primary Key Clustered specifying multiple columns

sql servert-sql

I've inherited a database (MSSQL 2008R2) with a lot of tables that either have NO primary key or primary key's that look like this:

 ALTER TABLE [dbo].[Distribution_Batch] ADD  CONSTRAINT 
 [PK_Distribution_Batch_1__23] PRIMARY KEY CLUSTERED 
 (
    [BATCH_ID] ASC,
    [DATE_CREATED] ASC,
    [CONTACT_ID] ASC,
    [LAB_CODE] ASC,
    [DISTRIBUTION_TYPE] ASC,
    [CREATOR] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Here is the table itself, these are the columns:

 SELECT TOP (1000) [BATCH_ID]  --- this is a varchar(100) field
       ,[LAB_CODE]
       ,[DISTRIBUTION_TYPE]
       ,[CONTACT_ID]
       ,[CREATOR]
       ,[DESCRIPTION]
       ,[DATE_CREATED]
       ,[DATE_COMPLETED]
       ,[DATE_DEADLINE]
       ,[DATE_CONFIRMED]
       ,[ERROR_CODE]
       ,[CONTENT]
       ,[ADDED_BY]
       ,[SOURCE]
       ,[STATUS]
       ,[DELIVERY_METHOD]
       ,[FILE_SIZE]
       ,[DISTRIBUTION_FORMAT]
       ,[CONTACT_ID_ORIGINAL]
   FROM [dbo].[Distribution_Batch]

All this table does is queue up distribution jobs to be run by our application. Nothing fancy.

They have indexes on numerous columns:

 [DistributionBatch-ContactId-DistributionType-20161108-121538]
 [idx_date_completed]
 [idx_date_created]
 [idx_DistribBatch_BatchId]
 [idx_Distribution_Contacts]
 [IX_Distribution_Batch]
 [IX_Distribution_Batch_LAB_CODE_DISTRIBUTION_TYPE_STATUS_DATE_COMPLETED]
 [IX_Distribution_Batch_LAB_CODE_STATUS]
 [IX_Distribution_Batch_STATUS]
 [IX_Distribution_Batch_STATUS_DELIVERY_METHOD]
 [PK_Distribution_Batch_1__23]

The system overall is slow and the admins response has always been to add more indexes to each table. This table spits out data sequentially based on whether it is completed or not. I can't understand why they would have so many indexes on columns that are never sorted on. Am I missing something?

My question is 2 parts:

Part 1) Does that primary key make any sense? Shouldn't the primary key just be an ID (int), starting at 1 with identity spec and auto incrementing?

Part 2) I need confirmation that none of those indexes make sense and are not necessary. There are a lot more tables that have this same issue.

This database has 190 GB of data and 101 GB of indexes.
All comments and opinions greatly appreciated.

Best Answer

Primay key doesn't take any extra storage, its just the way data is stored in that sort order. Whether Primary key(clustered index) should be id(identity column) or something else depends on the database designer and the requirement of application. I would suggest you to start with non-clustered index and check whether they are utilized or not by your select queries. Indexes are good only for select however bad for all DML operations(mostly) i.e. Insert/Update/Delete.

Non-clustered indexes are the ones which takes separate storage, since more than 50% of your database is stored as index, you may need to check their usage(reads). It is possible that they are simply creating overhead in writing and not very useful for scan/seek operation. Just to add, having 50% or more storage for index is not always bad, it just depends whether you have more read/write operation - Dataware house/reporting/OLTP.

You may download first responder kit provided by Mr. Brent Ozar from here and can run sp_blitzindex for your database and it would check everything regarding your index - whether they are too much, too less or some other issue.

In a normal scenario, there is rule of 5 non-clustered index per table however again depends on so many other factors. Kindly provide more details on the issue you are facing due to these indexes. for your two questions - I have tried to answer them.

Kindly let us know if that helped.