SQL Server Indexes – How to Fix Indexes to Reduce Fragmentation Percentage

sql server

I need your guidance and help to understand a SQL topic that I've never considered before but now I'm trying to build tables thinking on performance, one of the topics is Fragmentation but after reading some website data I need more knowledge.

The code below will create a new table , that table will have a clustered index as the primary key but also it will have a non clustered index for a column called HosttTimeStamp which is basically a DATETIME that storage the transaction date . The problem is that both indexes are using 98% of fragmentation and I'm looking a way to reduce that when new data gets inserted

USE [DB1]

IF OBJECT_ID('dbo.ProductionTable', 'U') IS NULL BEGIN
CREATE TABLE dbo.ProductionTable
(
    [Tier1CustomerAccountID] INT NOT NULL,
    [Tier1CustomerBusinessName] VARCHAR(255) NULL,
    [Businessname]NVARCHAR(50) NULL,
    [AccountID] INT NOT NULL,
    [RetailerAcc]VARCHAR(10) NULL,
    [Product] VARCHAR(10) NULL,
    [Title] NVARCHAR(50) NOT NULL, 
    [PlanName] VARCHAR(200) NULL,
    [PlanValue] NUMERIC(18,3) NOT NULL,
    [Hosttimestamp] DATETIME NOT NULL,
    [TerminalID] INT NOT NULL,
    [TxnIDType] CHAR(3) NOT NULL,
    [Region] VARCHAR(255) NULL,
    [State] VARCHAR(255) NULL,
    [RetailerAccTxnID] BIGINT NOT NULL,
    [TxnType] CHAR(3),
    [FirstTxnDate] DATETIME NULL,
    [Classification] VARCHAR(50) NULL
PRIMARY KEY CLUSTERED 
(
    [RetailerAccTxnID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
END
GO

DECLARE @tableName varchar(max) = 'dbo.ProductionTable';
DECLARE @indexName varchar(max) = 'IX_ATTActivationsData_Hosttimestamp';
DECLARE @indexDefn varchar(max) = '([Hosttimestamp] ASC)';
IF NOT EXISTS(
    SELECT * FROM sys.indexes 
    WHERE Name=@indexName AND OBJECT_ID = OBJECT_ID(@tableName) ) BEGIN

    EXEC('CREATE NONCLUSTERED INDEX ' + @indexName + ' ON ' + @tableName + @indexDefn)

END;

I'm new on the index topic so I was thinking that maybe my non cluster index is wrong, that instead of being " on dbo.ProductionTable (HostTimeStamp) " must be something like "on dbo.productionTable (RetailerAccTxnID) INCLUDE (HostTimeStamp). The reason is because if I'm not wrong the word "ON" is 100% more for table joins and there will be multiple queries pointing to this table using the HosttimeStamp for the "WHERE" conditions

Best Answer

Fragmentation is only bad if it creates an actual problem for you -- slow queries, high disk usage, high CPU or something else specific. If there is no actual problem then do not stress over fragmentation.

One potential problem is if your queries return lots of rows and those rows are spread all over the disk. Then the disk read/ write heads have to move a lot and response is slower. This is less of a problem with SSD but random reads are still slower than sequential reads.

If, however, your queries only read one (or a very few) rows and they use one of your indexes to get to those rows then fragmentation is not relevant.

The clustered index does not have to be on the primary key. It should be on the column(s) which are often used to read many rows at a time. If you need to return all the rows for a given AccountID then the clustered index should be on that column; if it is important to read all the rows for a given Hosttimestamp then make that column the clustering column. If no one class of query (or INSERT) is more important than any other having the primary key as clustered is a generally good enough choice. In SQL Server there are advantages to having the clustering key as a integer, but it is not required to be so.

When deciding what indexes to create you have to consider all actions that reference the table - INSERT, UPDATE and DELETE as well as SELECT. This is because indexes have to synchronise with the table so statements which change data have to do work to change the indexes, too, slowing overall performance. So identify the most important aspects of the total application workload and index to support them.