SQL Server – Improve performance of index reorganize on volatile table

index-maintenance, ola-hallengren, sql-server, sql-server-2016

This is occurring on SQL Server 2016 (Enterprise).
Example Table structure:

CREATE TABLE [dbo].[Foo](
    [FooId] [bigint] IDENTITY(1,1) NOT NULL,
    [FirstFkId] [bigint] NULL,
    [SecondFkId] [bigint] NULL,
    [ThirdFkId] [bigint] NULL,
    [FourthFkId] [bigint] NOT NULL,
    [ParentId] [bigint] NULL,
    [Flag1] [bit] NOT NULL,
    [Flag2] [bit] NOT NULL,
    [From] [datetime2](7) NOT NULL,
    [Until] [datetime2](7) NOT NULL,
    [CreatedBy] [int] NOT NULL,
    [CreatedOn] [datetime2](7) NOT NULL,
    [ChangedBy] [int] NOT NULL,
    [ChangedOn] [datetime2](7) NOT NULL,
 CONSTRAINT [PK_Foo] PRIMARY KEY CLUSTERED 
(
    [FooId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[FooDetail](
    [FooDetailId] [bigint] IDENTITY(1,1) NOT NULL,
    [FooId] [bigint] NOT NULL,
    [FirstFkId] [bigint] NOT NULL,
    [SecondFkId] [bigint] NULL,
    [Column1] [bigint] NOT NULL,
    [Column2] [nvarchar](250) NULL,
    [Column3] [nvarchar](250) NULL,
    [Column4] [datetime2](7) NULL,
    [Column5] [datetime2](7) NULL,
    [Column6] [datetime2](7) NULL,
    [Column7] [datetime2](7) NULL,
    [Column8] [datetime2](7) NULL,
    [CreatedBy] [int] NOT NULL,
    [CreatedOn] [datetime2](7) NOT NULL,
    [ChangedBy] [int] NOT NULL,
    [ChangedOn] [datetime2](7) NOT NULL,
 CONSTRAINT [PK_FooDetail] PRIMARY KEY CLUSTERED 
(
    [FooDetailId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE [dbo].[FooDetail]  WITH CHECK ADD  CONSTRAINT [FK_FooDetail_Foo] FOREIGN KEY([FooId])
REFERENCES [dbo].[Foo] ([FooId])

These tables hold data from a sliding window. There is a process that inserts a month's worth of data into the tables every day, and a different process that deletes older data. Users can also insert/update rows.

Both tables have around 125,000 rows.

The data is read like this:

SELECT -- columns
FROM Foo f
INNER JOIN FooDetail fd on fd.FooId = f.FooId
WHERE f. -- Some condition

When investigating, we found that the index reorganize for the clustered index of FooDetail takes between 90 and 120 minutes, using the following command:

ALTER INDEX [PK_FooDetail] ON [DbName].[dbo].[FooDetail] REORGANIZE WITH (LOB_COMPACTION = ON)

Extended Events info:

<ExtendedInfo>
  <PageCount>3739</PageCount>
  <Fragmentation>5.00134</Fragmentation>
</ExtendedInfo>

We use Ola Hallengren's solution for index optimization (default settings), with the following command:

sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "EXECUTE [dbo].[IndexOptimize] @Databases = 'USER_DATABASES', @LogToTable = 'Y'" -b
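
Since @LogToTable = 'Y' is set, the per-index timings end up in the maintenance solution's dbo.CommandLog table (in master in this setup). Assuming the default CommandLog table and column names, a query along these lines shows which commands take the longest:

-- Run in the database that holds the maintenance solution (master in this setup)
SELECT TOP (20)
       DatabaseName,
       ObjectName,
       IndexName,
       StartTime,
       EndTime,
       DATEDIFF(MINUTE, StartTime, EndTime) AS DurationMinutes,
       Command
FROM dbo.CommandLog
WHERE CommandType = 'ALTER_INDEX'
ORDER BY DATEDIFF(MINUTE, StartTime, EndTime) DESC;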

Timings

Time            What
01:30 – 01:30   Deletes happen on the table
01:35 – 05:00   Process starts with inserts / updates
02:00 – 05:30   Reporting process starts, generating a huge number of reports
04:00 – xxxx    Index maintenance starts
05:30 – xxxx    Integrity check, followed by full backup
Every hour      Log backup

The reporting DB and reporting tempdb have around 200 user connections open from start to finish. They generate reports that are used throughout the day, and they put a heavy load on the system. Creating the reports on the fly is not an option; they have to be available. As in many cases, 24 hours in a day is too short to give every process its own time window.

Disk speed (ms) data file:

  • Avg Read Stall: 6
  • Avg Write Stall: 33

Disk speed (ms) log file:

  • Avg Read Stall: 5
  • Avg Write Stall: 6
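
For reference, average stall figures like the ones above can be derived from sys.dm_io_virtual_file_stats (cumulative since the last restart); a sketch:

SELECT mf.database_id,
       mf.type_desc,                 -- ROWS (data) or LOG
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY mf.database_id, mf.type_desc;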

My question: how can we speed up this index maintenance, and where can we get the biggest improvement?

Some ideas:

  • Should we exclude these tables from the general index maintenance job and run a separate job that maintains their indexes when we know there is less write activity, so we suffer less from locking? (The application is used 24/7, so locking will always be an issue.) See the sketch after this list.
  • Is it a good idea to drop the clustered index and use a heap instead, since the data is so volatile?
    (We have a separate index on FooDetail for FooId.)
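
Regarding the first idea: per the maintenance solution's documentation, IndexOptimize's @Indexes parameter supports exclusions with a minus sign, so splitting the work could look roughly like this (DbName is a placeholder):

-- General nightly job: everything except the volatile tables
EXECUTE [dbo].[IndexOptimize]
    @Databases = 'USER_DATABASES',
    @Indexes = 'ALL_INDEXES, -DbName.dbo.Foo, -DbName.dbo.FooDetail',
    @LogToTable = 'Y';

-- Separate job during a quieter window: only the volatile tables
EXECUTE [dbo].[IndexOptimize]
    @Databases = 'DbName',
    @Indexes = 'DbName.dbo.Foo, DbName.dbo.FooDetail',
    @LogToTable = 'Y';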

Best Answer

There is something which is incredibly wrong here. Your tables are tiny!

Just for comparison, I did a reorg on a clustered table with 450,000 rows and 22,000 pages at a 42% fragmentation level. It took 1 second, on my 3-year-old laptop.

You need to do some troubleshooting here.

I would restore the database to a different machine from a backup taken just before such a reorg, and run the reorg there. Depending on the outcome, you can dig deeper.
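
A minimal sketch of that test, assuming a suitable full backup file, with all paths and logical file names as placeholders:

-- On the test machine: restore a copy taken just before the maintenance window
RESTORE DATABASE [DbName_ReorgTest]
FROM DISK = N'X:\Backup\DbName.bak'              -- placeholder path
WITH MOVE N'DbName'     TO N'X:\Data\DbName_ReorgTest.mdf',
     MOVE N'DbName_log' TO N'X:\Log\DbName_ReorgTest.ldf',
     STATS = 10;

-- Time the reorg in isolation
SET STATISTICS TIME ON;
ALTER INDEX [PK_FooDetail] ON [DbName_ReorgTest].[dbo].[FooDetail]
    REORGANIZE WITH (LOB_COMPACTION = ON);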

If it is slow on that machine too, then there is something special about this table. Perhaps you have LOB pages, even though your post suggests you don't? I can't say why a reorg should take that long on a quiet machine, but you want to have this as a baseline.
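
One way to check whether the table actually has LOB or row-overflow allocation units (DETAILED mode touches every page, so preferably run it on the restored copy or during a quiet period):

-- Run in the database that contains FooDetail
SELECT ips.index_id,
       ips.alloc_unit_type_desc,    -- IN_ROW_DATA, LOB_DATA, ROW_OVERFLOW_DATA
       ips.page_count,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.FooDetail'), NULL, NULL, 'DETAILED') AS ips
ORDER BY ips.index_id, ips.alloc_unit_type_desc;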

Assuming the reorg is quick on the restored copy, you need to determine why it is so ultra-slow on the prod machine. Is it only slow during that time period? Perhaps you have blocking going on? What else is running at that time? You can, for instance, poll wait stats using the DMVs available (sys.dm_exec_session_wait_stats or sys.dm_os_wait_stats, depending on which is more convenient for you). Or just execute the reorg manually during a period when you know there isn't a ton of work going on.
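
For example, while the reorg runs in one session, you could poll something like this from another session (55 is a placeholder for the reorg's session_id):

-- Waits accumulated by the session running the reorg
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = 55
ORDER BY wait_time_ms DESC;

-- Is the reorg currently blocked, and by whom?
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE session_id = 55;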

Rebuild is an option, of course. Or not doing defrag in the first place (perhaps you don't really gain anything from a defrag).
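
If you do go the rebuild route, Enterprise Edition lets you do it online; something like the following, where the options are just a starting point to tune:

ALTER INDEX [PK_FooDetail] ON [dbo].[FooDetail]
    REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON, MAXDOP = 4);   -- MAXDOP value is only an example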

But considering that you have clearly hit something way out of the ordinary, you want to get to the root of it before settling for skipping the defrag or rebuilding instead.