I have a logging database with a few lookup tables and one massive event table that has 79 million rows. On busy days my code inserts over 3 million rows, so the event table would be enormous in production if not for the fact that every half hour I copy the latest data to an archive database on a separate server and then every night delete all rows more than one month old from production.
The problem is that these deletes now take over 10 hours to run. I don't think they cause any blocking as long as I put a 100 ms delay between each delete batch (script below), but I feel like things could be improved. Should the logging db be in snapshot isolation mode? We typically write the row and then update the EndTime column after the function has completed. Any suggestions on how to improve the process and/or speed up production would be appreciated.
DECLARE @CutoffDate datetime = DATEADD(DAY, -30, GETDATE()),
        @Time datetime,
        @Stop bit = 0,
        @RowCount int

WHILE (@Stop = 0)
BEGIN
    SET @Time = GETDATE()

    -- Delete child fault rows first so the FK to ApplicationEvent is honored
    DELETE TOP (100) af
    FROM ApplicationEvent ae
    JOIN ApplicationFault af ON af.ApplicationEventID = ae.ApplicationEventID
    WHERE ae.LogTime < @CutoffDate

    -- Then delete events that no longer have any fault rows
    DELETE TOP (100) ae
    FROM ApplicationEvent ae
    LEFT JOIN ApplicationFault af ON af.ApplicationEventID = ae.ApplicationEventID
    WHERE ae.LogTime < @CutoffDate AND af.ApplicationFaultID IS NULL

    SELECT @RowCount = @@ROWCOUNT
    SELECT @Stop = CASE @RowCount WHEN 0 THEN 1 ELSE 0 END

    --PRINT DATEDIFF(MS, @Time, GETDATE())
    WAITFOR DELAY '00:00:00.100';
END
Here's the schema.
CREATE TABLE [dbo].[ApplicationEvent](
[ApplicationEventID] [int] IDENTITY(1,1) NOT NULL,
[LogTime] [datetime] NOT NULL,
[TransactionID] [int] NULL,
[TransactionType] [int] NULL,
[EventID] [int] NOT NULL,
[EventLevelID] [smallint] NOT NULL,
[StartTime] [datetime] NOT NULL,
[EndTime] [datetime] NULL,
[IPAddress] nvarchar NULL,
[UserName] nvarchar NULL,
[ApplicationID] [smallint] NOT NULL,
[EventInformation] nvarchar NULL,
[HostName] nvarchar NULL,
[ApplicationSourceId] [smallint] NULL,
CONSTRAINT [PK_ApplicationEvent_1] PRIMARY KEY CLUSTERED
(
[ApplicationEventID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Best Answer
Two suggestions come to mind.
One is to put the clustered index on both ApplicationEvent and ApplicationFault on the LogTime column. Assuming the data is posted to the table in chronological order, you'll reduce fragmentation from page splits and benefit from range scans when purging out older time periods.
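A hedged sketch of that change for ApplicationEvent, assuming the primary key can be rebuilt as nonclustered during a maintenance window (the new index name is illustrative, and rebuilding a 79-million-row table will itself take time and log space):

```sql
-- Rebuild the PK as nonclustered so the clustering key can move to LogTime.
ALTER TABLE dbo.ApplicationEvent DROP CONSTRAINT PK_ApplicationEvent_1;
ALTER TABLE dbo.ApplicationEvent
    ADD CONSTRAINT PK_ApplicationEvent_1
    PRIMARY KEY NONCLUSTERED (ApplicationEventID);

-- Cluster on LogTime. It need not be unique; SQL Server adds a hidden
-- uniquifier to duplicate key values.
CREATE CLUSTERED INDEX CIX_ApplicationEvent_LogTime
    ON dbo.ApplicationEvent (LogTime);
```

With this in place, `WHERE LogTime < @CutoffDate` becomes a range scan at the head of the clustered index instead of a scan plus lookups, which should shrink the I/O of each delete batch considerably.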
The second builds on the first: implement partitioning based on the LogTime column. Instead of the relatively expensive I/O of the delete operations, you can implement a sliding window that 'slides' older time periods out of the main table with ALTER TABLE commands, effectively dropping unneeded dates via what are essentially metadata-only operations.
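A minimal sliding-window sketch, assuming monthly RANGE RIGHT partitions on LogTime and a staging table with an identical structure on the same filegroup (the function, scheme, staging-table names, and boundary dates are all illustrative):

```sql
-- Monthly boundaries; RANGE RIGHT puts each boundary value in the later partition.
CREATE PARTITION FUNCTION pf_LogTimeMonthly (datetime)
    AS RANGE RIGHT FOR VALUES ('2023-01-01', '2023-02-01', '2023-03-01');

CREATE PARTITION SCHEME ps_LogTimeMonthly
    AS PARTITION pf_LogTimeMonthly ALL TO ([PRIMARY]);

-- Nightly purge: switch the oldest populated partition out, then empty it.
-- SWITCH is a metadata-only operation, so this is near-instant regardless
-- of how many rows the partition holds.
ALTER TABLE dbo.ApplicationEvent
    SWITCH PARTITION 1 TO dbo.ApplicationEvent_Staging;
TRUNCATE TABLE dbo.ApplicationEvent_Staging;

-- Remove the now-empty boundary so the window keeps sliding.
ALTER PARTITION FUNCTION pf_LogTimeMonthly() MERGE RANGE ('2023-01-01');
```

In practice you would also SPLIT in a new boundary ahead of the incoming data each month, and the table (and ApplicationFault, if it partitions the same way) must be clustered on the partition scheme for SWITCH to work.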