Sql-server – Scalability assessment for a simple auditing system based on an ORM

ormperformancequery-performancescalabilitysql server 2014

I was asked to create a simple audit system for a Web application. Due to some constraints I have chosen a "dirt and quick" way. This was possible due to small activity in that particular application (several hundreds of inserts per day).

However, now I have to implement within another Web application that generates more activity, so I consider refactoring it.

Notes:

  • all the queries are generated by the used ORM (entity framework) and look quite dreadful.

  • all tests are performed with a table filled with some 2K dummy records, uniformly distributed among some users, time etc.

Data definition

CREATE TABLE dbo.AppEvent
(
    AppEventId INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_AppEvent PRIMARY KEY CLUSTERED,
    InsertTimestamp DATETIME2 NOT NULL CONSTRAINT DF_AppEvent DEFAULT(GETDATE()),
    UserId INT NOT NULL CONSTRAINT FK_AppEvent_UserId REFERENCES dbo.AppUser,
    EventTypeId INT NOT NULL CONSTRAINT FK_AppEvent_EventType REFERENCES dbo.EventType,
    RegionId INT NULL CONSTRAINT FK_AppEvent_Region REFERENCES dbo.Region,
    CountryId INT NULL CONSTRAINT FK_AppEvent_Country REFERENCES dbo.Country,
    InsertDay AS (CAST(InsertTimestamp as DATE)),
    InsertMonth AS (CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, InsertTimestamp), 0) AS DATE)),
    InsertYear AS (CAST(DATEADD(YEAR, DATEDIFF(YEAR, 0, InsertTimestamp), 0) AS DATE)),
    Description NVARCHAR(2000) NULL,
    ProjectId INT NULL CONSTRAINT FK_AppEvent_Project REFERENCES dbo.Project,
    ReminderActionId INT NULL CONSTRAINT FK_AppEvent_ReminderAction REFERENCES dbo.ReminderAction
)
GO

This is the only relevant table. All FK references go to clustered primary keys and most of the tables contain less than 100 records.

Actual logging

The application tries to cluster logging info, by avoiding inserting too close in time (same user, same event type).

Thus, it needs to make a SELECT:

exec sp_executesql N'SELECT TOP (1) 
    [Project1].[InsertTimestamp] AS [InsertTimestamp]
    FROM ( SELECT 
        [Extent1].[AppEventId] AS [AppEventId], 
        [Extent1].[InsertTimestamp] AS [InsertTimestamp]
        FROM [dbo].[AppEvent] AS [Extent1]
        WHERE ([Extent1].[UserId] = @p__linq__0) AND ([Extent1].[EventTypeId] = @p__linq__1) AND (([Extent1].[ReminderActionId] = @p__linq__2) OR (([Extent1].[ReminderActionId] IS NULL) AND (@p__linq__2 IS NULL)))
    )  AS [Project1]
    ORDER BY [Project1].[AppEventId] DESC',N'@p__linq__0 int,@p__linq__1 int,@p__linq__2 int',@p__linq__0=1,@p__linq__1=4,@p__linq__2=27

This generates about: CPU = 16, Reads = 34, Writes = 0, Duration = 0

Looking at the execution plan, I thought an index might improve things:

CREATE INDEX IDX_AppEvent_User_EventType ON dbo.AppEvent (UserId, EventTypeId, ReminderActionId) INCLUDE (AppEventId, InsertTimestamp)

This gives CPU = 0, Reads = 20, Writes = 0, Duration = 0

Actual INSERT statements look like this:

exec sp_executesql N'INSERT [dbo].[AppEvent]([InsertTimestamp], [UserId], [EventTypeId], [RegionId], [CountryId], [Description], [ProjectId], [ReminderActionId])
VALUES (@0, @1, @2, @3, @4, @5, @6, NULL)
SELECT [AppEventId], [InsertDay], [InsertMonth], [InsertYear]
FROM [dbo].[AppEvent]
WHERE @@ROWCOUNT > 0 AND [AppEventId] = scope_identity()',N'@0 datetime2(7),@1 int,@2 int,@3 int,@4 int,@5 nvarchar(2000),@6 int',
@0='2017-01-30 14:54:02.6469319',@1=1,@2=7,@3=5,@4=305,@5=N'Custom message',@6=1533

SELECT statement is caused by the ORM which needs to know just created identifier (I will have to look if I can get rid of the extra SELECT which is not needed).

This takes about CPU = 16, Reads = 32, Writes = 0, Duration = 9

Execution plan generates the following:

SELECT

and this

INSERT

Report

Audit reporting is quite simple is rarely run (several times per day maximum). Typical queries look like this:

SELECT 
    1 AS [C1], 
    [GroupBy1].[K2] AS [InsertDay], 
    [GroupBy1].[K1] AS [CountryId], 
    [GroupBy1].[A1] AS [C2]
    FROM ( SELECT 
        [Extent1].[CountryId] AS [K1], 
        [Extent1].[InsertDay] AS [K2], 
        COUNT(1) AS [A1]
        FROM [dbo].[AppEvent] AS [Extent1]
        WHERE ([Extent1].[EventTypeId] IN (1, 6, 7, 9)) AND ([Extent1].[CountryId] IS NOT NULL)
        GROUP BY [Extent1].[CountryId], [Extent1].[InsertDay]
    )  AS [GroupBy1]

`CPU = 16, Reads = 25, Writes = 0, Duration = 11`

SELECT 
    1 AS [C1], 
    [GroupBy1].[K2] AS [InsertMonth], 
    [GroupBy1].[K1] AS [CountryId], 
    [GroupBy1].[A1] AS [C2]
    FROM ( SELECT 
        [Extent1].[CountryId] AS [K1], 
        [Extent1].[InsertMonth] AS [K2], 
        COUNT(1) AS [A1]
        FROM [dbo].[AppEvent] AS [Extent1]
        WHERE ([Extent1].[EventTypeId] IN (1, 6, 7)) AND ([Extent1].[CountryId] IS NOT NULL)
        GROUP BY [Extent1].[CountryId], [Extent1].[InsertMonth]
    )  AS [GroupBy1]

`CPU = 16, Reads = 25, Writes = 0, Duration = 12`

Question: considering a generation of a maximum 100K events per day, should I think about rewriting the whole audit mechanism (from the database perspective)?

At the application layer I can make several improvements like: ensure that audit is not performed in the same transaction as operational changes and use other thread to perform the queries.

Best Answer

You can remove the SELECT from the insert code; change this:

INSERT [dbo].[AppEvent]([InsertTimestamp], [UserId], [EventTypeId]
      , [RegionId], [CountryId], [Description], [ProjectId], [ReminderActionId])
VALUES (@0, @1, @2, @3, @4, @5, @6, NULL)
SELECT [AppEventId], [InsertDay], [InsertMonth], [InsertYear]
FROM [dbo].[AppEvent]
WHERE @@ROWCOUNT > 0 AND [AppEventId] = scope_identity()

to this:

INSERT [dbo].[AppEvent]([InsertTimestamp], [UserId], [EventTypeId]
      , [RegionId], [CountryId], [Description], [ProjectId], [ReminderActionId])
OUTPUT inserted.[AppEventId], inserted.[InsertDay]
      , inserted.[InsertMonth], inserted.[InsertYear]
VALUES (@0, @1, @2, @3, @4, @5, @6, NULL);

I don't see the rate of inserts as a problem at all; you're estimating at most 3 to 4 inserts per second. Ensure the column definitions match the actual requirements; you mentioned in a comment that the Description column can actually be a varchar(255) - reducing this from a varchar(2000) will have a measurable impact on both insert and select statements.