SQL Server Query Optimization – RANK on DateTime for Recent Rows

optimizationquery-performancesql servert-sql

I'm having a problem with a view, that is designed to store sync data, for an iOS app, that syncs data back and forth to a web service. The XML fields are used to store both structured data for transposing into PDF documents, and iOS objects. There are frequent writes and reads to this table.

The table has around 600K records, and the data usage is quite high due to two large XML fields. The database table looks like this:

CREATE TABLE [dbo].[SyncData](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [CustomerId] [int] NOT NULL,
    [CategoryId] [int] NOT NULL,
    [SyncXml] [xml] NOT NULL,
    [SyncDump] [xml] NULL,
    [DateCreated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateCreated]  DEFAULT (getdate()),
    [DateUpdated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateUpdated]  DEFAULT (getdate()),
    [DELETED_FLAG] [bit] NOT NULL CONSTRAINT [DF_SyncData_DELETED_FLAG]  DEFAULT ((0)),
 CONSTRAINT [PK_SyncData] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

The CustomerId and CategoryId columns have foreign keys.

The web service returns the most recent sync data, organized by category, for a specific customer:

CREATE VIEW [dbo].[vw_SyncDataRecent]
AS
    WITH Ranked AS (
        SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (
                PARTITION BY Id, CustomerId 
                ORDER BY DateCreated DESC) AS rn
        FROM SyncData AS cb
    )
    SELECT Id, CustomerId, CategoryId FROM Ranked 
    WHERE rn = 1
GO

The table has a couple of indexes on it already, which are possibly leftovers (this database has been in production since 2004 and has gone through lots of feature additions), so I'm not sure if they are even needed anymore:

CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId] ON [dbo].[SyncData]
(
    [CustomerId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId_CategoryId] ON [dbo].[SyncData]
(
    [CustomerId] ASC,
    [CategoryId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId_CategoryId_DELETED_FLAG] ON [dbo].[SyncData]
(
    [CustomerId] ASC,
    [CategoryId] ASC,
    [DELETED_FLAG] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

The view is typically called simply with a WHERE clause on the CustomerId:

SELECT * FROM vw_SyncDataRecent
WHERE CustomerId = 600528

The results look like this:

| Id    | CustomerId    | CategoryId    |
|-----  |------------   |------------   |
| 189   | 600528        | 45            |
| 188   | 600528        | 46            |
| 191   | 600528        | 48            |
| 192   | 600528        | 49            |
| 190   | 600528        | 53            |

The performance on these operations is awful and worst of all it is having a serious impact on the database server, consistently running at 100% (this view query appears to be one of the worst culprits). Is there a better more efficient way to do this. I've also just realized that I don't need any of the rows that have been marked as logically deleted either (DELETED_FLAG), which appears to have been overlooked, but I guess that is a separate matter.

My initial thoughts are that the view itself is the major issue, in that it is ranking over the entire data set. Also that there is no index that includes the DateCreated column, but I'm not sure how useful that is for sorting.

Since the table is quite big, and actively used in production, I don't want to make big changes to the table itself, but I'm happy to change the view or switch it to use a stored procedure instead.

My gut feeling is just to do this, and create a stored procedure:

CREATE PROCEDURE [dbo].[_SyncData_Recent]
(
@CustomerId int = null
)
AS
BEGIN
    WITH Ranked AS (
        SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (PARTITION BY CustomerId, CategoryId ORDER BY DateCreated DESC) AS rn
        FROM SyncData AS cb
        WHERE CustomerId = @CustomerId
        AND DELETED_FLAG = 0
    )
    SELECT Id, CustomerId, CategoryId FROM Ranked WHERE rn = 1 AND CustomerId = @CustomerId
END
GO

Thoughts?

Best Answer

My initial thoughts are that the view itself is the major issue, in that it is ranking over the entire data set.

That is exactly it, the window function is applied to the whole dataset to then be filtered by CustomerId and rn=1. See my previous answer here for more information.

enter image description here

With the CustomerId filter predicate applied only when the data gets to the filter operator:

enter image description here

You could add OPTION(RECOMPILE) to the select query with the filter to see the CustomerId at runtime and filter before passing the data through the window function

SELECT * FROM vw_SyncDataRecent
WHERE CustomerId = 600528
OPTION(RECOMPILE);

enter image description here

Other methods as noted in my other answer are not using the view, using a tvf instead, not using a window function to do the filtering, stored procedures, ....

TVF example

CREATE FUNCTION dbo.[Fnc_SyncDataRecent]
(
    @P1 INT
)  
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
       WITH Ranked AS  (
    SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (
                PARTITION BY Id, CustomerId 
                ORDER BY DateCreated DESC) AS rn
        FROM dbo.SyncData AS cb
    )
    SELECT Id, CustomerId, CategoryId 
    FROM Ranked 
    WHERE rn = 1 and CustomerId = @P1



SELECT * FROM dbo.[Fnc_SyncDataRecent](600528);

I would also add this index:

CREATE INDEX IX_CustomerID3
ON dbo.SyncData([CustomerId])
INCLUDE(CategoryId,ID,DateCreated);

EDIT Your stored procedure

CREATE PROCEDURE [dbo].[_SyncData_Recent]
(
@CustomerId int = null
)
AS
BEGIN
    WITH Ranked AS (
        SELECT 
            Id, 
            CustomerId, 
            CategoryId, 
            ROW_NUMBER() OVER (PARTITION BY CustomerId, CategoryId ORDER BY DateCreated DESC) AS rn
        FROM SyncData AS cb
        WHERE CustomerId = @CustomerId
        AND DELETED_FLAG = 0
    )
    SELECT Id, CustomerId, CategoryId FROM Ranked WHERE rn = 1 AND CustomerId = @CustomerId
END
GO

EXEC [dbo].[_SyncData_Recent] 600528;

Would also work to apply earlier filtering, but I would add this index:

CREATE INDEX IX_CustomerID_CategoryId_DateCreated_
ON dbo.SyncData([CustomerId],CategoryId,DateCreated DESC)
INCLUDE(DELETED_FLAG,ID)
WHERE DELETED_FLAG = 0;

to remove the sort operator.

enter image description here

And the last filter, WHERE rn = 1 AND CustomerId = @CustomerId could just be WHERE rn = 1.