I'm having a problem with a view, that is designed to store sync data, for an iOS app, that syncs data back and forth to a web service. The XML fields are used to store both structured data for transposing into PDF documents, and iOS objects. There are frequent writes and reads to this table.
The table has around 600K records, and the data usage is quite high due to two large XML fields. The database table looks like this:
CREATE TABLE [dbo].[SyncData](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CustomerId] [int] NOT NULL,
[CategoryId] [int] NOT NULL,
[SyncXml] [xml] NOT NULL,
[SyncDump] [xml] NULL,
[DateCreated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateCreated] DEFAULT (getdate()),
[DateUpdated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateUpdated] DEFAULT (getdate()),
[DELETED_FLAG] [bit] NOT NULL CONSTRAINT [DF_SyncData_DELETED_FLAG] DEFAULT ((0)),
CONSTRAINT [PK_SyncData] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
The CustomerId and CategoryId columns have foreign keys.
The web service returns the most recent sync data, organized by category, for a specific customer:
CREATE VIEW [dbo].[vw_SyncDataRecent]
AS
WITH Ranked AS (
SELECT
Id,
CustomerId,
CategoryId,
ROW_NUMBER() OVER (
PARTITION BY Id, CustomerId
ORDER BY DateCreated DESC) AS rn
FROM SyncData AS cb
)
SELECT Id, CustomerId, CategoryId FROM Ranked
WHERE rn = 1
GO
The table has a couple of indexes on it already, which are possibly leftovers (this database has been in production since 2004 and has gone through lots of feature additions), so I'm not sure if they are even needed anymore:
CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId] ON [dbo].[SyncData]
(
[CustomerId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId_CategoryId] ON [dbo].[SyncData]
(
[CustomerId] ASC,
[CategoryId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId_CategoryId_DELETED_FLAG] ON [dbo].[SyncData]
(
[CustomerId] ASC,
[CategoryId] ASC,
[DELETED_FLAG] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
The view is typically called simply with a WHERE clause on the CustomerId:
SELECT * FROM vw_SyncDataRecent
WHERE CustomerId = 600528
The results look like this:
| Id | CustomerId | CategoryId |
|----- |------------ |------------ |
| 189 | 600528 | 45 |
| 188 | 600528 | 46 |
| 191 | 600528 | 48 |
| 192 | 600528 | 49 |
| 190 | 600528 | 53 |
The performance on these operations is awful and worst of all it is having a serious impact on the database server, consistently running at 100% (this view query appears to be one of the worst culprits). Is there a better more efficient way to do this. I've also just realized that I don't need any of the rows that have been marked as logically deleted either (DELETED_FLAG), which appears to have been overlooked, but I guess that is a separate matter.
My initial thoughts are that the view itself is the major issue, in that it is ranking over the entire data set. Also that there is no index that includes the DateCreated column, but I'm not sure how useful that is for sorting.
Since the table is quite big, and actively used in production, I don't want to make big changes to the table itself, but I'm happy to change the view or switch it to use a stored procedure instead.
My gut feeling is just to do this, and create a stored procedure:
CREATE PROCEDURE [dbo].[_SyncData_Recent]
(
@CustomerId int = null
)
AS
BEGIN
WITH Ranked AS (
SELECT
Id,
CustomerId,
CategoryId,
ROW_NUMBER() OVER (PARTITION BY CustomerId, CategoryId ORDER BY DateCreated DESC) AS rn
FROM SyncData AS cb
WHERE CustomerId = @CustomerId
AND DELETED_FLAG = 0
)
SELECT Id, CustomerId, CategoryId FROM Ranked WHERE rn = 1 AND CustomerId = @CustomerId
END
GO
Thoughts?
Best Answer
That is exactly it, the window function is applied to the whole dataset to then be filtered by
CustomerId
andrn=1
. See my previous answer here for more information.With the
CustomerId
filter predicate applied only when the data gets to the filter operator:You could add
OPTION(RECOMPILE)
to the select query with the filter to see theCustomerId
at runtime and filter before passing the data through the window functionOther methods as noted in my other answer are not using the view, using a tvf instead, not using a window function to do the filtering, stored procedures, ....
TVF example
I would also add this index:
EDIT Your stored procedure
Would also work to apply earlier filtering, but I would add this index:
to remove the sort operator.
And the last filter,
WHERE rn = 1 AND CustomerId = @CustomerId
could just beWHERE rn = 1
.