SQL Server – Windowing Offset Based on Timestamp

azure-sql-databasepagingsql serverwindow functions

I'm writing a query that will be used to page results for a social feed. The concept is the mobile app will request N items, and provide a starting datetime which I have called @CutoffTime below. The purpose of the cutoff time is to establish when the paging window should start. The reason why we are using a time stamp instead of a row offset is the time stamp will let us page from a consistent place when getting older posts even if newer social content is added.

Since the social feed items can be from yourself or your friends I'm using a UNION to combine the results from those two groups. Originally I tried the TheQuery_CTElogic without the UNION and it was dog slow.

This is what I have done (including pertinent table schema):

    CREATE TABLE [Content].[Photo]
(
    [PhotoId] INT NOT NULL PRIMARY KEY IDENTITY (1, 1), 
    [Key] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    [FullResolutionUrl] NVARCHAR(255) NOT NULL, 
    [Description] NVARCHAR(255) NULL, 
    [Created] DATETIME2(2) NOT NULL DEFAULT SYSUTCDATETIME(),
);

CREATE TABLE [Content].[UserPhotoAssociation]
(
    [PhotoId] INT NOT NULL, 
    [UserId] INT NOT NULL, 
    [ShowInSocialFeed] BIT NOT NULL DEFAULT 0,

    CONSTRAINT [PK_UserPhotos] PRIMARY KEY ([PhotoId], [UserId]), 
    CONSTRAINT [FK_UserPhotos_User] FOREIGN KEY ([UserId]) 
        REFERENCES [User].[User]([UserId]), 
    CONSTRAINT [FK_UserPhotos_Photo] FOREIGN KEY ([PhotoId]) 
        REFERENCES [Content].[Photo]([PhotoId])
);

CREATE TABLE [Content].[FlaggedPhoto]
(
    [FlaggedPhotoId] INT NOT NULL PRIMARY KEY IDENTITY(1,1),
    [PhotoId] INT NOT NULL,
    [FlaggedBy] INT NOT NULL,
    [FlaggedOn] DATETIME2(0) NOT NULL DEFAULT SYSDATETIME(),
    [FlaggedStatus] INT NOT NULL DEFAULT 1,
    [ReviewedBy] INT NULL,
    [ReviewedAt] DATETIME2(0) NULL

    CONSTRAINT [FK_Photos_PhotoId_to_FlaggedPhotos_PhotoId] FOREIGN KEY ([PhotoId]) 
        REFERENCES [Content].[Photo]([PhotoId]),
    CONSTRAINT [FK_FlaggedPhotoStatus_FlaggedPhotoStatusId_to_FlaggedPhotos_FlaggedStatus] FOREIGN KEY ([FlaggedStatus]) 
        REFERENCES [Content].[FlaggedContentStatus]([FlaggedContentStatusId]),
    CONSTRAINT [FK_User_UserId_to_FlaggedPhotos_FlaggedBy] FOREIGN KEY ([FlaggedBy]) 
        REFERENCES [User].[User]([UserId]),
    CONSTRAINT [FK_User_UserId_to_FlaggedPhotos_ReviewedBy] FOREIGN KEY ([ReviewedBy]) 
        REFERENCES [User].[User]([UserId])
);


CREATE TABLE [User].[CurrentConnections]
(
    [MonitoringId] INT NOT NULL PRIMARY KEY IDENTITY,
    [Monitor] INT NOT NULL,
    [Monitored] INT NOT NULL,
    [ShowInSocialFeed] BIT NOT NULL DEFAULT 1,

    CONSTRAINT [FK_Monitoring_Monitor_to_User_UserId] FOREIGN KEY ([Monitor]) 
         REFERENCES [dbo].[User]([UserId]),
    CONSTRAINT [FK_Monitoring_Monitored_to_User_UserId] FOREIGN KEY ([Monitored]) 
         REFERENCES [dbo].[User]([UserId])
);

CREATE TABLE [Content].[PhotoLike]
(
    [PhotoLikeId] INT NOT NULL PRIMARY KEY IDENTITY,
    [PhotoId] INT NOT NULL,
    [UserId] INT NOT NULL,
    [Created] DATETIME2(2) NOT NULL DEFAULT SYSUTCDATETIME(),
    [Archived] DATETIME2(2) NULL,

    CONSTRAINT [FK_PhotoLike_PhotoId_to_Photo_PhotoId] FOREIGN KEY ([PhotoId]) 
         REFERENCES [Content].[Photo]([PhotoId]),
    CONSTRAINT [FK_PhotoLike_UserId_to_User_UserId] FOREIGN KEY ([UserId]) 
         REFERENCES [User].[User]([UserId])
);

CREATE TABLE [Content].[Comment]
(
    [CommentId] INT NOT NULL PRIMARY KEY IDENTITY,
    [PhotoId] INT NOT NULL,
    [UserId] INT NOT NULL,
    [Comment] NVARCHAR(255) NOT NULL,
    [Created] DATETIME2(2) NOT NULL DEFAULT SYSUTCDATETIME(),
    [CommentOrder] DATETIME2(2) NOT NULL DEFAULT SYSUTCDATETIME(),
    [Archived] DATETIME2(2) NULL,

    CONSTRAINT [FK_Comment_PhotoId_to_Photo_PhotoId] FOREIGN KEY ([PhotoId]) 
         REFERENCES [Content].[Photo]([PhotoId]),
    CONSTRAINT [FK_Comment_UserId_to_User_UserId] FOREIGN KEY ([UserId]) 
         REFERENCES [User].[User]([UserId])
);

/*

      End table schema

*/



DECLARE @UserId INT,
    @NumberOfItems INT,
    @CutoffTime DATETIME2(2) = NULL -- Stored Proc input params

-- Make the joins and grab the social data we need once since they are used in subsequent queries that aren't shown
DECLARE @SocialFeed TABLE ([Key] UNIQUEIDENTIFIER, [PhotoId] INT
            , [Description] NVARCHAR(255), [FullResolutionUrl] NVARCHAR(255)
            , [Created] DATETIME2(2), [CreatorId] INT, [LikeCount] INT
            , [CommentCount] INT, [UserLiked] BIT);
-- Offset might be different for each group
DECLARE @OffsetMine INT = 0, @OffsetTheirs INT = 0;

IF @CutoffTime IS NOT NULL
    BEGIN
        -- Get the offsets
        ;WITH [GetCounts_CTE] AS
        (
            SELECT
                [P].[PhotoId] -- INT
                , 1 AS [MyPhotos]
            FROM [Content].[Photo] [P]
                INNER JOIN [Content].[UserPhotoAssociation] [UPA] ON 
                    [UPA].[PhotoId] = [P].[PhotoId] 
                    AND 
                    [UPA].[ShowInSocialFeed] = 1
                LEFT JOIN [Content].[FlaggedPhoto] [FP] ON 
                    [FP].[PhotoId] = [P].[PhotoId] 
                    AND 
                    [FP].[FlaggedStatus] = 3 -- Flagged photos that are confirmed apply to everyone
            WHERE
                [FP].[FlaggedPhotoId] IS NULL -- Filter out flagged photos
                AND
                [UPA].[UserId] = @UserId -- Show the requesting user
                AND
                [P].[Created] >= @CutoffTime -- Get the newer items
            UNION
            SELECT
                [P].[PhotoId] -- INT
                , 0 AS [MyPhotos]
            FROM [Content].[Photo] [P]
                INNER JOIN [Content].[UserPhotoAssociation] [UPA] ON 
                    [UPA].[PhotoId] = [P].[PhotoId] 
                    AND 
                    [UPA].[ShowInSocialFeed] = 1
                INNER JOIN [User].[CurrentConnections] [M] ON 
                    [M].[Monitored] = [UPA].[UserId] 
                    AND 
                    [M].[Monitor] = @UserId AND [M].[ShowInSocialFeed] = 1 -- this join isn't present above  
                LEFT JOIN [Content].[FlaggedPhoto] [FP] ON 
                    [FP].[PhotoId] = [P].[PhotoId] 
                    AND 
                    (
                        [FP].[FlaggedStatus] = 3 
                        OR 
                        ([FP].[FlaggedBy] = @UserId AND [FP].[FlaggedStatus] = 1)
                    ) -- Flagged photos that are confirmed apply to everyone, pending flags apply to the user
            WHERE
                [FP].[FlaggedPhotoId] IS NULL -- Filter out flagged photos
                AND
                [P].[Created] >= @CutoffTime -- Get the newer items
        )
        SELECT
            @OffsetMine = SUM(CASE WHEN [MyPhotos] = 1 THEN 1 ELSE 0 END)
            , @OffsetTheirs = SUM(CASE WHEN [MyPhotos] = 0 THEN 1 ELSE 0 END)
        FROM [GetCounts_CTE]
    END

-- Prevent absence of social data from throwing an error below.
SET @OffsetMine = ISNULL(@OffsetMine, 0); 
SET @OffsetTheirs = ISNULL(@OffsetTheirs, 0);

-- Actually select the data I want
;WITH TheQuery_CTE AS
(
    SELECT
        [P].[Key]
        , [P].[PhotoId]
        , [P].[Description]
        , [P].[FullResolutionUrl]
        , [P].[Created]
        , [UPA].[UserId]
        , COUNT(DISTINCT [PL].[PhotoLikeId]) AS [LikeCount] -- Count distinct used due to common join key
        , COUNT(DISTINCT [C].[CommentId]) AS [CommentCount]
        , CAST(ISNULL(MAX(CASE WHEN [PL].[UserId] = @UserId THEN 1 END), 0) AS BIT) AS [UserLiked]
    FROM [Content].[Photo] [P]
        INNER JOIN [Content].[UserPhotoAssociation] [UPA] ON 
            [UPA].[PhotoId] = [P].[PhotoId] 
            AND 
            [UPA].[ShowInSocialFeed] = 1
        LEFT JOIN [Content].[PhotoLike] [PL] ON 
            [PL].[PhotoId] = [P].[PhotoId] 
            AND 
            [PL].[Archived] IS NULL
        LEFT JOIN [Content].[Comment] [C] ON 
            [C].[PhotoId] = [P].[PhotoId] 
            AND 
            [C].[Archived] IS NULL
        LEFT JOIN [Content].[FlaggedPhoto] [FP] ON 
            [FP].[PhotoId] = [P].[PhotoId] 
            AND 
            [FP].[FlaggedStatus] = 3 -- Flagged photos that are confirmed apply to everyone
    WHERE
        [FP].[FlaggedPhotoId] IS NULL -- Filter out flagged photos
        AND
        [UPA].[UserId] = @UserId -- Show the requesting user
    GROUP BY
        [P].[Key]
        , [P].[PhotoId]
        , [P].[Description]
        , [P].[FullResolutionUrl]
        , [P].[Created]
        , [UPA].[UserId]
    ORDER BY  
        [P].[Created] DESC
        , [P].[Key]  -- Ensure consistent order in case of duplicate timestamps
        OFFSET @OffsetMine ROWS FETCH NEXT @NumberOfItems ROWS ONLY
    UNION
    SELECT
        [P].[Key]
        , [P].[PhotoId]
        , [P].[Description]
        , [P].[FullResolutionUrl]
        , [P].[Created]
        , [UPA].[UserId]
        , COUNT(DISTINCT [PL].[PhotoLikeId]) AS [LikeCount]
        , COUNT(DISTINCT [C].[CommentId]) AS [CommentCount]
        , CAST(ISNULL(MAX(CASE WHEN [PL].[UserId] = @UserId THEN 1 END), 0) AS BIT) AS [UserLiked]
    FROM [Content].[Photo] [P]
        INNER JOIN [Content].[UserPhotoAssociation] [UPA] ON 
            [UPA].[PhotoId] = [P].[PhotoId] 
            AND 
            [UPA].[ShowInSocialFeed] = 1
        INNER JOIN [User].[CurrentConnections] [M] ON 
            [M].[Monitored] = [UPA].[UserId] 
            AND 
            [M].[Monitor] = @UserId AND [M].[ShowInSocialFeed] = 1
        LEFT JOIN [Content].[PhotoLike] [PL] ON 
            [PL].[PhotoId] = [P].[PhotoId] 
            AND 
            [PL].[Archived] IS NULL
        LEFT JOIN [Content].[Comment] [C] ON 
            [C].[PhotoId] = [P].[PhotoId] 
            AND 
            [C].[Archived] IS NULL
        LEFT JOIN [Content].[FlaggedPhoto] [FP] ON 
            [FP].[PhotoId] = [P].[PhotoId] 
            AND 
            (
                [FP].[FlaggedStatus] = 3 
                OR 
                ([FP].[FlaggedBy] = @UserId AND [FP].[FlaggedStatus] = 1)
            ) -- Flagged photos that are confirmed apply to everyone, pending flags apply to the user
    WHERE
        [FP].[FlaggedPhotoId] IS NULL -- Filter out flagged photos
    GROUP BY
        [P].[Key]
        , [P].[PhotoId]
        , [P].[Description]
        , [P].[FullResolutionUrl]
        , [P].[Created]
        , [UPA].[UserId]
    ORDER BY  
        [P].[Created] DESC
        , [P].[Key]  -- Ensure consistant order in case of duplicate timestamps
        OFFSET @OffsetTheirs ROWS FETCH NEXT @NumberOfItems ROWS ONLY
)
INSERT INTO @SocialFeed ([Key], [PhotoId], [Description], [FullResolutionUrl]
            , [Created], [CreatorId], [LikeCount], [CommentCount], [UserLiked])
SELECT TOP (@NumberOfItems)
    [Key]
    , [PhotoId]
    , [Description]
    , [FullResolutionUrl]
    , [Created]
    , [UserId]
    , [LikeCount]
    , [CommentCount]
    , [UserLiked]
FROM [TheQuery_CTE]
ORDER BY  -- Order here so the top works properly
    [Created] DESC
    , [Key]  -- Ensure consistent order in case of duplicate timestamps

-- Output the social feed
SELECT
    [P].[Key]
    , [P].[PhotoId]
    , [P].[Description] AS [PhotoDescription]
    , [P].[FullResolutionUrl]
    , [P].[Created] AS [Posted]
    , [P].[CreatorId]
    , [LikeCount]
    , [CommentCount]
    , [UserLiked]
FROM @Photos [P]

-- Select other data needed to build the object tree in the application layer

I realize I can get rid of the UNION in the GetCounts_CTE but I don't think it will really solve any of the problems I see below.

I see a few potential problems:

  1. That is lots of duplicated logic so I'm probably making life harder for myself.
  2. If an insert happens between calculating the count and selecting the data I'm going to be off. I don't think this would happen frequently but it would lead to weird/hard to debug bugs.
  3. All the problems smarter/more experience people would find with the above setup.

What is the best way to write this query? Bonus points the solution makes my life simpler.

Edit:

I don't want to select all the data and let the client lazily display the items, because I don't want to abuse people's data plans by forcing them to download items they will never see. Admittedly the data probably won't be that big in the grand scheme of things but pennies stackup….

Edit 2:

I strongly suspect this isn't the optimal solution, but it is the best I've come up with so far.

Moving my UNION query to a VIEW like Greg suggested worked well to hide that logic and give a more concise query in my stored procedure. The view also abstracts away the ugliness/complication of the union which is nice because I'm using it twice in my select. Here is the code for the view:

CREATE VIEW [Social].[EverFeed]
    AS 
SELECT
    [P].[Key]
    , [P].[PhotoId]
    , [P].[Description]
    , [P].[FullResolutionUrl]
    , [P].[Created]
    , [UPA].[UserId]
    , COUNT(DISTINCT [PL].[PhotoLikeId]) AS [LikeCount] -- Distinct due to common join key
    , COUNT(DISTINCT [C].[CommentId]) AS [CommentCount]
    , CAST(ISNULL(
        MAX(CASE WHEN [PL].[UserId] = [UPA].[UserId] THEN 1 END), 0) AS BIT) AS [UserLiked]
    , NULL AS [Monitor]
FROM [Content].[Photo] [P]
    INNER JOIN [Content].[UserPhotoAssociation] [UPA] ON 
        [UPA].[PhotoId] = [P].[PhotoId] 
        AND 
        [UPA].[ShowInSocialFeed] = 1
    LEFT JOIN [Content].[PhotoLike] [PL] ON 
        [PL].[PhotoId] = [P].[PhotoId] 
        AND 
        [PL].[Archived] IS NULL
    LEFT JOIN [Content].[Comment] [C] ON 
        [C].[PhotoId] = [P].[PhotoId] 
        AND 
        [C].[Archived] IS NULL
    LEFT JOIN [Content].[FlaggedPhoto] [FP] ON 
        [FP].[PhotoId] = [P].[PhotoId] 
        AND 
        [FP].[FlaggedStatus] = 3 -- Flagged photos that are confirmed apply to everyone
WHERE
    [FP].[FlaggedPhotoId] IS NULL -- Filter out flagged photos
GROUP BY
    [P].[Key]
    , [P].[PhotoId]
    , [P].[Description]
    , [P].[FullResolutionUrl]
    , [P].[Created]
    , [UPA].[UserId]
UNION
SELECT
    [P].[Key]
    , [P].[PhotoId]
    , [P].[Description]
    , [P].[FullResolutionUrl]
    , [P].[Created]
    , [UPA].[UserId]
    , COUNT(DISTINCT [PL].[PhotoLikeId]) AS [LikeCount]
    , COUNT(DISTINCT [C].[CommentId]) AS [CommentCount]
    , CAST(ISNULL(
        MAX(CASE WHEN [PL].[UserId] = [M].[Monitor] THEN 1 END), 0) AS BIT) AS [UserLiked]
    , [M].[Monitor]
FROM [Content].[Photo] [P]
    INNER JOIN [Content].[UserPhotoAssociation] [UPA] ON 
        [UPA].[PhotoId] = [P].[PhotoId] 
        AND 
        [UPA].[ShowInSocialFeed] = 1
    INNER JOIN [User].[CurrentConnections] [M] ON 
        [M].[Monitored] = [UPA].[UserId] 
        AND 
        [M].[ShowInSocialFeed] = 1
    LEFT JOIN [Content].[PhotoLike] [PL] ON 
        [PL].[PhotoId] = [P].[PhotoId] 
        AND 
        [PL].[Archived] IS NULL
    LEFT JOIN [Content].[Comment] [C] ON 
        [C].[PhotoId] = [P].[PhotoId] 
        AND 
        [C].[Archived] IS NULL
    LEFT JOIN [Content].[FlaggedPhoto] [FP] ON 
        [FP].[PhotoId] = [P].[PhotoId] 
        AND 
        (
            [FP].[FlaggedStatus] = 3 
            OR 
            ([FP].[FlaggedBy] = [M].[Monitor] AND [FP].[FlaggedStatus] = 1)
        ) -- Flagged photos that are confirmed (3) apply to everyone
          -- , pending flags (1) apply to the user
WHERE
    [FP].[FlaggedPhotoId] IS NULL -- Filter out flagged photos
GROUP BY
    [P].[Key]
    , [P].[PhotoId]
    , [P].[Description]
    , [P].[FullResolutionUrl]
    , [P].[Created]
    , [UPA].[UserId]
    , [M].[Monitor]

Using that view I shortened my query to the following. Note I'm setting the OFFSET with a subquery.

DECLARE @UserId INT, @NumberOfItems INT, @CutoffTime DATETIME2(2);

SELECT
    [Key]
    , [PhotoId]
    , [Description]
    , [FullResolutionUrl]
    , [Created]
    , [UserId]
    , [LikeCount]
    , [CommentCount]
    , [UserLiked]
FROM  [Social].[EverFeed] [EF]
WHERE
    (
        ([EF].[UserId] = @UserId AND [EF].[Monitor] IS NULL)
        OR 
        [EF].[Monitor] = @UserId
    )
ORDER BY  -- Order here so the top works properly
    [Created] DESC
    , [Key]  -- Ensure consistant order in case of duplicate timestamps
    OFFSET CASE WHEN @CutoffTime IS NULL THEN 0 ELSE        
            (
                SELECT
                    COUNT([PhotoId])
                FROM [Social].[EverFeed] [EF]
                WHERE
                    (
                        ([EF].[UserId] = @UserId AND [EF].[Monitor] IS NULL)
                        OR 
                        [EF].[Monitor] = @UserId
                    )
                    AND
                    [EF].[Created] >= @CutoffTime -- Get the newer items
            ) END 
    ROWS FETCH NEXT @NumberOfItems ROWS ONLY

The view nicely separates the complexity of the UNION from the filtering. I think the subquery in the OFFSET clause will prevent concurrency issues that I was worried about by making the whole query atomic.

One problem I just found while typing this is: in the code above if two photos with the same creation date are on different "pages" then the photos on the subsequent pages will be filtered out. Consider the following data:

PhotoId | Created | ...
------------------------
   1    | 2015-08-26 01:00.00
   2    | 2015-08-26 01:00.00
   3    | 2015-08-26 01:00.00

With a page size of 1 on the initial page, PhotoId 1 will be returned. With the same page size on the second page no results will be returned. I think in order to resolve this I'm going to have to add the Key Guid as a parameter….

Best Answer

I suggest you go down a different path with this. It looks like a pattern of select top() .... order by should be sufficient.

select top(@NumberOfItems)
  P.PhotoId,
  P.[Key],
  P.Created
from Content.Photo as P
  inner join Content.UserPhotoAssociation as UPA
    on P.PhotoId = UPA.PhotoId
where
  -- Older than CutoffTime
  P.Created < @CutoffTime and
  (
  -- My photos
  UPA.UserId = @UserId or
  -- Photos by someone monitored by me
  UPA.UserID in (
                select CC.Monitored
                from [User].CurrentConnections as CC
                where CC.Monitor = @UserId
                )
  )
order by P.Created desc, P.[Key]

To handle cases where Created happens to be equal you need to include the Key value in the cutoff handling. Perhaps a where clause something like this.

P.Created < @CutoffTime or (P.Created = @CutoffTime and P.Key > @CutoffKey)