Sql-server – How to make OFFSET & LIMIT work with COUNT (OVER?) when having a subquery

offset-fetch, sql-server, subquery, window-functions

It appears I have a case I can't quite wrap my head around, so I'm coming here in the hope of finding pointers to a query that could maybe be helpful to someone else too.

In the following, I have a query that functions correctly as far as returning results goes, but it requires a second query that is the same as the one presented here, only without OFFSET and with a COUNT(*) of all the rows as its output.

I have two objectives:

  1. Write the query so that COUNT(*) is returned in the same query. Indeed, I have been looking at help pieces such as the excellent "SQL SERVER – How to get total row count from OFFSET / FETCH NEXT (Paging)", which shows different ways of solving the problem, but then there's another piece…
  2. Rewrite the join with a window function (e.g. OVER(PARTITION BY)) or in some more performant way, as that query seems to generate an INDEX SCAN and an INDEX SEEK on the table. The real query is a bit more complicated in the WHERE part, but it looks to me like even one scan could be enough if the query were a bit more straightforward, so that the COUNT and MAX could be had simultaneously with the outer query. Even this would be a win, and combining it with the overall COUNT would be an even bigger one.

Maybe I'm trying to bite off a teeny bit more than I can chew currently, but on the other hand, maybe there is now a chance to learn something.

Here are the table, the data, and the current query:

CREATE TABLE Temp
(
    Id INT NOT NULL PRIMARY KEY,
    Created INT NOT NULL,
    ParentId INT,
    SomeInfo INT NOT NULL,
    GroupId INT NOT NULL,

    CONSTRAINT FK_Temp FOREIGN KEY(ParentId) REFERENCES Temp(Id)
);

-- Some root-level nodes.
INSERT INTO Temp VALUES(1, 1, NULL, 1, 1);
INSERT INTO Temp VALUES(2, 2, NULL, 2, 2);
INSERT INTO Temp VALUES(3, 3, NULL, 1, 3);
INSERT INTO Temp VALUES(13, 13, NULL, 1, 1);

-- First order child nodes.
INSERT INTO Temp VALUES(4, 4, 1, 2, 1);
INSERT INTO Temp VALUES(5, 5, 2, 1, 2);
INSERT INTO Temp VALUES(6, 6, 3, 2, 3);

-- Second order child nodes.
INSERT INTO Temp VALUES(7, 7, 4, 1, 1);
INSERT INTO Temp VALUES(8, 8, 5, 2, 2);
INSERT INTO Temp VALUES(9, 9, 6, 1, 3);

SELECT
    Id,
    newestTable.SomeInfo,
    newestTable.Created,
    CASE WHEN newestTable.RootCount > 1 THEN 1 ELSE 0 END AS IsMulti
FROM
   Temp as originalTable
   INNER JOIN
   (
        SELECT
            SomeInfo,
            Max(Created) AS Created,
            Count(*) AS RootCount
        FROM
            Temp
        WHERE ParentId IS NULL AND GroupId = 1
        GROUP BY SomeInfo
    ) AS newestTable ON originalTable.SomeInfo = newestTable.SomeInfo AND originalTable.Created = newestTable.Created
/*WHERE
(
    originalTable.SomeInfo = 1
)*/
ORDER BY newestTable.Created ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;

P.S. Also, "How to apply outer limit offset and filters in the subquery to avoid grouping over the complete table used in subquery in Postgresql" looks interesting.

Edit:

It looks like

SELECT
    Id,
    SomeInfo,
    GroupId,
    ParentId,
    MAX(Created) OVER(PARTITION BY SomeInfo) AS Created,
    COUNT(Id) OVER(PARTITION BY SomeInfo) AS RootCount,
    CASE WHEN COUNT(Id) OVER(PARTITION BY SomeInfo) > 1 THEN 1 ELSE 0 END AS IsMulti
FROM
    Temp
WHERE
(
    GroupId = 1 AND ParentId IS NULL
)
ORDER BY Created ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;

gets close. The problem is that there are now two result rows, and it appears to me this is because the original INNER JOIN back to Temp culls the result to one row. I wonder if there is a way to apply the conditions either before or after the windowing to match the original query more closely. (To be clear, this isn't the same query. There is just so little data that the two queries look close to each other.)

Best Answer

It looks to me like what you are missing is the "return only the top Created record for each instance" part. You are getting all rows, each paired with whatever the top Created value is for the same SomeInfo. Unfortunately, you can't just add MAX(Created) = Created to the base WHERE clause, because window functions are not allowed there.

If you wrap the whole thing in a CTE, you can then add Created = MaxCreated to the outer WHERE and get what you are looking for (not that I think CTEs are the answer for everything).

WITH CTE (ID, SomeInfo, GroupID, ParentID, Created, MaxCreated, RootCount, IsMulti)
AS
(
    SELECT
        Id,
        SomeInfo,
        GroupId,
        ParentId,
        Created,
        MAX(Created) OVER(PARTITION BY SomeInfo) AS MaxCreated,
        COUNT(Id) OVER(PARTITION BY SomeInfo) AS RootCount,
        CASE WHEN COUNT(Id) OVER(PARTITION BY SomeInfo) > 1 THEN 1 ELSE 0 END AS IsMulti
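    FROM
        Temp
    -- Filter before windowing so MAX and COUNT only see root rows of the group.
    WHERE
    (
        GroupId = 1
        AND ParentId IS NULL
    )
)
SELECT ID, SomeInfo, GroupID, ParentID, MaxCreated AS Created, RootCount, IsMulti
FROM CTE
-- Keep only the newest row per SomeInfo.
WHERE Created = MaxCreated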
ORDER BY MaxCreated ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;

In my quick test it has the same execution plan and does not take any additional execution time (see the execution plan below). It is a small result set, though, so it is probably something you will still need to test against your real data.

Execution Plan from Test
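
For the first objective (getting the overall count in the same query), one common approach is to add COUNT(*) OVER () to the outer SELECT. The sketch below is untested against your real schema and rests on a few assumptions: the CTE name Roots and the column name TotalRows are made up for illustration, and the GroupId and ParentId filters are applied inside the CTE so that the window aggregates only see the root rows of the group. Window functions in the outer SELECT are evaluated after its WHERE but before OFFSET / FETCH, so TotalRows counts every row that survives the filter, not just the current page.

```sql
-- Sketch only: Roots and TotalRows are hypothetical names, not from the original post.
WITH Roots AS
(
    SELECT
        Id,
        SomeInfo,
        Created,
        -- Newest Created and number of root rows per SomeInfo.
        MAX(Created) OVER (PARTITION BY SomeInfo) AS MaxCreated,
        COUNT(*)     OVER (PARTITION BY SomeInfo) AS RootCount
    FROM Temp
    WHERE GroupId = 1 AND ParentId IS NULL
)
SELECT
    Id,
    SomeInfo,
    Created,
    CASE WHEN RootCount > 1 THEN 1 ELSE 0 END AS IsMulti,
    -- Empty OVER (): counts all rows left after the WHERE below, before paging.
    COUNT(*) OVER () AS TotalRows
FROM Roots
WHERE Created = MaxCreated
ORDER BY Created ASC
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY;
```

With the sample data in the question this should return the single row Id = 13, SomeInfo = 1, Created = 13, IsMulti = 1, TotalRows = 1. Two caveats: the total is repeated on every row of the page, and if the OFFSET skips past the last row you get no rows back and therefore no total either. Unlike the original query, this version only ever returns root rows, so compare the results on your real data before relying on it.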

Hopefully that is more of what you are looking for.