Sql-server – Select parent with a specific child with ordering by a column with no grouping by this column

greatest-n-per-groupsql serversql-server-2012

I have got a parent table, let's call it Post, with two columns – PostId, Title. I have got another one table, let's call it Comment – CommentId, PostId (FK), CreatedDate, Type, Message

I need to get a flat view of parent and one of it's childs which will look like below:

PostId, Title, CommentId, CreatedDate, Type, Message

I need the oldest one comment in this view for each post, but I also need Type and message though I don't want to group by CreatedDate, Type and Message.

Without grouping by them I can't use ORDER BY

I have got two options here – using PARTITION OVER and CROSS/OUTER APPLY. First is not an option cause of performance gain, second one is ok but I can't for example filter by Type (WHERE Type = 3) since it's a view.

What I've tried:

Option one:

SELECT DISTINCT
        P.PostId,
        P.Title
FROM dbo.Post AS P
OUTER APPLY
(
    SELECT TOP 1 
        C.CommentId, 
        C.CreatedDate, 
        C.Type,
        C.Message
    FROM dbo.Comment C
    WHERE 
        C.PostId = P.PostId
    ORDER BY C.CreatedDate ASC
)

Option two:

SELECT 
        P.PostId,
        P.Title,
        C.CommentId, 
        C.CreatedDate, 
        C.Type,
        C.Message
        ROW_NUMBER() OVER (PARTITION BY p.PostId
                           ORDER BY C.CreatedDate ASC) AS Generation
FROM dbo.Post P
LEFT JOIN dbo.Comment C
ON C.PostId = P.PostId

and then use WHERE Generation = 1 from application server

To be clear, the issue is that OUTER APPLY performs great with SQL side paging (skip 100 take 100 for example), but when I need to take a count – it's a bottleneck since of each row processing

Not sure if it helps, but I need something like this:

enter image description here

Best Answer

You concern yourself for nothing. "it's a bottleneck since of each row processing" is an assumption you make, and not necessarily what the optimizer will choose to do, even if it looks to you that you write it that way.

SQL is a declarative language, and as such you tell the engine what you want done, and not how you want it done, like you do in imperative languages.

The query optimizer, especially in SQL Server, is a mind blowing wizard in moving things around, considering alternatives, and understanding what it is that you want, regardless of how you write it. It doesn't get it 100% right, but it is damn good.

Write your query, use several syntax like you did is never a bad idea, and test it against a representative data set. If you see performance challenges, examine the execution plan, or post it here so people can help you.

HTH