Currently, I use a query with DISTINCT ON
to get a list of recent posts for each author.
But how can I get previous recent posts for each author? In other words, how to make DISTINCT ON
return second row of each group, not first.
I need this to compare recent post with previous one.
CREATE TABLE posts (
title varchar(30),
author varchar(30),
created_at date
);
INSERT INTO posts VALUES
('Johns first post', 'John', 'January 1, 2021'),
('Johns second post', 'John', 'January 2, 2021'),
('Johns third post', 'John', 'January 3, 2021'),
('Mikes first post', 'Mike', 'January 1, 2021'),
('Mikes second post', 'Mike', 'January 2, 2021'),
('Mikes third post', 'Mike', 'January 3, 2021');
This query selects the most recent post for each author. (Third post for each in my example.):
SELECT DISTINCT ON (author) * FROM posts ORDER BY author ASC, created_at DESC
db<>fiddle here
I also need the previous post for each author (Second post for each in my example.)
I don't want to use window functions because my table is large enough, and I suppose window functions could be slow.
Best Answer
DISTINCT ON
is only good to get a single (distinct) row per group. And only the one you can sort on top somehow. Even then it's only efficient with few rows per group. See:I am going to assume a big table with many posts per author (the typical case).
Simple and slow
Can be built around
row_number()
in a subquery:DESC NULLS LAST
since all your columns can be NULL.You'd really want all columns to be
NOT NULL
andcreated_at
to betimestamptz
. (See below for more.)The query will use a sequential scan, which is very inefficient for the case. (Just like you supposed.)
Sophisticated and fast
You'd want to use an index efficiently. Assuming a table design as primitive as displayed, one like:
We can put this to work, with some sophistication:
db<>fiddle here
The first step is a classic recursive CTE to get the first post per author. Detailed explanation here:
The second step it to fetch the next post for each author in a
LATERAL
subquery - making use of the index once more.Simple and fast
In a proper relational design, you'd have a separate
author
table like:Then the index can simply be:
And we can have a very simple and very efficient query:
db<>fiddle here