Let's say I have a simple table in Postgres (11.3):
create table posts
(
id serial not null,
created_at timestamp(0)
constraint posts_pkey
primary key (id)
);
If a user requests id=5869, I need to be able to return the N rows before and the N rows after that row in a query that is ordered by the created_at
column. If we're able to assume that the larger the id
, the larger the created_at
, we can do something relatively simple like this:
(select * from posts where id < 5869 order by id limit 10)
union all
(select * from posts where id >= 5869 order by id limit 11);
However, I am unable to assume that the higher id is most recently created and I'm wondering what the best way to retrieve that data is in that case. This method works, but is remarkably slow on a 100k row dataset:
WITH
boundaries AS (
SELECT *,
row_number() OVER (ORDER BY created_at DESC) AS rownum
FROM posts
),
target_boundary AS (
SELECT *
FROM boundaries
WHERE boundaries.id = 5869
)
SELECT posts.*, boundaries.rownum
FROM posts
LEFT JOIN boundaries ON posts.id = boundaries.id
JOIN target_boundary ON boundaries.rownum BETWEEN target_boundary.rownum - 10 AND target_boundary.rownum + 10
Running through that was taking upwards of 800 milliseconds which is far too slow on a dataset so small.
I have also tried a variation of the above using lead()
and lag()
, but that was even less efficient.
Is there a better way to do this query? Is there perhaps a window function I am missing in Postgres that would handle it?
Best Answer
Use the power of
UNION ALL
:This query assumes that there are no duplicates in
created_at
.For good performance, you need indexes on
id
(you have that with the primary key) andcreated_at
.If you need the result sorted, use my query as a subselect and add an
ORDER BY
.