Assume I have the following DB table with 3 integer fields.
A | B | C
1 | 2 | 3
1 | 2 | 4
1 | 3 | 1
2 | 4 | 2
2 | 4 | 3
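For reference, the sample table above can be reproduced with the following (the name dbTable is taken from the query below):

```sql
-- Recreate the sample table from the question.
CREATE TABLE dbTable (a int, b int, c int);

INSERT INTO dbTable (a, b, c) VALUES
  (1, 2, 3),
  (1, 2, 4),
  (1, 3, 1),
  (2, 4, 2),
  (2, 4, 3);
```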
When I do:
SELECT * FROM dbTable ORDER BY A,B LIMIT 1;
I get
1 | 2 | 3
which is expected. But the 2nd record:
1 | 2 | 4
also has the same values for fields A and B. Is there an efficient way to retrieve all records that tie with the top-k records? For example, when I ask for the first 100 records, I'd like to get 102 records instead if the last two have the same values as the 100th record. Is there an index that can accelerate such queries? I don't mind if it has to be done with PL/pgSQL (rather than plain SQL) as long as the implementation is efficient.
Best Answer
You can use a window function for this:
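A sketch of such a query, using rank() over the same (a, b) ordering as the question's ORDER BY and a cutoff of 100 as in the example (table and column names assumed from the question):

```sql
-- Rank rows by (a, b); rank() assigns tied rows the same number,
-- so every row tied with the 100th one is kept as well.
SELECT a, b, c
FROM (
  SELECT a, b, c,
         rank() OVER (ORDER BY a, b) AS rnk
  FROM dbTable
) AS ranked
WHERE rnk <= 100
ORDER BY a, b;
```

With the 5-row sample and a cutoff of 1 instead of 100, this returns both (1, 2, 3) and (1, 2, 4), since they share rank 1.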
For the "first" rows, it doesn't matter whether you use rank() or dense_rank(). If you instead wanted e.g. the "second" rows, rank() and dense_rank() would return different results in case of ties, because rank() leaves "gaps" in the numbering while dense_rank() does not.

A possible speedup might be achieved by doing this in two steps, and of course by having an index on (a,b). The idea is to give Postgres the chance to do an index-only scan for the ranking part, and then join only the result of that to the base table to get the remaining column(s). The filtering on the rank is done inside the CTE, as Postgres doesn't push down conditions from the outer query into the CTE itself (that's why I have the derived table inside the CTE).
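A sketch of that two-step approach, assuming the same dbTable and an index on (a, b). The DISTINCT guards against duplicate join matches when several qualifying rows share the same (a, b) pair:

```sql
-- Supporting index for the ranking step (index-only scan candidate):
CREATE INDEX ON dbTable (a, b);

WITH top_keys AS (
  -- Step 1: rank on (a, b) only, touching just the indexed columns.
  -- The rank filter stays inside the CTE on purpose.
  SELECT DISTINCT a, b
  FROM (
    SELECT a, b,
           rank() OVER (ORDER BY a, b) AS rnk
    FROM dbTable
  ) AS ranked
  WHERE rnk <= 100
)
-- Step 2: join back to the base table for the remaining column(s).
SELECT t.a, t.b, t.c
FROM dbTable AS t
JOIN top_keys AS k ON t.a = k.a AND t.b = k.b
ORDER BY t.a, t.b;
```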
I'm not sure if this really improves the performance, but I guess it's worth trying and having a look at the execution plan with the real tables (and data).